Archived as of 6/26/2025: The datasets will no longer receive updates, but the historical data will continue to be available for download. This dataset provides information related to the major services for patients. It contains information about the total number of patients, the total number of claims, and the dollar amount paid, grouped by recipient zip code. Restricted to claims with a service date between 01/2012 and 12/2017. Service categories considered are: 01 - Inpatient Service; 03 - Outpatient Service; 06 - Physician Service; 11 - Lab Service; 12 - X-Ray Service; 17 - Clinic Service; 26 - Mental Health Service; 27 - Dental Service/Child; 28 - Dental Service/Adult; 31 - Eye Care and Exams; 38 - EPSDT Service. The provider is the billing provider. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA. The distance between recipient and provider is a calculated straight-line distance, not the physical travel distance.
Overview
This dataset of medical misinformation was collected and is published by the Kempelen Institute of Intelligent Technologies (KInIT). It consists of approx. 317k news articles and blog posts on medical topics published between January 1, 1998 and February 1, 2022 from a total of 207 reliable and unreliable sources. The dataset contains full texts of the articles, their original source URLs, and other extracted metadata. If a source has a credibility score available (e.g., from Media Bias/Fact Check), it is also included in the form of an annotation. Besides the articles, the dataset contains around 3.5k fact-checks and extracted verified medical claims with their unified veracity ratings published by fact-checking organisations such as Snopes or FullFact. Lastly and most importantly, the dataset contains 573 manually and more than 51k automatically labelled mappings between previously verified claims and the articles; mappings consist of two values: claim presence (i.e., whether a claim is contained in the given article) and article stance (i.e., whether the given article supports or rejects the claim or provides both sides of the argument).
The dataset is primarily intended to be used as a training and evaluation set for machine learning methods for claim presence detection and article stance classification, but it enables a range of other misinformation related tasks, such as misinformation characterisation or analyses of misinformation spreading.
Its novelty and our main contributions lie in (1) the focus on medical news articles and blog posts as opposed to social media posts or political discussions; (2) providing multiple modalities (besides full texts of the articles, there are also images and videos), thus enabling research of multimodal approaches; (3) the mapping of the articles to the fact-checked claims (with manual as well as predicted labels); and (4) providing source credibility labels for 95% of all articles and other potential sources of weak labels that can be mined from the articles' content and metadata.
The dataset is associated with the research paper "Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims" accepted and presented at the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22).
The accompanying GitHub repository provides a small static sample of the dataset and the dataset's descriptive analysis in the form of Jupyter notebooks.
Options to access the dataset
There are two ways to access the dataset: a full static dump or a REST API.
To obtain access to the dataset (either the full static dump or the REST API), please request access by following the instructions provided below.
References
If you use this dataset in any publication, project, tool or in any other form, please cite the following papers:
@inproceedings{SrbaMonantPlatform,
  author    = {Srba, Ivan and Moro, Robert and Simko, Jakub and Sevcech, Jakub and Chuda, Daniela and Navrat, Pavol and Bielikova, Maria},
  booktitle = {Proceedings of Workshop on Reducing Online Misinformation Exposure (ROME 2019)},
  pages     = {1--7},
  title     = {Monant: Universal and Extensible Platform for Monitoring, Detection and Mitigation of Antisocial Behavior},
  year      = {2019}
}
@inproceedings{SrbaMonantMedicalDataset,
  author    = {Srba, Ivan and Pecher, Branislav and Tomlein, Matus and Moro, Robert and Stefancova, Elena and Simko, Jakub and Bielikova, Maria},
  booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22)},
  numpages  = {11},
  title     = {Monant Medical Misinformation Dataset: Mapping Articles to Fact-Checked Claims},
  year      = {2022},
  doi       = {10.1145/3477495.3531726},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3477495.3531726}
}
Dataset creation process
In order to create this dataset (and to continuously obtain new data), we used our research platform Monant. The Monant platform provides so-called data providers to extract news articles/blog posts from news/blog sites as well as fact-checking articles from fact-checking sites. General parsers (for RSS feeds, WordPress sites, the Google Fact Check Tool, etc.) as well as custom crawlers and parsers were implemented (e.g., for the fact-checking site Snopes.com). All data are stored in a unified format in central data storage.
Ethical considerations
The dataset was collected and is published for research purposes only. We collected only publicly available content of news/blog articles. The dataset contains identities of authors of the articles if they were stated in the original source; we left this information, since the presence of an author's name can be a strong credibility indicator. However, we anonymised the identities of the authors of discussion posts included in the dataset.
The main identified ethical issue related to the presented dataset lies in the risk of mislabelling an article as supporting a false fact-checked claim and, to a lesser extent, in mislabelling an article as not containing a false claim or not supporting it when it actually does. To minimise these risks, we developed a labelling methodology and require agreement of at least two independent annotators to assign a claim presence or article stance label to an article. It is also worth noting that we do not label an article as a whole as false or true. Nevertheless, we provide partial article-claim pair veracities based on the combination of claim presence and article stance labels.
As to the veracity labels of the fact-checked claims and the credibility (reliability) labels of the articles' sources, we take these from the fact-checking sites and external listings such as Media Bias/Fact Check as they are and refer to their methodologies for more details on how they were established.
Lastly, the dataset also contains automatically predicted labels of claim presence and article stance generated by our baselines described in the next section. These methods have their limitations and work with a certain accuracy, as reported in the paper. This should be taken into account when interpreting the predicted labels.
Reporting mistakes in the dataset
The way to report considerable mistakes in raw collected data or in manual annotations is by creating a new issue in the accompanying GitHub repository. Alternatively, general enquiries or requests can be sent to info [at] kinit.sk.
Dataset structure
Raw data
First, the dataset contains so-called raw data (i.e., data extracted by the Web monitoring module of the Monant platform and stored in exactly the same form as they appear on the original websites). Raw data consist of articles from news sites and blogs (e.g., naturalnews.com), discussions attached to such articles, and fact-checking articles from fact-checking portals (e.g., snopes.com). In addition, the dataset contains feedback (number of likes, shares, and comments) provided by users on the social network Facebook, which is regularly extracted for all news/blog articles.
Raw data are contained in these CSV files (and corresponding REST API endpoints):
sources.csv
articles.csv
article_media.csv
article_authors.csv
discussion_posts.csv
discussion_post_authors.csv
fact_checking_articles.csv
fact_checking_article_media.csv
claims.csv
feedback_facebook.csv
Note: Personal information about discussion posts' authors (name, website, gravatar) is anonymised.
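For orientation, the raw-data files can be combined with standard tooling. Below is a minimal pandas sketch, assuming the static CSV dump; the join-key column names (id, source_id, article_id) are assumptions inferred from the file names, not documented schema.

```python
import pandas as pd

# Load raw-data tables from the static dump (paths are illustrative).
sources = pd.read_csv("sources.csv")
articles = pd.read_csv("articles.csv")
article_media = pd.read_csv("article_media.csv")

# Join keys below ('id', 'source_id', 'article_id') are assumptions --
# verify against the actual column headers before use.
articles_with_source = articles.merge(
    sources, left_on="source_id", right_on="id", suffixes=("", "_source")
)

# Attach media records (images, videos) to their articles.
articles_full = articles_with_source.merge(
    article_media, left_on="id", right_on="article_id",
    how="left", suffixes=("", "_media"),
)
print(articles_full.head())
```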
Annotations
Secondly, the dataset contains so-called annotations. Entity annotations describe individual raw-data entities (e.g., an article or a source). Relation annotations describe a relation between two such entities.
Each annotation is described by the following attributes:
category of annotation (annotation_category). Possible values: label (annotation corresponds to ground truth determined by human experts) and prediction (annotation was created by means of an AI method).
type of annotation (annotation_type_id). Example values: Source reliability (binary), Claim presence. The list of possible values can be obtained from the enumeration in annotation_types.csv.
method which created the annotation (method_id). Example values: Expert-based source reliability evaluation, Fact-checking article to claim transformation method. The list of possible values can be obtained from the enumeration in methods.csv.
its value (value). The value is stored in JSON format and its structure differs according to the particular annotation type.
At the same time, annotations are associated with a particular object identified by:
entity type (parameter entity_type in the case of entity annotations, or source_entity_type and target_entity_type in the case of relation annotations). Possible values: sources, articles, fact-checking-articles.
entity id (parameter entity_id in the case of entity annotations, or source_entity_id and target_entity_id in the case of relation annotations).
The dataset provides specifically these entity annotations:
Source reliability (binary). Determines the validity of a source (website) on a binary scale with two options: reliable source and unreliable source.
Article veracity. Aggregated information about veracity from article-claim pairs.
The dataset provides specifically these relation annotations:
Fact-checking article to claim mapping. Determines the mapping between a fact-checking article and a claim.
Claim presence. Determines the presence of a claim in an article.
Claim stance. Determines the stance of an article towards a claim.
Annotations are contained in these CSV files (and corresponding REST API endpoints):
entity_annotations.csv
relation_annotations.csv
Note: The identification of human annotators (the email provided in the annotation app) is anonymised.
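As an illustration, the sketch below reads the entity annotations and decodes the JSON value column; the column names mirror the attributes described above (annotation_category, entity_type, value), but the per-type structure of the value should be checked against annotation_types.csv.

```python
import json
import pandas as pd

# Column names follow the annotation attributes described above.
entity_annotations = pd.read_csv("entity_annotations.csv")

# The 'value' column stores JSON whose structure depends on the
# annotation type; decode it into Python objects.
entity_annotations["value_parsed"] = entity_annotations["value"].apply(json.loads)

# Example: keep only human-expert labels attached to sources.
source_labels = entity_annotations[
    (entity_annotations["annotation_category"] == "label")
    & (entity_annotations["entity_type"] == "sources")
]
print(source_labels.head())
```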
Archived as of 6/26/2025: The datasets will no longer receive updates, but the historical data will continue to be available for download. This dataset provides information related to claims that serviced mental health patients. It contains information about the total number of patients, the total number of claims, and the total dollar amount, grouped by provider. Restricted to claims with a service date between 01/2016 and 12/2016. Patients with mental health problems are identified by a list of mental health patients matched to their Medicaid recipient IDs from DMHA. ER claims are defined as claims with CPT codes 99281, 99282, 99283, 99284, and 99285. Providers are billing providers. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.
https://creativecommons.org/publicdomain/zero/1.0/
The Healthcare Common Procedure Coding System (HCPCS, often pronounced by its acronym as "hick picks") is a set of health care procedure codes based on the American Medical Association's Current Procedural Terminology (CPT).
HCPCS includes three levels of codes: Level I consists of the American Medical Association's Current Procedural Terminology (CPT) and is numeric. Level II codes are alphanumeric and primarily include non-physician services such as ambulance services and prosthetic devices; they represent items, supplies, and non-physician services not covered by CPT-4 codes (Level I). Level III codes, also called local codes, were developed by state Medicaid agencies, Medicare contractors, and private insurers for use in specific programs and jurisdictions. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) instructed CMS to adopt a standard coding system for reporting medical transactions. The use of Level III codes was discontinued on December 31, 2003, in order to adhere to consistent coding standards.
Classification of procedures performed for patients is important for billing and reimbursement in healthcare. The primary classification system used in the United States is the Healthcare Common Procedure Coding System (HCPCS), maintained by the Centers for Medicare and Medicaid Services (CMS). This system is divided into two levels: level I and level II.
Level I HCPCS codes classify services rendered by physicians. This system is based on Current Procedural Terminology (CPT), a coding system maintained by the American Medical Association (AMA). Level II codes, which are the focus of this public dataset, are used to identify products, supplies, and services not included in level I codes. The level II codes include items such as ambulance services, durable medical goods, prosthetics, orthotics, and supplies used outside a physician's office.
Given the ubiquity of administrative data in healthcare, HCPCS coding systems are also commonly used in areas of clinical research such as outcomes-based research.
Update Frequency: Yearly
Fork this kernel to get started.
https://bigquery.cloud.google.com/table/bigquery-public-data:cms_codes.hcpcs
https://cloud.google.com/bigquery/public-data/hcpcs-level2
Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @rawpixel from Unsplash.
What are the descriptions for a set of HCPCS level II codes?
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
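As a starting point for the question above, the table can be inspected with the BigQuery Python client. This minimal sketch deliberately avoids assuming column names: the table path comes from the links above; print the schema first, then filter on whichever code and description columns you find.

```python
from google.cloud import bigquery

client = bigquery.Client()  # requires GCP credentials with BigQuery access

# Pull a few rows to discover the actual schema; the table path is taken
# from the dataset links above.
query = """
SELECT *
FROM `bigquery-public-data.cms_codes.hcpcs`
LIMIT 10
"""
df = client.query(query).to_dataframe()
print(df.columns.tolist())  # locate the code and description columns
print(df.head())
```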
By Health Data New York [source]
This dataset provides comprehensive measures to evaluate the quality of medical services provided to Medicaid beneficiaries by Health Homes, including the Centers for Medicare & Medicaid Services (CMS) Core Set and Health Home State Plan Amendment (SPA) measures. This allows insight into how well these Health Homes are performing in terms of delivering high-quality care. Data sources include the Medicaid Data Mart, QARR Member-Level Files, and the New York State Delivery System Reform Incentive Payment (DSRIP) Data Warehouse. With this dataset you can explore essential indicators such as rates for indicators within the scope of the Core Set measures; domains, sub-domains, and measure descriptions; the age categories used; the denominator for each measure; and the level of significance for each indicator. Understanding Health Home quality measures through this resource can help inform decisions about evidence-based health practices while also promoting better patient outcomes.
This dataset contains measures that evaluate the quality of care delivered by Health Homes for the Centers for Medicare & Medicaid Services (CMS). With this dataset, you can get an overview of how a health home is performing in terms of quality. You can use this data to compare different health homes and their respective service offerings.
The data used to create this dataset was collected from the Medicaid Data Mart, QARR Member-Level Files, and the New York State Delivery System Reform Incentive Payment (DSRIP) Data Warehouse.
In order to use this dataset effectively, you should start by looking at the columns provided. These include: Measurement Year; Health Home Name; Domain; Sub Domain; Measure Description; Age Category; Denominator; Rate; Level of Significance; Indicator. Each column provides valuable insight into how a particular health home is performing in various measurements of healthcare quality.
When examining this data, it is important to remember that many variables are included in any given measure and that changes may have occurred over time due to varying factors such as population or the financial resources available for healthcare delivery. Furthermore, changes in policy may also affect performance over time, so it is important to take these things into account when evaluating the performance of any given health home from one year to the next or when comparing different health homes on a specific measure or set of indicators over time.
- Using this dataset, state governments can evaluate the effectiveness of their health home programs by comparing the performance across different domains and subdomains.
- Healthcare providers and organizations can use this data to identify areas for improvement in quality of care provided by health homes and strategies to reduce disparities between individuals receiving care from health homes.
- Researchers can use this dataset to analyze how variations in cultural context, geography, demographics or other factors impact the delivery of quality health home services across different locations.
If you use this dataset in your research, please credit the original authors and the data source.
See the dataset description for more information.
File: health-home-quality-measures-beginning-2013-1.csv

| Column name | Description |
|:------------|:------------|
| Measurement Year | The year in which the data was collected. (Integer) |
| Health Home Name | The name of the health home. (String) |
| Domain | The domain of the measure. (String) |
| Sub Domain | The sub domain of the measure. (String) |
| Measure Description | A description of the measure. (String) |
| Age Category | The age category of the patient. (String) |
| Denominator | The denominator of the measure. (Integer) |
| Rate | The rate of the measure. (Float) |
| Level of Significance | The level of significance of the measure. (String) |
| Indicator | The indicator of the measure. (String) |
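Because the column layout is documented above, a quick summary is straightforward. The pandas sketch below is illustrative only; averaging rates across heterogeneous measures is naive and meant purely as a first screening view.

```python
import pandas as pd

# Column names come from the data dictionary above.
df = pd.read_csv("health-home-quality-measures-beginning-2013-1.csv")

# Mean rate per health home and domain across measurement years.
# Note: averaging across different measures is a rough screening view only.
summary = (
    df.groupby(["Health Home Name", "Domain"])["Rate"]
      .mean()
      .reset_index()
      .sort_values("Rate", ascending=False)
)
print(summary.head(10))
```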
...
By Data Society [source]
Do you want to explore the complexities of Health Insurance Marketplace and uncover insights into plan rates, benefits, and networks? Look no further! With this dataset from the Centers for Medicare & Medicaid Services (CMS), you can investigate trends in plan rates, access coverage across states and zip codes, compare metal level plans (across years), as well as analyze benefit information all in one place.
We’ve provided six CSV files containing combined data from across all years:
- BenefitsCostSharing.csv provides details on benefits;
- BusinessRules.csv provides details about premium payment requirements for a plan or set of plans;
- Network.csv offers details about health plans’ networks of providers who offer services at different cost levels to members enrolled in a given plan or set of plans;
- PlanAttributes.csv gives attributes such as age-off dates for various plans;
- Rate.csv delivers information on rate changes;
- ServiceArea.csv reveals demographic characteristics related to each service area associated with a specific issuer;
plus two CSV files that join data across years (Crosswalk2015 & Crosswalk2016).
So come on board and use your creativity to unlock the mysteries behind changes in benefits in relation to costs while exploring network providers within different regions!
This dataset contains information about the health insurance plans offered in the US Health Insurance Marketplace. It includes data on plan benefits, cost-sharing, networks, rates and service areas for different states. The data can be used to compare and analyze plan characteristics across different states and ages, which will help guide users' decision-making when purchasing a health insurance plan.
To begin using the dataset, you should start by looking at the columns available. These include State, Dental Plan, Multistate Plan (2015 & 2016), Metal Level (2015 & 2016), Child/Adult Only (2015 & 2016), FIPS Code, Zip Code Crosswalk Level, Reason for Crosswalk, Multistate Plan Ageoff (2016 & 2015) and MetalLevel Ageoff (2016 & 2015). These columns provide important information on each plan that can be used to compare them across states or between years.
Using this data you can explore several interesting questions such as: How do benefit levels vary among states? Are there any differences in network providers between states? What factors influence plan rates?
In order to answer these questions, you should join together relevant tables from across years using the Crosswalk 2015/2016 CSV files, then organize your data so that it is easier to visualize differences in features between plans sold across different states or years. Once the information is organized, it can be helpful to use visualizations such as line graphs or bar charts to compare feature values between plans more clearly and to differentiate variations among plans.
By doing this you can gain a better understanding of how certain factors may affect rate changes over time, or how certain benefit levels might differ by state, which will allow consumers to make an informed choice when selecting their next health insurance plan.
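A rough sketch of the cross-year join described above follows; the key names (PlanId in Rate.csv, Old_PlanID/New_PlanID in the crosswalk) are assumptions and must be checked against the actual CSV headers before use.

```python
import pandas as pd

# Load the rates table and the 2015->2016 plan crosswalk.
rates = pd.read_csv("Rate.csv")
crosswalk = pd.read_csv("Crosswalk2016.csv")

# Hypothetical join keys -- verify against the real headers first.
linked = rates.merge(
    crosswalk, left_on="PlanId", right_on="Old_PlanID", how="inner"
)

# With plans linked across years, rate changes can be compared directly.
print(linked[["PlanId", "New_PlanID"]].drop_duplicates().head())
```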
- Analyzing the effectiveness of different plan benefits and how they affect premiums to determine a fair price point for different types of healthcare plans.
- Examining the variation in rates, benefits and coverage by state or zip code to identify potential trends or disparities in access to quality health care services across regions.
- Developing an algorithm that can predict premium prices based on certain factors such as age group, type of plan (metal level), multistate coverage, etc., to help consumers more easily understand the true cost of their health insurance plans before committing to a purchase.
If you use this dataset in your research, please credit the original authors and the data source.
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit -...
https://www.usa.gov/government-works
The COVID-19 Claims Reimbursement to Health Care Providers and Facilities for Testing, Treatment, and Vaccine Administration for the Uninsured Program provides reimbursements on a rolling basis directly to eligible health care entities for claims attributed to the testing, treatment, and/or vaccine administration of COVID-19 for uninsured individuals. The program funding information is as follows:
TESTING: The American Rescue Plan Act (ARP) provided $4.8 billion to reimburse providers for testing the uninsured. In addition, the Families First Coronavirus Response Act (FFCRA) Relief Fund, which includes funds received from the Public Health and Social Services Emergency Fund as appropriated in the FFCRA (P.L. 116-127), and the Paycheck Protection Program and Health Care Enhancement Act (PPPHCEA; P.L. 116-139) each appropriated $1 billion to reimburse health care entities for conducting COVID-19 testing for the uninsured.
TREATMENT & VACCINATION: The Provider Relief Fund, which includes funds received from the Public Health and Social Services Emergency Fund as appropriated in the Coronavirus Aid, Relief, and Economic Security (CARES) Act (P.L. 116-136), provided $100 billion in relief funds. The PPPHCEA appropriated an additional $75 billion in relief funds, and the Coronavirus Response and Relief Supplemental Appropriations (CRRSA) Act (P.L. 116-260) appropriated another $3 billion. Within the Provider Relief Fund, a portion of the funding from these sources will be used to support healthcare-related expenses attributable to the treatment of uninsured individuals with COVID-19 and the vaccination of uninsured individuals. To learn more about the program, visit: https://www.hrsa.gov/CovidUninsuredClaim
This dataset represents the list of health care entities who have agreed to the Terms and Conditions and received claims reimbursement for COVID-19 testing of uninsured individuals, vaccine administration and treatment for uninsured individuals with a COVID-19 diagnosis.
For Provider Relief Fund Data - https://data.cdc.gov/Administrative/HHS-Provider-Relief-Fund/kh8y-3es6
This dataset contains the ICD-10 code lists used to test the sensitivity and specificity of the Clinical Practice Research Datalink (CPRD) medical code lists for dementia subtypes. The provided code lists are used to define dementia subtypes in linked data from the Hospital Episode Statistics (HES) inpatient dataset and the Office for National Statistics (ONS) death registry, which are then used as the 'gold standard' for comparison against dementia subtypes defined using the CPRD medical code lists. The CPRD medical code lists used in this comparison are available here: Venexia Walker, Neil Davies, Patrick Kehoe, Richard Martin (2017): CPRD codes: neurodegenerative diseases and commonly prescribed drugs. https://doi.org/10.5523/bris.1plm8il42rmlo2a2fqwslwckm2
The MarketScan Medicare Supplemental Database provides detailed cost, use and outcomes data for healthcare services performed in both inpatient and outpatient settings.
It includes Medicare Supplemental records for all years, and Medicare Advantage records starting in 2020. This page also contains the MarketScan Medicare Lab Database, starting in 2018.
Starting in 2026, there will be a data access fee for using the full dataset. Please refer to the 'Usage Notes' section of this page for more information.
MarketScan Research Databases are a family of data sets that fully integrate many types of data for healthcare research.
The MarketScan Databases track millions of patients throughout the healthcare system. The data are contributed by large employers, managed care organizations, hospitals, EMR providers and Medicare.
This page contains the MarketScan Medicare Database.
We also have related MarketScan databases on other pages.
**Starting in 2026, there will be a data access fee for using the full dataset** (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this **FAQ Sheet**.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission. We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
This dataset contains Hospital General Information from the U.S. Department of Health & Human Services. This is the BigQuery COVID-19 public dataset. This data contains a list of all hospitals that have been registered with Medicare. This list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
How do the hospitals in Mountain View, CA compare to the average hospital in the US? With the hospital compare data you can quickly understand how hospitals in one geographic location compare to another location. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.
#standardSQL
SELECT
  MTV_AVG_HOSPITAL_RATING,
  US_AVG_HOSPITAL_RATING
FROM (
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS MTV_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    city = 'MOUNTAIN VIEW'
    AND state = 'CA'
    AND hospital_overall_rating <> 'Not Available') MTV
JOIN (
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS US_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    hospital_overall_rating <> 'Not Available')
ON
  1 = 1
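The query above can be run from a notebook with the BigQuery Python client library mentioned earlier; a minimal sketch (it assumes GCP credentials with BigQuery access are configured):

```python
from google.cloud import bigquery

client = bigquery.Client()  # requires GCP credentials with BigQuery access

query = """
SELECT
  MTV_AVG_HOSPITAL_RATING,
  US_AVG_HOSPITAL_RATING
FROM (
  SELECT ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS MTV_AVG_HOSPITAL_RATING
  FROM `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE city = 'MOUNTAIN VIEW'
    AND state = 'CA'
    AND hospital_overall_rating <> 'Not Available') MTV
JOIN (
  SELECT ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS US_AVG_HOSPITAL_RATING
  FROM `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE hospital_overall_rating <> 'Not Available')
ON 1 = 1
"""

# Execute and fetch the single comparison row into a pandas DataFrame.
df = client.query(query).to_dataframe()
print(df)
```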
What are the most common diseases treated at hospitals that do well in the category of patient readmissions?
For hospitals that achieved “Above the national average” in the category of patient readmissions, it might be interesting to review the types of diagnoses treated at those inpatient facilities. While this query won’t provide the granular detail that went into the readmission calculation, it gives a quick glimpse into the top diagnosis-related groups (DRGs), or classifications of inpatient stays, found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you can quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis-related groups for hospital metrics you might be interested in.
#standardSQL
SELECT
  drg_definition,
  SUM(total_discharges) AS total_discharge_per_drg
FROM
  `bigquery-public-data.cms_medicare.hospital_general_info` gi
INNER JOIN
  `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
ON
  gi.provider_id = ic.provider_id
WHERE
  readmission_national_comparison = 'Above the national average'
GROUP BY
  drg_definition
ORDER BY
  total_discharge_per_drg DESC
LIMIT
  10;
https://choosealicense.com/licenses/gpl-3.0/
Dataset Card for [Dataset Name]
Dataset Summary
This data set contains over 6,000 medical terms and their Wikipedia text. It is intended to be used for a downstream task that requires medical terms and their Wikipedia explanations.
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
[More Information Needed]
Data Splits
[More Information Needed]
Dataset Creation
Curation Rationale
[More… See the full description on the dataset page: https://huggingface.co/datasets/shankarsubramony/medwikidataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.
We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al. [3].
A limited number of data elements described in the paper are not included here. The following elements are excluded:
The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.
The free-text comments written by raters during the ratings process.
Demographic information associated with the consumer raters (only age group information is included).
[1] Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).
[2] Singhal, K., Azizi, S., Tu, T., et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2
[3] Omiye, J.A., Lester, J.C., Spichak, S., et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z
[4] Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.
[5] Abacha, Asma Ben, et al. "Bridging the gap between consumers’ medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25–29.
Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Ratings were missing for five instances in MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.
Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded in terms of two binary columns corresponding to which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as for ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FBRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.
Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated; instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.
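As an example of working with these files, the sketch below tabulates the independent-rubric bias labels; the bias_presence column and its three values are documented above, while any per-dataset identifier column is an assumption to verify against the file header.

```python
import pandas as pd

ratings = pd.read_csv("ratings_independent.csv")

# 'bias_presence' and its values (No bias / Minor bias / Severe bias)
# are documented above.
print(ratings["bias_presence"].value_counts())

# A per-dataset breakdown would look like the following, assuming a
# hypothetical 'dataset' column (check the actual header):
# print(ratings.groupby("dataset")["bias_presence"].value_counts())
```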
Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].
Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.
Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.
Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.
TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.
Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.
Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.
HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].
Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.
Omiye et al. [other_datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
Version 2: Updated to include ratings and generated model outputs. Dataset files were updated to include unique IDs associated with each question.
Version 1: Contained datasets of questions without ratings; consistent with v1 available as a preprint on arXiv (https://arxiv.org/abs/2403.12025).
WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.
NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.
The Medical Care Cost Recovery National Database (MCCR NDB) provides a repository of summary Medical Care Collections Fund (MCCF) billing and collection information used by program management to compare facility performance. It stores summary information for Veterans Health Administration (VHA) receivables including the number of receivables and their summarized status information. This database is used to monitor the status of the VHA's collection process and to provide visibility on the types of bills and collections being done by the Department. The objective of the VA MCCF Program is to collect reimbursement from third party health insurers and co-payments from certain non-service-connected (NSC) Veterans for the cost of medical care furnished to Veterans. Legislation has authorized VHA to: submit claims to and recover payments from Veterans' third party health insurance carriers for treatment of non-service-connected conditions; recover co-payments from certain Veterans for treatment of non-service-connected conditions; and recover co-payments for medications from certain Veterans for treatment of non-service-connected conditions. All of the information captured in the MCCR NDB is derived from the Accounts Receivable (AR) modules running at each medical center. MCCR NDB is not used for official collections figures; instead, the Department uses the Financial Management System (FMS).
Archived as of 5/30/2025: The datasets will no longer receive updates, but the historical data will continue to be available for download. This dataset provides information related to mothers with a live birth during the time period 07/2016 to 07/2020 and their claims 2 years prior and 2 years post delivery. It contains information about the overall number of claims and the overall total dollar amount, total claims prebirthing and postbirthing, and total dollar amount prebirthing and postbirthing, by mother’s county of residence at the time of delivery. Maternal health claims are defined as claims with mothers diagnosed with at least one of the following ICD codes: 650, V270, V272, V273, V275, V276, V3000, V3100, V3200, V3300, V3400, V3500, V3600, V3700, V3900, O80, Z370, Z372, Z373, Z3750, Z3751, Z3752, Z3753, Z3754, Z3759, Z3760, Z3761, Z3762, Z3763, Z3764, Z3769, Z3800, Z382, Z385, Z3830, Z3830, Z3861, Z3863, Z3865, Z3868, Z388, V7242, V220, V239, V221, V222, V230, V232, V234, V2341, V2342, V724, V237, V279, V6511, V241, V242, V251, V723, V762, Z37, Z370, Z371, Z372, Z373, Z374, Z375, Z3750, Z3751, Z3752, Z3753, Z3754, Z3759, Z376, Z3760, Z3761, Z3762, Z3763, Z3764, Z3769, Z377, Z379, Z34, Z340, Z3400, Z3401, Z3402, Z3403, Z348, Z3480, Z3481, Z3482, Z3483, Z349, Z3490, Z3491, Z3492, Z3493, O09, O090, O0900, O0901, O0902, O0903, O091, O0910, O0911, O0912, O0913, O09A, O09A0, O09A1, O09A2, O09A3, O092, O0921, O09211, O09212, O09213, O09219, O0929, O09291, O09292, O09293, O09299, O093, O0930, O0931, O0932, O0933, O094, O0940, O0941, O0942, O0943, O095, O0951, O09511, O09512, O09513, O09519, O0952, O09521, O09522, O09523, O09529, O096, O0961, O09611, O09612, O09613, O09619, O0962, O09621, O09622, O09623, O09629, O097, O0970, O0971, O0972, O0973, O098, O0981, O09811, O09812, O09813, O09819, O0982, O09821, O09822, O09823, O09829, O0989, O09891, O09892, O09893, O09899, O099, O0990, O0991, O0992, O0993. A maternal health claim is also defined as a claim with at least one of the following CPT codes: 59025, 59424, 59425, 59426, 76818, 88291, 59400, 59409, 59410, 59510, 59514, 59515, 59610, 59612, 59614, 59618, 59620, 59622, 57170, 58300, 59430, 88141, 88142, 88143, 88147, 88148, 88150, 88152, 88153, 88154, 88155, 88164, 88165, 88166, 88167, 88174, 88175. Prebirthing is restricted to claims with a service date within 2 years before the delivery date of the child. Postbirthing is restricted to claims with a service date within 2 years after the delivery date of the child. This data is for research purposes and is not intended to be used for reporting. Due to differences in geographic aggregation, time period considerations, and units of analysis, these numbers may differ from those reported by FSSA.
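To make the claim definition concrete, here is a hedged sketch of flagging maternal health claims using the listed codes; the claims-table layout (diagnosis_code, cpt_code, service_date, delivery_date columns) is an assumption for illustration, and only a small subset of the code lists is shown.

```python
import pandas as pd

# Illustrative subsets of the ICD and CPT lists above (truncated).
MATERNAL_ICD = {"650", "V270", "O80", "Z370", "Z372", "O0900", "Z3400"}
MATERNAL_CPT = {"59400", "59409", "59410", "59510", "59610", "76818"}

# Hypothetical claims table; the column names below are assumptions,
# not a published schema.
claims = pd.read_csv("claims.csv", dtype=str)
claims["service_date"] = pd.to_datetime(claims["service_date"])
claims["delivery_date"] = pd.to_datetime(claims["delivery_date"])

maternal = claims[
    claims["diagnosis_code"].isin(MATERNAL_ICD)
    | claims["cpt_code"].isin(MATERNAL_CPT)
].copy()

# Prebirthing: service date within 2 years before delivery;
# postbirthing: within 2 years after.
delta = maternal["service_date"] - maternal["delivery_date"]
two_years = pd.Timedelta(days=730)
maternal["prebirthing"] = (delta < pd.Timedelta(0)) & (delta >= -two_years)
maternal["postbirthing"] = (delta >= pd.Timedelta(0)) & (delta <= two_years)
print(maternal[["prebirthing", "postbirthing"]].sum())
```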
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In Brazil, studies that map electronic healthcare databases in order to assess their suitability for use in pharmacoepidemiologic research are lacking. We aimed to identify, catalogue, and characterize Brazilian data sources for Drug Utilization Research (DUR).
Methods: The present study is part of the project entitled “Publicly Available Data Sources for Drug Utilization Research in Latin American (LatAm) Countries.” A network of Brazilian health experts was assembled to map secondary administrative data from healthcare organizations that might provide information related to medication use. A multi-phase approach including an internet search of institutional government websites, traditional bibliographic databases, and experts’ input was used for mapping the data sources. The reviewers searched, screened, and selected the data sources independently; disagreements were resolved by consensus. Data sources were grouped into the following categories: 1) automated databases; 2) Electronic Medical Records (EMR); 3) national surveys or datasets; 4) adverse event reporting systems; and 5) others. Each data source was characterized by accessibility, geographic granularity, setting, type of data (aggregate or individual-level), and years of coverage. We also searched for publications related to each data source.
Results: A total of 62 data sources were identified and screened; 38 met the eligibility criteria for inclusion and were fully characterized. We grouped 23 (60%) as automated databases, four (11%) as adverse event reporting systems, four (11%) as EMRs, three (8%) as national surveys or datasets, and four (11%) as other types. Eighteen (47%) were classified as publicly and conveniently accessible online, providing information at the national level. Most of them offered more than 5 years of comprehensive data coverage and presented data at both the individual and aggregated levels. No information about population coverage was found. Drug coding is not uniform; each data source has its own coding system, depending on the purpose of the data. At least one scientific publication was found for each publicly available data source.
Conclusions: There are several types of data sources for DUR in Brazil, but a uniform system for drug classification and data quality evaluation does not exist. The extent of population covered by year is unknown. Our comprehensive and structured inventory reveals a need for full characterization of these data sources.
The Medicare Physician & Other Practitioners by Provider and Service dataset provides information on use, payments, and submitted charges organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and place of service. Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
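Since the note above warns that the full file exceeds spreadsheet limits, a chunked read is one practical approach. A pandas sketch follows, with the file name and column name as assumptions to adjust to the actual download.

```python
import pandas as pd
from collections import Counter

# Stream the large file in chunks instead of loading it whole.
# File and column names are assumptions -- adjust to the actual download.
totals = Counter()
for chunk in pd.read_csv("medicare_physician_by_provider_and_service.csv",
                         chunksize=500_000):
    totals.update(chunk["HCPCS_Cd"].value_counts().to_dict())

# Ten most frequent HCPCS codes across all records.
print(totals.most_common(10))
```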
Background
Our aim was to compare access to effective care among elderly Medicare patients in a Staff Model and Group Model HMO and in Fee-for-Service (FFS) care.
Methods
We used a retrospective cohort study design, using claims and automated medical record data to compare achievement on quality indicators for elderly Medicare recipients. Secondary data were collected from 1) HMO data sets and 2) Medicare claims files for the time period 1994–95. All subjects were Medicare enrollees in a defined area of New England: those enrolled in two divisions of a managed care plan with different physician payment arrangements: a staff model, and a group model; and the Medicare FFS population. We abstracted information on indicators covering several domains: preventive, diagnosis-specific, and chronic disease care.
Results
On the indicators we created and tested, access in the single managed care plan under study was comparable to or better than FFS care in the same geographic region. Percent of Medicare recipients with breast cancer screening was 36 percentage points higher in the staff model versus FFS (95% confidence interval 34–38 percentage points). Follow up after hospitalization for myocardial infarction was 20 percentage points higher in the group model than in FFS (95% confidence interval 14–26 percentage points).
Conclusion
According to indicators developed for use in both claims and automated medical record data, access to care for elderly Medicare beneficiaries in one large managed care organization was as good as or better than that in FFS care in the same geographic area.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here, you will find resources to use the Bynum-Standard 1-Year Algorithm, including a README file that accompanies SAS and Stata scripts for the 1-Year Standard Method for identifying Alzheimer’s Disease and Related Dementias (ADRD) in Medicare Claims data. There are seven script files (plus a parameters file for SAS [parm.sas]) for both SAS and Stata. The files are numbered in the order in which they should be run; the five “1” files may be run in any order.
The full algorithm requires access to a single year of Medicare Claims data for (1) MedPAR, (2) the Home Health Agency (HHA) Claims File, (3) the Hospice Claims File, (4) the Carrier Claims and Line Files, and (5) the Hospital Outpatient File (HOF) Claims and Revenue Files. All Medicare Claims files are expected to be in SAS format (.sas7bdat).
For each data source, the script will output three files*:
Diagnosis-level file: Lists individual ADRD diagnoses for each beneficiary for a given visit. This file allows researchers to identify which ICD-9-CM or ICD-10-CM codes are used in the claims data.
Service Date-level file: Aggregated from the Diagnosis-level file, this file includes all beneficiaries with an ADRD diagnosis by Service Date (date of a claim with at least one ADRD diagnosis).
Beneficiary-level file: Aggregated from the Service Date-level file, this file includes all beneficiaries with at least one ADRD diagnosis at any point in the year within a specific file.
*The algorithm combines the Carrier and HOF files at the Service Date-level. The final combined Carrier and HOF Beneficiary-level file includes those with at least two (2) claims that are seven (7) or more days apart.
A final combined file is created by merging all Beneficiary-level files. This file is used to identify beneficiaries with ADRD and can be merged onto other files by the Beneficiary ID (BENE_ID).
With appreciation and acknowledgement to colleagues from a grant funded by the NIA for their involvement in the development and validation of the Bynum-Standard 1-Year Algorithm.
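For the combined Carrier/HOF rule ("at least two claims that are seven or more days apart"), a minimal pandas sketch is shown below; the file name is hypothetical, BENE_ID follows the text, and service_date is an assumed header to check against the script outputs.

```python
import pandas as pd

# Service Date-level file for the combined Carrier + HOF sources.
# 'BENE_ID' follows the text above; 'service_date' is an assumed header.
svc = pd.read_csv("carrier_hof_service_date_level.csv",
                  parse_dates=["service_date"])

def qualifies(dates: pd.Series) -> bool:
    """True if there are 2+ ADRD claim dates spanning 7 or more days."""
    d = dates.drop_duplicates().sort_values()
    return len(d) >= 2 and (d.iloc[-1] - d.iloc[0]).days >= 7

qualified = svc.groupby("BENE_ID")["service_date"].apply(qualifies)
bene_ids = qualified[qualified].index
print(f"{len(bene_ids)} beneficiaries meet the 2-claims / 7-days rule")
```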
The Medicare Home Health Agency tables provide use and payment data for home health agencies. The tables include use and expenditure data from home health Part A (Hospital Insurance) and Part B (Medical Insurance) claims. For additional information on enrollment, providers, and Medicare use and payment, visit the CMS Program Statistics page. These data do not exist in a machine-readable format, so the view data and API options are not available. Please use the download function to access the data. Below is the list of tables:
MDCR HHA 1. Medicare Home Health Agencies: Utilization and Program Payments for Original Medicare Beneficiaries, by Type of Entitlement, Yearly Trend
MDCR HHA 2. Medicare Home Health Agencies: Utilization and Program Payments for Original Medicare Beneficiaries, by Demographic Characteristics and Medicare-Medicaid Enrollment Status
MDCR HHA 3. Medicare Home Health Agencies: Utilization and Program Payments for Original Medicare Beneficiaries, by Area of Residence
MDCR HHA 4. Medicare Home Health Agencies: Persons with Utilization and Total Service Visits for Original Medicare Beneficiaries, by Type of Agency and Type of Service Visit
MDCR HHA 5. Medicare Home Health Agencies: Persons with Utilization and Total Service Visits for Original Medicare Beneficiaries, by Type of Control and Type of Service Visit
MDCR HHA 6. Medicare Home Health Agencies: Persons with Utilization, Total Service Visits, and Program Payments for Original Medicare Beneficiaries, by Number of Service Visits and Number of Episodes