This dataset contains Hospital General Information from the U.S. Department of Health & Human Services. This is the BigQuery COVID-19 public dataset. This data contains a list of all hospitals that have been registered with Medicare. This list includes addresses, phone numbers, hospital types and quality of care information. The quality of care data is provided for over 4,000 Medicare-certified hospitals, including over 130 Veterans Administration (VA) medical centers, across the country. You can use this data to find hospitals and compare the quality of their care.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.cms_medicare.hospital_general_info.
How do the hospitals in Mountain View, CA compare to the average hospital in the US?
With the Hospital Compare data you can quickly see how hospitals in one geographic location compare to those in another. In this example query we compare Google’s home in Mountain View, California, to the average hospital in the United States. You can also modify the query to learn how the hospitals in your city compare to the US national average.
#standardSQL
SELECT
  MTV_AVG_HOSPITAL_RATING,
  US_AVG_HOSPITAL_RATING
FROM (
  -- Average rating for Mountain View, CA, skipping unrated hospitals
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS MTV_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    city = 'MOUNTAIN VIEW'
    AND state = 'CA'
    AND hospital_overall_rating <> 'Not Available') MTV
CROSS JOIN (
  -- Average rating across all rated hospitals in the table
  SELECT
    ROUND(AVG(CAST(hospital_overall_rating AS INT64)), 2) AS US_AVG_HOSPITAL_RATING
  FROM
    `bigquery-public-data.cms_medicare.hospital_general_info`
  WHERE
    hospital_overall_rating <> 'Not Available') US
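To see what the query computes without BigQuery access, the same logic can be sketched locally with Python's built-in sqlite3 module. This is only an illustration: the four rows are invented, the table is a stripped-down stand-in for hospital_general_info, and SQLite's INTEGER replaces BigQuery's INT64.

```python
import sqlite3

# Hypothetical toy rows: (hospital_name, city, state, hospital_overall_rating).
ROWS = [
    ("Hospital A", "MOUNTAIN VIEW", "CA", "4"),
    ("Hospital B", "MOUNTAIN VIEW", "CA", "Not Available"),
    ("Hospital C", "AUSTIN", "TX", "2"),
    ("Hospital D", "BOSTON", "MA", "3"),
]

# Same shape as the BigQuery query: two one-row subqueries, cross-joined.
QUERY = """
SELECT mtv.avg_rating, us.avg_rating
FROM (
  SELECT ROUND(AVG(CAST(hospital_overall_rating AS INTEGER)), 2) AS avg_rating
  FROM hospital_general_info
  WHERE city = 'MOUNTAIN VIEW'
    AND state = 'CA'
    AND hospital_overall_rating <> 'Not Available'
) AS mtv
CROSS JOIN (
  SELECT ROUND(AVG(CAST(hospital_overall_rating AS INTEGER)), 2) AS avg_rating
  FROM hospital_general_info
  WHERE hospital_overall_rating <> 'Not Available'
) AS us
"""


def compare_ratings(rows):
    """Return (city average, overall average) for the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hospital_general_info ("
        "hospital_name TEXT, city TEXT, state TEXT, hospital_overall_rating TEXT)"
    )
    conn.executemany("INSERT INTO hospital_general_info VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


mtv_avg, us_avg = compare_ratings(ROWS)
print(mtv_avg, us_avg)
```

The 'Not Available' filter has to run before the cast, because the rating column is stored as text and unrated hospitals would otherwise distort or break the average.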
What are the most common diseases treated at hospitals that do well in the category of patient readmissions?
For hospitals that achieved “Above the national average” in the category of patient readmissions, it might be interesting to review the types of diagnoses that are treated at those inpatient facilities. While this query won’t provide the granular detail that went into the readmission calculation, it gives us a quick glimpse into the top diagnosis-related groups (DRGs), or classifications of inpatient stays, that are found at those hospitals. By joining the general hospital information to the inpatient charge data, also provided by CMS, you can quickly identify DRGs that may warrant additional research. You can also modify the query to review the top diagnosis-related groups for other hospital metrics you might be interested in.
#standardSQL
SELECT
  ic.drg_definition,
  SUM(ic.total_discharges) AS total_discharge_per_drg
FROM
  `bigquery-public-data.cms_medicare.hospital_general_info` gi
INNER JOIN
  `bigquery-public-data.cms_medicare.inpatient_charges_2015` ic
ON
  gi.provider_id = ic.provider_id
WHERE
  gi.readmission_national_comparison = 'Above the national average'
GROUP BY
  ic.drg_definition
ORDER BY
  total_discharge_per_drg DESC
LIMIT
  10;
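The join-then-aggregate pattern in this query can also be tried on a small scale. The sketch below uses Python's sqlite3 module with invented provider IDs, placeholder DRG labels, and made-up discharge counts; the short table names gi and ic are hypothetical stand-ins for the two BigQuery tables.

```python
import sqlite3

# Hypothetical rows: (provider_id, readmission_national_comparison).
GI_ROWS = [
    ("001", "Above the national average"),
    ("002", "Below the national average"),
    ("003", "Above the national average"),
]

# Hypothetical rows: (provider_id, drg_definition, total_discharges).
IC_ROWS = [
    ("001", "DRG X", 10),
    ("001", "DRG Y", 5),
    ("002", "DRG X", 100),
    ("003", "DRG X", 7),
    ("003", "DRG Z", 20),
]

QUERY = """
SELECT ic.drg_definition, SUM(ic.total_discharges) AS total_discharge_per_drg
FROM gi
INNER JOIN ic ON gi.provider_id = ic.provider_id
WHERE gi.readmission_national_comparison = 'Above the national average'
GROUP BY ic.drg_definition
ORDER BY total_discharge_per_drg DESC
LIMIT ?
"""


def top_drgs(gi_rows, ic_rows, limit=10):
    """Sum discharges per DRG for hospitals rated above average on readmissions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gi (provider_id TEXT, readmission_national_comparison TEXT)")
    conn.execute("CREATE TABLE ic (provider_id TEXT, drg_definition TEXT, total_discharges INTEGER)")
    conn.executemany("INSERT INTO gi VALUES (?, ?)", gi_rows)
    conn.executemany("INSERT INTO ic VALUES (?, ?, ?)", ic_rows)
    rows = conn.execute(QUERY, (limit,)).fetchall()
    conn.close()
    return rows


for drg, discharges in top_drgs(GI_ROWS, IC_ROWS):
    print(drg, discharges)
```

Provider 002 is filtered out by the WHERE clause, so its 100 discharges never reach the totals even though it has the largest single count.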
This data package contains claims-based data about beneficiaries of Medicare program services, including Inpatient, Outpatient, Chronic Conditions, Skilled Nursing Facility, Home Health Agency, Hospice, Carrier, and Durable Medical Equipment (DME) data, as well as data related to Prescription Drug Events. Note that the values are estimates, computed from a random sample of fee-for-service Medicare claims.
The Nursing Home COVID-19 Public File from the Centers for Medicare & Medicaid Services, filtered for Connecticut. View the full dataset and detailed metadata here. The Nursing Home COVID-19 Public File includes data reported by nursing homes to the CDC’s National Healthcare Safety Network (NHSN) system COVID-19 Long Term Care Facility Module, including the Resident Impact, Facility Capacity, Staff & Personnel, Supplies & Personal Protective Equipment, and Ventilator Capacity and Supplies data elements.
License: https://creativecommons.org/publicdomain/zero/1.0/
The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA Act. The primary purpose of NPPES was to develop a unique identifier for each physician who billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier (NPI), a required 10-digit number that is unique to an individual provider at the national level.
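The description above only says that an NPI is a 10-digit number. As a hedged illustration of what that format implies in practice, the sketch below checks the length and then the check digit, assuming the rule from the NPI standard (not from this dataset description): the Luhn algorithm applied to the NPI with the prefix 80840 prepended. The function name is hypothetical.

```python
def is_valid_npi(npi: str) -> bool:
    """Check the 10-digit format and the Luhn check digit of an NPI string.

    Assumption: the check digit follows the Luhn algorithm computed over
    the NPI prefixed with "80840", as in the NPI standard.
    """
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))
print(is_valid_npi("1234567890"))
```

A check like this only catches typing errors; it says nothing about whether the number has actually been assigned to a provider, which requires a lookup in the NPPES data itself.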
Once an NPI record is assigned to a healthcare provider, the parts of the NPI record that have public relevance, including the provider’s name, specialty, and practice address, are published on a searchable website and in a downloadable zipped file containing all of the FOIA-disclosable health care provider data in NPPES. A separate PDF file of code values lists the descriptions for all of the codes found in the data file.
The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery’s next-generation columnar storage format to provide an analytical view of the NPI data. It contains description fields for the codes, based on the mappings in the Data Dissemination Public File - Code Values documentation, as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.
Fork this kernel to get started.
https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research
Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @rawpixel from Unsplash.
What are the top ten most common types of physicians in Mountain View?
What are the names and phone numbers of dentists in California who studied public health?
Compares health coverage, Medicaid coverage, and IHS access in the 20 most populous American Indian and Alaska Native (AIAN) states, which comprise 93% of the total AIAN population.
This is a dataset hosted by the Centers for Medicare & Medicaid Services (CMS). The organization has an open data platform found here and they update their information according to the amount of data that is brought in. Explore CMS's Data using Kaggle and all of the data sources available through the CMS organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by John Fornander on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
This dataset is distributed under NA
In the May 2020 CMS Interoperability and Patient Access final rule, CMS finalized the policy to publicly report the names and NPIs of those providers who do not have digital contact information included in the NPPES system (85 FR 25584). This data includes the NPI and provider name of providers and clinicians without digital contact information in NPPES.
This data package contains free-standing Home Health Agencies Medicare cost reports by fiscal year, released annually by the Centers for Medicare and Medicaid Services (CMS). The datasets contain the highest level of Medicare cost report status.
License: https://creativecommons.org/publicdomain/zero/1.0/
Medicare Claims Synthetic Public Use Files (SynPUFs)
Medicare Claims Synthetic Public Use Files (SynPUFs) were created to allow interested parties to gain familiarity using Medicare claims data while protecting beneficiary privacy. The data structure of the Medicare SynPUFs is very similar to the CMS Limited Data Sets, but with a smaller number of variables. They provide data analysts and software developers the opportunity to develop programs and products utilizing the identical formats and variable names as those which appear in the actual CMS data files. The files have been designed so that programs and procedures created on the SynPUFs will function on CMS Limited Data Sets. The SynPUFs also provide a robust set of metadata on the CMS claims data that have not been available in the public domain. After developmental work has been completed potential users should be much better informed about which CMS data products they would need to acquire to fulfill their analytic needs.
These files may be used to:
allow data entrepreneurs to develop and create software and applications that may eventually be applied to actual CMS claims data;
train researchers on the use and complexity of conducting analyses with CMS claims data prior to initiating the process to obtain access to actual CMS data; and
support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy.
Although these files have very limited inferential research value to draw conclusions about Medicare beneficiaries due to the synthetic processes used to create the files, they increase access to realistic Medicare claims data files in a timely and less expensive manner to spur the innovation necessary to achieve the goals of better care for beneficiaries and improve the health of the population.
Files will be made available as free downloads in order to provide access to Medicare data without the time and cost associated with obtaining data files which require more restricted access.
The first Synthetic PUF released is the 2008-2010 Data Entrepreneurs’ SynPUF.
This data is published on the CMS website - https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs
This is a Medicare dataset released by the Centers for Medicare & Medicaid Services (CMS) and accessed via BigQuery. All data is from 2014.
For more information regarding the CMS data, click here.
From BigQuery:
This public dataset was created by the Centers for Medicare & Medicaid Services. The data summarizes the utilization and payments for procedures, services, and prescription drugs provided to Medicare beneficiaries by specific inpatient and outpatient hospitals, physicians, and other suppliers. The dataset includes the following data - common inpatient and outpatient services, all physician and other supplier procedures and services, and all Part D prescriptions.
Providers determine what they will charge for items, services, and procedures provided to patients and these charges are the amount that providers bill for an item, service, or procedure.
Verify the accuracy of SSNs of all individual Medicare providers, owners, managing/directing employees, authorized representatives, ambulance service medical directors, ambulance crew members, technicians, chain organization administrators, Independent Diagnostic Test Facility (IDTF), supervising/directing physicians, and IDTF interpretation service providers. Also included in this Agreement are individual health care providers who apply for a National Provider Identification Number (NPI).
The Medicare COVID-19 Hospitalization Trends dataset contains aggregate information from Medicare Fee-for-Service claims, Medicare Advantage encounter, and Medicare enrollment data. It provides insight around the groups of beneficiaries that were hospitalized at different points during the pandemic. CMS publicly released the first Preliminary Medicare COVID-19 Snapshot in June 2020 during the early stages of the Public Health Emergency for COVID-19. That report focused on COVID-19 cases and hospitalizations data for Medicare beneficiaries with a COVID-19 diagnosis. Throughout 2020 and 2021, that report was subsequently updated with refreshed data 13 times. Beginning in October 2021, CMS shifted its public COVID-19 reporting away from cumulative case and hospitalization rates to hospitalization trends over time with the release of this report, the Medicare COVID-19 Hospitalization Trends Report. All prior releases of both the Preliminary Medicare COVID-19 Snapshot and the Medicare COVID-19 Hospitalization Trends Report are available for download in the Medicare COVID-19 Data - Prior Releases file.
The DE SynPUF is built from a 5 percent random sample of Medicare beneficiaries in 2008 and their claims from 2008 through 2010. The DE SynPUF contains five types of data: Beneficiary Summary, Inpatient Claims, Outpatient Claims, Carrier Claims, and Prescription Drug Events. Each file contains the same variables across years.
2003 forward. CMS compiles claims data for Medicare and Medicaid patients across a variety of categories and years. This includes Inpatient and Outpatient claims, Master Beneficiary Summary Files, and many other files. Indicators from this data source have been computed by personnel in CDC's Division for Heart Disease and Stroke Prevention (DHDSP). This is one of the datasets provided by the National Cardiovascular Disease Surveillance System. The system is designed to integrate multiple indicators from many data sources to provide a comprehensive picture of the public health burden of CVDs and associated risk factors in the United States. The data are organized by location (national and state) and indicator. The data can be plotted as trends and stratified by sex and race/ethnicity.
CMS Part D Prescriber (publicly available)
The table Part D Prescriber is part of the dataset Medicare Public, available at https://stanford.redivis.com/datasets/762z-7mbwjnm8e. It contains 24121659 rows across 19 variables.
In fall 2014, the Center for Medicaid and CHIP Services (CMCS) conducted a Nationwide Adult Medicaid (NAM) Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey of Medicaid enrollees to attain national and state-by-state measures of access, barriers to care, and experiences with care across delivery systems and major population subgroups. The survey interviewed a representative sample of adults ages 18 and older enrolled in Medicaid during October through December 2013. Additional information, including a data dictionary, analysis guidance, and downloadable SAS files, is available on the NAM CAHPS webpage. Please note that all analyses must account for the survey’s sample design and use weights and strata. Sample code is available on the NAM CAHPS webpage.
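To illustrate why the weights matter, here is a minimal sketch of a weighted point estimate, overall and per stratum. The field names (stratum, weight, answer) and all values are hypothetical and do not come from the NAM CAHPS files; the official sample code on the NAM CAHPS webpage remains the reference. The sketch covers point estimates only, since correct standard errors need design-based variance methods that use the strata.

```python
from collections import defaultdict

# Hypothetical respondents: stratum label, survey weight, yes/no answer.
SAMPLE = [
    {"stratum": "A", "weight": 100.0, "answer": True},
    {"stratum": "A", "weight": 100.0, "answer": False},
    {"stratum": "B", "weight": 300.0, "answer": True},
    {"stratum": "B", "weight": 300.0, "answer": True},
]


def weighted_proportion(records):
    """Share of 'yes' answers, with each respondent counted by weight."""
    total_weight = sum(r["weight"] for r in records)
    yes_weight = sum(r["weight"] for r in records if r["answer"])
    return yes_weight / total_weight


def proportion_by_stratum(records):
    """Weighted 'yes' share computed separately within each stratum."""
    groups = defaultdict(list)
    for record in records:
        groups[record["stratum"]].append(record)
    return {name: weighted_proportion(rows) for name, rows in groups.items()}


unweighted = sum(r["answer"] for r in SAMPLE) / len(SAMPLE)
print(unweighted, weighted_proportion(SAMPLE), proportion_by_stratum(SAMPLE))
```

In this toy sample the unweighted share is 0.75 while the weighted share is 0.875, because the respondents in stratum B each stand in for more enrollees.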
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset, originally from CMS, contains a range of information, including diagnoses, procedures, prescriptions, and financial data, among others. It represents about 5% of the whole CMS SynPUF data and has 55M claims, which is too big to open with pandas in a Jupyter notebook. You can load a smaller subset by passing nrows to pd.read_csv (e.g., nrows=20000000).
Columns description:
RangeIndex: 20000000 entries, 0 to 19999999
Data columns (total 43 columns):
# Column Dtype
0 DESYNPUF_ID object
1 BENE_BIRTH_DT int64
2 BENE_DEATH_DT int64
3 BENE_SEX_IDENT_CD int64
4 BENE_RACE_CD int64
5 BENE_ESRD_IND object
6 SP_STATE_CODE int64
7 BENE_COUNTY_CD int64
8 BENE_HI_CVRAGE_TOT_MONS int64
9 BENE_SMI_CVRAGE_TOT_MONS int64
10 BENE_HMO_CVRAGE_TOT_MONS int64
11 PLAN_CVRG_MOS_NUM int64
12 SP_ALZHDMTA int64
13 SP_CHF int64
14 SP_CHRNKIDN int64
15 SP_CNCR int64
16 SP_COPD int64
17 SP_DEPRESSN int64
18 SP_DIABETES int64
19 SP_ISCHMCHT int64
20 SP_OSTEOPRS int64
21 SP_RA_OA int64
22 SP_STRKETIA int64
23 MEDREIMB_IP float64
24 BENRES_IP float64
25 PPPYMT_IP float64
26 MEDREIMB_OP float64
27 BENRES_OP float64
28 PPPYMT_OP float64
29 MEDREIMB_CAR float64
30 BENRES_CAR float64
31 PPPYMT_CAR float64
32 CLM_ID int64
33 CLM_FROM_DT int64
34 CLM_THRU_DT int64
35 ICD9_DGNS_CD_1 object
36 PRF_PHYSN_NPI_1 float64
37 HCPCS_CD_1 object
38 LINE_NCH_PMT_AMT_1 float64
39 LINE_BENE_PTB_DDCTBL_AMT_1 float64
40 LINE_COINSRNC_AMT_1 float64
41 LINE_PRCSG_IND_CD_1 object
42 LINE_ICD9_DGNS_CD_1 object
dtypes: float64(13), int64(24), object(6)
memory usage: 6.4+ GB
The 20M claims represent ~141k unique individuals, with ~12k unique ICD9 diagnosis codes and ~7k unique HCPCS procedure codes.
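Counts like these can be computed without loading the file into memory at all. The sketch below is one possible approach, not the method used to produce the figures above: it streams rows with Python's standard csv module and keeps only sets of distinct values. The three column names come from the listing above; the in-memory sample rows are invented.

```python
import csv
import io


def count_unique(lines, columns):
    """Stream CSV rows and count distinct non-empty values per column."""
    seen = {column: set() for column in columns}
    for row in csv.DictReader(lines):
        for column in columns:
            value = row[column]
            if value:  # skip empty cells
                seen[column].add(value)
    return {column: len(values) for column, values in seen.items()}


# Hypothetical three-row sample standing in for the real file.
SAMPLE = io.StringIO(
    "DESYNPUF_ID,ICD9_DGNS_CD_1,HCPCS_CD_1\n"
    "A1,4019,99213\n"
    "A1,25000,99213\n"
    "B2,4019,\n"
)

counts = count_unique(SAMPLE, ["DESYNPUF_ID", "ICD9_DGNS_CD_1", "HCPCS_CD_1"])
print(counts)
```

For the real data, an open file handle (opened with newline="") can be passed in place of the StringIO object. Memory use then grows with the number of distinct values rather than with the number of rows.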
Consumer Assessment of Healthcare Providers and Systems (CAHPS) for PQRS measure performance rates reported by groups.
The Centers for Medicare & Medicaid Services (CMS) Provider of Services (POS) file contains data on characteristics of hospitals and other types of healthcare facilities, including the name and address of the facility and the type of Medicare services the facility provides, among other information.
The Other file contains information on non-CLIA facilities, such as hospitals, nursing facilities, home health agencies, and more (18 categories).
The Families First Coronavirus Response Act (FFCRA) added a new optional Medicaid eligibility group, available during a public health emergency, called the COVID-19 testing eligibility group. States reported these expenditures under sections 6004 and 6008 through the Medicaid Budget and Expenditure System (MBES) on the Form CMS-64. The data in these reports constitute summary-level preliminary expenditure information related to these FFCRA provisions for each state.
Notes:
1. The Families First Coronavirus Response Act (FFCRA), enacted on March 18, 2020, provided a temporary FMAP increase to states and territories meeting certain qualifications and added a new optional Medicaid eligibility group for uninsured individuals during a public health emergency in section 1902(a)(10)(A)(ii)(XXIII) of the Act, referred to as the “COVID-19 Testing Group.”
2. FFCRA Section 6008 provides a temporary 6.2 percentage point FMAP increase to each qualifying state and territory's FMAP under section 1905(b) of the Act, beginning January 1, 2020 and lasting through the end of the quarter in which the public health emergency (PHE) declared by the Secretary for COVID-19 ends, including any extensions.
3. FFCRA Section 6004 provides a 100 percent match rate for individuals eligible under the new optional Medicaid eligibility group in section 1902(a)(10)(A)(ii)(XXIII) of the Act, beginning no earlier than March 18, 2020 and lasting through the end of the PHE for COVID-19.
4. States that have reported “0” either have no expenditures for that reporting category or have not yet reported expenditures for that category.
5. This report is a cumulative summary report that includes current and prior period adjustment expenditures that apply to this quarter.
6. For the Quarter ending 03/31/2020: Delaware has Negative Total Computable Expenditures and Total Federal Share Expenditures due to the reporting of prior period adjustments during this period.
7. For the Quarter ending 09/30/2020: Colorado has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
8. For the Quarter ending 03/31/2021: California has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period. This corrected FY 2020 Q4 expenditures for Treatment services that are not allowed for Section 6004 100% FMAP match.
9. For the Quarter ending 03/31/2021: Utah has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
10. For the Quarter ending 12/31/2022: California has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
11. For the Quarter ending 12/31/2022: Connecticut has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
12. For the Quarter ending 09/30/2023: Connecticut has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
13. For the Quarter ending 09/30/2023: Illinois has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid 19 Expenditures due to the reporting of prior period adjustments during this period.
14. For the Quarter ending 09/30/2023: Minnesota has Negative Total Computable Section 6004 Covid 19 Expenditures and Total Federal Share Section 6004 Covid
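The arithmetic behind notes 2 and 3 can be sketched as follows. The 6.2 percentage point increase and the 100 percent match rate come from the notes above; the base FMAP of 50 percent and the dollar amounts are hypothetical, and the function deliberately ignores any expenditure categories to which the increase might not apply.

```python
from decimal import Decimal

# Temporary increase in percentage points, per FFCRA Section 6008.
FFCRA_INCREASE = Decimal("6.2")


def federal_share(total_computable, base_fmap_pct, increase_pct=FFCRA_INCREASE):
    """Return (matching rate in percent, federal share of the expenditure)."""
    rate = base_fmap_pct + increase_pct
    return rate, total_computable * rate / Decimal("100")


# Section 6008: hypothetical state with a 50% base FMAP and $1,000,000 spent.
rate_6008, share_6008 = federal_share(Decimal("1000000"), Decimal("50"))

# Section 6004: the testing group is matched at 100%, with no further increase.
rate_6004, share_6004 = federal_share(Decimal("2500"), Decimal("100"), Decimal("0"))

print(rate_6008, share_6008)
print(rate_6004, share_6004)
```

Because the report is cumulative and includes prior period adjustments, a negative total computable amount simply yields a negative federal share under the same formula, which is what the state-specific notes describe.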
This de-identified dataset contains details on 100% of the prescription drug claims made by Medicare beneficiaries during the year of release. This dataset is part of the Public Use Files (PUFs) released by the Centers for Medicare & Medicaid Services (CMS), which are public domain de-identified data files available for research with claim-specific information. The purpose of these files is to provide information while protecting confidentiality.