This dataset was created by Adrian Chan
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets are a subset of the CMS Open Data with 2012 data-taking conditions, intended for educational purposes. The files are provided in both CSV and PKL formats (use only one of them) and comprise two datasets:
- Data files, starting with output_data_CMS_Run2012B, correspond to 4429.37 /pb of data collected by the CMS experiment. They are a subset of the dataset in reference [1].
- Simulation files, starting with output_sim_CMS_MonteCarlo2012, are a subset of the dataset in reference [2]. The number of generated events in this case is 30458871, and the cross section is 3503.71.
All the files were processed with a modified version of the AOD2NanoAODOutreachTool [3]. The modifications are small: they concern the number of triggers stored, and some objects, such as taus, were removed.
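When the simulation is compared with the data, simulated events are usually scaled to the integrated luminosity of the data. A minimal sketch of that arithmetic with the numbers quoted above, assuming the cross section is expressed in pb and that every simulated event carries the same weight; the function name is illustrative:

```python
def luminosity_weight(lumi_inv_pb: float, xsec_pb: float, n_generated: int) -> float:
    """Per-event weight that scales a simulated sample to the data luminosity."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return lumi_inv_pb * xsec_pb / n_generated

# Values quoted in the dataset description; the pb unit is an assumption.
LUMI = 4429.37        # integrated luminosity of the data, in /pb
XSEC = 3503.71        # cross section of the simulated process, assumed to be in pb
N_GEN = 30458871      # number of generated events

weight = luminosity_weight(LUMI, XSEC, N_GEN)
print(f"weight per simulated event: {weight:.4f}")
```

Under these assumptions each simulated event counts for roughly half a data event.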
--------------------------------------------------------
[1] CMS collaboration (2017). DoubleMuParked primary dataset in AOD format from Run of 2012 (/DoubleMuParked/Run2012B-22Jan2013-v1/AOD). CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.YLIC.86ZZ
[2] Wunsch, Stefan (2019). DYJetsToLL dataset in reduced NanoAOD format for education and outreach. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.SRRA.2GON
[3] https://github.com/cms-opendata-analyses/AOD2NanoAODOutreachTool
https://creativecommons.org/publicdomain/zero/1.0/
The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions of the original HIPAA Act. The primary purpose of NPPES was to develop a unique identifier for each physician who billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier Standard (NPI), a required 10-digit number that is unique to an individual provider at the national level.
Once an NPI record is assigned to a healthcare provider, the parts of the record that have public relevance, including the provider's name, specialty, and practice address, are published on a searchable website. They are also published as a downloadable file of zipped data containing all of the FOIA-disclosable health care provider data in NPPES, accompanied by a separate PDF file of code values that lists and describes all of the codes found in the data file.
The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery's next-generation columnar storage format to provide an analytical view of the NPI data. It contains description fields for the codes, based on the mappings in the Data Dissemination Public File - Code Values documentation, as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and to have a convenient single lookup table for all provider information.
Fork this kernel to get started.
https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research
Dataset Source: Centers for Medicare & Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source (http://www.data.gov/privacy-policy#data_policy) and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @rawpixel from Unsplash.
What are the top ten most common types of physicians in Mountain View?
What are the names and phone numbers of dentists in California who studied public health?
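Questions like the first one reduce to a filter followed by a group-and-count. A minimal in-memory sketch in Python, assuming each provider record exposes a city and a taxonomy description; the field names `city` and `taxonomy` and the sample rows are hypothetical, not the actual BigQuery schema:

```python
from collections import Counter

def top_provider_types(rows, city, n=10):
    """Return the n most common taxonomy labels among providers in one city."""
    wanted = city.strip().upper()
    counts = Counter(
        row["taxonomy"]
        for row in rows
        if row["city"].strip().upper() == wanted
    )
    return counts.most_common(n)

# Hypothetical sample records, for illustration only.
sample_rows = [
    {"city": "MOUNTAIN VIEW", "taxonomy": "Internal Medicine"},
    {"city": "Mountain View", "taxonomy": "Family Medicine"},
    {"city": "MOUNTAIN VIEW", "taxonomy": "Internal Medicine"},
    {"city": "PALO ALTO", "taxonomy": "Dentist"},
]

print(top_provider_types(sample_rows, "Mountain View"))
```

The city comparison is case-insensitive because free-text address fields are often capitalized inconsistently.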
The Medicare Outpatient Hospitals by Provider and Service dataset provides information on services for Original Medicare Part B beneficiaries by OPPS hospitals. These datasets contain information on the number of services, payments, and submitted charges organized by provider CMS Certification Number (CCN) and comprehensive Ambulatory Payment Classification (APC).
The Minimum Data Set (MDS) Frequency data summarizes health status indicators for active residents currently in nursing homes. The MDS is part of the federally mandated process for clinical assessment of all residents in Medicare- and Medicaid-certified nursing homes. This process provides a comprehensive assessment of each resident's functional capabilities and helps nursing home staff identify health problems. Care Area Assessments (CAAs) are part of this process and provide the foundation upon which a resident's individual care plan is formulated. MDS assessments are completed for all residents in certified nursing homes, regardless of the source of payment for the individual resident. MDS assessments are required for residents on admission to the nursing facility, periodically, and on discharge. All assessments are completed within specific guidelines and time frames. In most cases, participants in the assessment process are licensed health care professionals employed by the nursing home. MDS information is transmitted electronically by nursing homes to the national MDS database at CMS. When reviewing the MDS 3.0 Frequency files, be aware that some common software programs (e.g., Microsoft Excel) might inaccurately strip leading zeros from designated code values (e.g., "01" becomes "1") or misinterpret code ranges as dates (e.g., O0600 ranges such as 02-04 are misread as 04-Feb). Because each piece of software behaves differently, if you encounter an issue when reading the CSV file of Frequency data, please open the file in a plain text editor such as Notepad or TextPad to review the underlying data before reaching out to CMS for assistance.
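One way to sidestep the spreadsheet problems described above is to read the file programmatically with every field kept as text. A minimal sketch using Python's standard `csv` module, which returns all fields as strings and so leaves values such as "01" and "02-04" untouched; the column names and sample rows are hypothetical stand-ins for the real Frequency file:

```python
import csv
import io

# Hypothetical excerpt; the real file's columns and values may differ.
SAMPLE = "item,code,count\nO0600,02-04,17\nO0600,01,250\n"

def read_codes_as_text(handle):
    """Read CSV rows without any type inference, so codes stay as written."""
    return list(csv.DictReader(handle))

rows = read_codes_as_text(io.StringIO(SAMPLE))
for row in rows:
    print(row["item"], repr(row["code"]))
```

For a file on disk, the same function can be called with a handle from `open(path, newline="")`.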
https://creativecommons.org/publicdomain/zero/1.0/
The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.
To help get you started, here are some data exploration ideas:
See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!
This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.
Here, we've processed the data to facilitate analytics. This processed version has three components:
The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.
In the top-level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:
Additionally, there are two CSV files that facilitate joining data across years:
The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
The code to create the processed version of this data is available on GitHub.
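To see which tables a SQLite file such as "database.sqlite" exposes, the `sqlite_master` catalog can be queried. A minimal sketch with Python's built-in `sqlite3` module; it runs against an in-memory database with a hypothetical `Rate` table so that it is self-contained, and the path in the comment is an assumption based on the Kaggle directory mentioned above:

```python
import sqlite3

def list_tables(conn):
    """Return the names of all user tables in a SQLite database."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cur.fetchall()]

# Stand-in database; for the real file one would use something like
# sqlite3.connect("../input/database.sqlite") (path assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rate (PlanId TEXT, IndividualRate REAL)")  # hypothetical table
tables = list_tables(conn)
print(tables)
conn.close()
```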
This file contains events from the MultiJet primary dataset of the CMS open data release, together with the computed razor variables MR and Rsq, which are used in searches for supersymmetric particles. More details on the razor variables can be found in Phys. Rev. D 90, 112001.
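For orientation, the razor variables can be sketched from two megajets and the missing transverse momentum. The definitions below follow the form commonly quoted in the razor literature (MR from the megajet momenta, a transverse quantity MTR from the missing transverse momentum, and Rsq as the square of their ratio). This is an illustrative sketch and not the code used to produce the file; the function name and argument layout are assumptions:

```python
import math

def razor_variables(j1, j2, met):
    """Compute (MR, Rsq) from two megajets and the missing transverse momentum.

    j1, j2: (px, py, pz) of the megajets, treated as massless.
    met:    (mex, mey) components of the missing transverse momentum.
    """
    p1 = math.sqrt(sum(c * c for c in j1))
    p2 = math.sqrt(sum(c * c for c in j2))
    # MR = sqrt((|p1| + |p2|)^2 - (pz1 + pz2)^2)
    mr = math.sqrt((p1 + p2) ** 2 - (j1[2] + j2[2]) ** 2)

    pt1 = math.hypot(j1[0], j1[1])
    pt2 = math.hypot(j2[0], j2[1])
    met_mag = math.hypot(met[0], met[1])
    # Dot product of MET with the vector sum of the megajet transverse momenta.
    dot = met[0] * (j1[0] + j2[0]) + met[1] * (j1[1] + j2[1])
    # MTR = sqrt((MET * (pT1 + pT2) - MET . (pT1_vec + pT2_vec)) / 2)
    mtr = math.sqrt(max(0.0, (met_mag * (pt1 + pt2) - dot) / 2.0))

    rsq = (mtr / mr) ** 2 if mr > 0 else float("nan")
    return mr, rsq

# Toy event in GeV: MET balances the two megajets in the transverse plane.
mr, rsq = razor_variables((30.0, 40.0, 0.0), (-30.0, 0.0, 40.0), (0.0, -40.0))
print(f"MR = {mr:.2f}, Rsq = {rsq:.4f}")
```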
Please be advised that as of Q4 2023 there is a new Provider of Services (POS) file that contains the provider and certification details for Home Health Agencies (HHAs), Hospices, and Ambulatory Surgical Centers (ASCs). Data contained in this file are extracted from the Internet Quality Improvement and Evaluation System (iQIES) environment and will be updated quarterly along with the other two POS files. The Provider of Services File - Hospital & Non-Hospital Facilities data provide critical resources for other federal regulator requirements and support the ongoing quality and research efforts sponsored by CMS. In this file you will find provider certification, termination, accreditation, ownership, name, location, and other characteristics organized by CMS Certification Number.
This dataset characterizes canopy heights of mangrove-forested wetlands globally for 2015 at 12-m resolution. Estimates of maximum canopy height (height of the tallest tree) were derived from the German Space Agency's TanDEM-X data that produced global digital surface models. Also provided are Lidar estimates of canopy height based on the GEDI instrument, which were used for training and validation of the TanDEM-X estimates of forest height. The coverage of these data follows Global Mangrove Watch's mangrove extent maps. These spatially explicit maps of mangrove canopy height can be used to assess local-scale geophysical and environmental conditions that may regulate forest structure and carbon cycle dynamics. Maps revealed a wide range of canopy heights, including maximum values (>60 m) that surpass maximum heights of other forest types. Maps are provided in cloud-optimized GeoTIFF format, and mangrove heights for individual GEDI tiles are compiled in comma-separated values (CSV) files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Information on Skilled Nursing Facilities currently enrolled in Medicare.
This dataset provides dissolved carbon (dissolved inorganic carbon and dissolved organic carbon), greenhouse gases, dissolved organic matter optical, and hydrological (water temperature, pH, alkalinity, dissolved oxygen) data collected from the Shark and Harney tidal rivers in the Everglades, Florida, USA. The data were collected as part of the NASA Carbon Monitoring System (CMS) BlueFlux field campaigns over the 2022 wet season (October 2022) and 2023 dry season (March 2023). Data includes single-collection samples collected from sites along both rivers and samples collected by an autosampler at one site over multiple tidal cycles. The data are provided in comma-separated values (.csv) format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The Accountable Care Organizations data provides information on ACOs participating in the Medicare Shared Savings Program (Shared Savings Program), including their name, track status, number of years in the program, and contact information for key personnel.
CSV output from https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py
This dataset contains half-hourly ground solar-induced chlorophyll fluorescence (SIF) and vegetation indices including NDVI, EVI, Red edge chlorophyll index, green chlorophyll index, and photochemical reflectance index at seven crop sites in Nebraska and Illinois for the period 2016-2021. Four sites were located at Eddy Covariance (EC) tower sites (sites US-Ne2, US-Ne3, US-UiB, and US-UiC), and three sites were located on private farms (sites Reifsteck, Rund, and Reinhart). The sites were either miscanthus, corn-soybean rotation or corn-corn-soybean rotation. The spectral data for SIF retrieval and hyperspectral reflectance for vegetation index calculation were collected by the FluoSpec2 system, installed near planting, and uninstalled after harvest to collect whole growing-season data. Raw nadir SIF at 760 nm from different algorithms (sFLD, 3FLD, iFLD, SFM) are included. SFM_nonlinear and SFM_linear represent the Spectral fitting method (SFM) with the assumption that fluorescence and reflectance change with wavelength non-linearly and linearly, respectively. Additional data include two SIF correction factors including calibration coefficient adjustment factor (f_cal_corr_QEPRO) and upscaling nadir SIF to eddy covariance footprint factor (ratio_EC footprint, SIF pixel), and measured FPAR from quantum sensors and Rededge NDVI calculated FPAR. The data are provided in comma-separated values (CSV) format.
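For readers unfamiliar with the indices listed above, the following sketch shows how several of them are conventionally computed from band reflectances. These are the standard textbook formulas, not necessarily the exact band definitions used for this dataset; the function names and the sample reflectance values are illustrative assumptions:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced vegetation index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def chlorophyll_index(nir, band):
    """Chlorophyll index; pass green or red-edge reflectance as `band`."""
    return nir / band - 1.0

def pri(r531, r570):
    """Photochemical reflectance index from reflectance at 531 and 570 nm."""
    return (r531 - r570) / (r531 + r570)

# Hypothetical reflectances for a dense green canopy.
nir, red, green, blue = 0.45, 0.05, 0.08, 0.03
print(f"NDVI = {ndvi(nir, red):.3f}")
print(f"EVI = {evi(nir, red, blue):.3f}")
print(f"CI green = {chlorophyll_index(nir, green):.3f}")
```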
This dataset provides gridded average annual wetland salinity concentrations in practical salinity units (PSU) at 30-meter resolution within 24 coastal estuary sites in the United States predicted for 2020. Salinity in estuaries can serve as a proxy for sulfate concentration, which can inhibit methanogenesis. Data were derived from a hybrid approach to mapping salinity as a continuous variable using a combination of physical watershed and stream characteristics, optical remote sensing based on vegetation characteristics, and climate variables. Data are provided in cloud-optimized GeoTIFF format covering 33 Hydrologic Unit Code 8-digit (HUC8) watersheds to the extent of palustrine and estuarine wetlands as defined by NOAA's 2016 Coastal Change Analysis Program (C-CAP) Coastal Land Cover layer. Additionally, model outputs are provided in comma-separated values (CSV) files, and code scripts are provided in a compressed (*.zip) file.
The Hospital Provider Cost Report dataset provides select measures from the hospital annual cost report. This data includes provider information such as facility characteristics, utilization data, cost and charges by cost center (in total and for Medicare), Medicare settlement data, and financial statement data organized by CMS Certification Number.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Input files for the CMS masterclass (from 2024) as part of the IPPOG International Masterclasses.
The data are selected events detected and reconstructed by the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN. The events were produced in proton-proton collisions during Run 1 of the LHC.
Each zip file is labeled with a number N, the approximate masterclass size for which the files are appropriate. For example, N5 contains 5 separate datasets, suitable for a masterclass of 5-10 students. The datasets are a mix of 1-lepton, 2-lepton, and 4-lepton events. The .ig files are event display files used in the iSpy WebGL event display. The .csv files give kinematic and parent particle information for the input events.
The Medicare Part D Prescribers by Provider and Drug dataset provides information on prescription drugs prescribed to Medicare beneficiaries enrolled in Part D by physicians and other health care providers. This dataset contains the total number of prescription fills that were dispensed and the total drug cost paid organized by prescribing National Provider Identifier (NPI), drug brand name (if applicable) and drug generic name.
Note: This full dataset contains more records than most spreadsheet programs can handle, which will result in an incomplete load of data. Use of a database or statistical software is required.
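One way to work with a file too large for a spreadsheet is to stream it row by row and keep only running totals in memory. A minimal sketch with Python's standard library; the column names `generic_name` and `total_drug_cost` and the sample rows are hypothetical placeholders, not the dataset's actual headers:

```python
import csv
import io
from collections import defaultdict

def total_cost_by_drug(handle, name_col="generic_name", cost_col="total_drug_cost"):
    """Sum a cost column per drug while reading one row at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row[name_col]] += float(row[cost_col])
    return dict(totals)

# Small in-memory stand-in; in practice, pass an open file handle instead.
SAMPLE = (
    "npi,generic_name,total_drug_cost\n"
    "1000000001,Atorvastatin,120.50\n"
    "1000000002,Atorvastatin,79.50\n"
    "1000000001,Metformin,15.25\n"
)
totals = total_cost_by_drug(io.StringIO(SAMPLE))
print(totals)
```

Because only one row and the per-drug totals are held at any time, memory use depends on the number of distinct drugs and not on the number of records.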
This dataset provides 10-minute fire emissions within 0.1-degree regularly spaced intervals across Indonesia from July 2015 to December 2020. The dataset was produced with a top-down approach based on fire radiative energy (FRE) and smoke aerosol emission coefficients (Ce) derived from multiple new-generation satellite observations. Specifically, the Ce values of peatland, tropical forest, cropland, or savanna and grassland were derived from fire radiative power (FRP) and emission rates of smoke aerosols based on Visible Infrared Imaging Radiometer Suite (VIIRS) active fire and aerosol products. FRE for each 0.1-degree interval was calculated from the diurnal FRP cycle that was reconstructed by fusing cloud-corrected FRP retrievals from the high temporal-resolution (10 mins) Himawari-8 Advanced Himawari Imager (AHI) with those from high spatial-resolution (375 m) VIIRS. This new dataset was named the Fused AHI-VIIRS based fire Emissions (FAVE). Fire emissions data are provided in comma-separated values (CSV) format with one file per month from July 2015 to December 2020. Each file includes variables of fire observation time, fire geographic location, classification, fire radiative energy, various fire emissions and related standard deviations.
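The top-down relationship described above can be illustrated in two steps: integrate the FRP time series over time to obtain FRE, then multiply by an emission coefficient. A minimal sketch assuming trapezoidal integration of 10-minute FRP samples and a single constant Ce; the sample values, the Ce value, and the function names are hypothetical and are not taken from the FAVE product:

```python
def fire_radiative_energy(frp_mw, step_s=600.0):
    """Integrate FRP samples (MW) at a fixed time step (s) into FRE (MJ)."""
    if len(frp_mw) < 2:
        return 0.0
    # Trapezoidal rule: average of neighboring samples times the step length.
    return sum((a + b) / 2.0 * step_s for a, b in zip(frp_mw, frp_mw[1:]))

def smoke_emission(fre_mj, ce_kg_per_mj):
    """Emission mass (kg) as emission coefficient times FRE."""
    return ce_kg_per_mj * fre_mj

# Hypothetical FRP values (MW) at 10-minute spacing and a made-up Ce.
frp = [0.0, 50.0, 100.0, 50.0, 0.0]
fre = fire_radiative_energy(frp)
print(f"FRE = {fre:.0f} MJ, emission = {smoke_emission(fre, 0.02):.0f} kg")
```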