100+ datasets found
  1. Electronic Health Legal Data

    • kaggle.com
    zip
    Updated Jan 29, 2023
    Cite
    The Devastator (2023). Electronic Health Legal Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/electronic-health-legal-data
    Explore at:
    Available download formats: zip (192951 bytes)
    Dataset updated
    Jan 29, 2023
    Authors
    The Devastator
    License

    Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Electronic Health Legal Data

    Exploring Laws and Regulations

    By US Open Data Portal, data.gov [source]

    About this dataset

    This Electronic Health Information Legal Epidemiology dataset offers an extensive collection of legal and epidemiological data that can be used to understand the complexities of electronic health information (EHI). It contains a detailed set of variables, including legal requirements, enforcement mechanisms, proprietary tools, access restrictions, privacy and security implications, data rights and responsibilities, user accounts and authentication systems. The dataset gives researchers real-world insight into how EHI law functions, so that its impact on patient safety and public health outcomes can be assessed. With such data it is possible to gain a better understanding of current policies regarding the regulation of electronic health information, as well as their potential for improvement in safeguarding patient confidentiality. Use this dataset to explore how these laws impact the healthcare system, either by exploring patterns across different groups over time or by analyzing the changes leading up to new versions or updates.

    How to use the dataset

    • Start by familiarizing yourself with the different columns of the dataset. Examine each column closely and look up any unfamiliar terminology to get a better understanding of what the columns are referencing.

    • Once you understand the data and what it is intended to represent, think about how you might want to use it in your analysis. You may want to create a research question, or a narrower focus for your project on the legal epidemiology of electronic health information, that can be answered with this dataset.

    • After creating your research plan, clean and reshape the data as needed to prepare it for the analysis or visualization steps outlined in your project plan or research question.

    • Next, perform exploratory data analysis (EDA) on the relevant subsets of the data, for example specific countries or specific target groups of interest (e.g. gender). Filter out information that is not needed for drawing meaningful insights, analyze the patterns and trends in the filtered data, and compare areas that differ in their e-health rules and regulations, keeping in mind that decisions made by elected officials may be driven by demographics, socioeconomic factors, ideology and similar influences. Look for correlations at every stage, from filtering out uninformative subgroups to generating visualizations (graphs or diagrams), and validate any descriptive or predictive models against reference datasets when they are available. Stay open about how information was gathered, cross-check facts against multiple external sources where possible, account for local languages and culture, and keep sensitive datasets private, visible only to duly authorized users.

    • Finally, summarize and share your findings, preferably as infographics that present the evidence, an overall assessment and the main conclusions, so that the broader community and interested professionals can benefit from the results and freely adapt them locally, which supports wider adoption and change in the global e-health legal domain.
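As a starting point for the first step (getting to know the columns), here is a minimal sketch using only the Python standard library. The sample rows and the column names `jurisdiction`, `effective_year` and `topic` are made up for illustration; the real file's columns are not listed on this page and will differ.

```python
import csv
import io
from collections import Counter


def profile_columns(rows):
    """Summarize each column: non-empty count, distinct count, most common values."""
    rows = list(rows)
    profile = {}
    if not rows:
        return profile
    for column in rows[0].keys():
        # Treat missing and whitespace-only cells as empty.
        values = [(row.get(column) or "").strip() for row in rows]
        non_empty = [v for v in values if v]
        counts = Counter(non_empty)
        profile[column] = {
            "non_empty": len(non_empty),
            "distinct": len(counts),
            "top": counts.most_common(3),
        }
    return profile


# Made-up sample standing in for the real CSV; all column names are hypothetical.
SAMPLE_CSV = """jurisdiction,effective_year,topic
Alabama,2014,privacy
Alaska,2014,access
Alabama,,privacy
"""

sample_profile = profile_columns(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, stats in sample_profile.items():
    print(name, stats)
```

To run it on the downloaded data, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample, where `path` points at the extracted CSV file.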

    Research Ideas

    • Studying how technology affects public health policies and practice - Using the data, researchers can look at the various types of legal regulations related to electronic health information to examine any relations between technology and public health decisions in certain areas or regions.
    • Evaluating trends in legal epidemiology – With this data, policymakers can identify patterns that help measure the evolution of electronic health information regulations over time and investigate why such rules are changing within different states or countries.
    • Analysing possible impacts on healthcare costs – Looking at changes in laws, regulations, and standards relate...
  2. Data from: Weather conditions and Legionellosis: A nationwide case-crossover...

    • catalog.data.gov
    Updated Mar 29, 2025
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Weather conditions and Legionellosis: A nationwide case-crossover study among Medicare recipients [Dataset]. https://catalog.data.gov/dataset/weather-conditions-and-legionellosis-a-nationwide-case-crossover-study-among-medicare-reci
    Explore at:
    Dataset updated
    Mar 29, 2025
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    Data consist of CMS Medicare data files, which are restricted access and cannot be released publicly. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. EPA cannot release CBI, or data protected by copyright, patent, or otherwise subject to trade secret restrictions. Request for access to CBI data may be directed to the dataset owner by an authorized person by contacting the party listed. It can be accessed through the following means: CMS Medicare data are available from: https://www.cms.gov/data-research/files-for-order/data-disclosures-and-data-use-agreements-duas/limited-data-set-lds with the requirement of a signed Data Use Agreement. Weather data are available at https://prism.oregonstate.edu/. Format: The data that support the findings of this study are available from the Centers for Medicare and Medicaid Services (CMS). Restrictions apply to the availability of these data, which were provided under a Data Use Agreement specific to this study. The data do not contain personally identifiable information but are classified as Limited Data Set files, and their distribution requires an agreement between CMS and the requester and approval by CMS.

    Because the data do not contain identifiable private information and were not obtained through interaction or intervention with individuals, the Institutional Review Board for the University of North Carolina and the US Environmental Protection Agency Human Research Protocol Officer determined that use of this data does not constitute human subjects research. This dataset is associated with the following publication: Wade, T., and C. Herbert. Weather conditions and legionellosis: a nationwide case-crossover study among Medicare recipients. EPIDEMIOLOGY AND INFECTION. Cambridge University Press, Cambridge, UK, 152: E125, (2024).

  3. Limited Access Datasets From NIMH Clinical Trials

    • dknet.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). Limited Access Datasets From NIMH Clinical Trials [Dataset]. http://identifiers.org/RRID:SCR_005614
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    A listing of data sets from NIMH-supported clinical trials. Limited Access Datasets are available from numerous NIMH studies. NIMH requires all investigators seeking access to data from NIMH-supported trials held by NIMH to execute and submit as their request the appropriate Data Use Certification pertaining to the trial. The datasets distributed by NIMH are referred to as limited access datasets because access is limited to qualified researchers who complete Data Use Certifications.

  4. Public Health Portfolio (Directly Funded Research - Programmes and Training...

    • nihr.opendatasoft.com
    • nihr.aws-ec2-eu-central-1.opendatasoft.com
    csv, excel, json
    Updated Nov 4, 2025
    Cite
    (2025). Public Health Portfolio (Directly Funded Research - Programmes and Training Awards) [Dataset]. https://nihr.opendatasoft.com/explore/dataset/phof-datase/
    Explore at:
    Available download formats: excel, json, csv
    Dataset updated
    Nov 4, 2025
    License

    Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This Public Health Portfolio (Directly Funded Research - Programme and Training Awards) dataset contains NIHR directly funded research awards where the funding is allocated to an award holder or host organisation to carry out a specific piece of research or complete a training award. The NIHR also invests significantly in centres of excellence, collaborations, services and facilities to support research in England. Collectively these form NIHR infrastructure support. NIHR infrastructure supported projects are available in the Public Health Portfolio (Infrastructure Support) dataset which you can find here.

    NIHR directly funded research awards (Programmes and Training Awards) that were funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. Agreed inclusion/exclusion criteria are used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second level coded to one of the four Public Health Outcomes Framework domains. These domains are: (1) wider determinants (2) health improvement (3) health protection (4) healthcare and premature mortality. More information on the Public Health Outcomes Framework domains can be found here.

    This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that for those Public Health Research Programme projects showing an Award Budget of £0.00, the project is undertaken by an on-call team (for example, PHIRST, Public Health Review Team, or Knowledge Mobilisation Team) as part of an ongoing programme of work.

    Inclusion Criteria

    The NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research. NIHR directly funded research awards are categorised as public health if they are determined to be ‘investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.’ This definition of public health is intentionally broad to capture the wide range of NIHR public health research across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings). This dataset does not reflect the NIHR’s total investment in public health research. The intention is to showcase a subset of the wider NIHR public health portfolio. This dataset includes NIHR directly funded research awards categorised as public health awards. This dataset does not include public health awards or projects funded by any of the three NIHR Research Schools or NIHR Health Protection Research Units.

    Disclaimers

    Users of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria for this dataset. Please note that this dataset is currently subject to a limited data quality review. We are working to improve our data collection methodologies. Please also note that some awards may also appear in other NIHR curated datasets.

    Further Information

    Further information on the individual awards shown in the dataset can be found on the NIHR’s Funding & Awards website here. Further information on individual NIHR Research Programme’s decision making processes for funding health and social care research can be found here.

    Further information on NIHR’s investment in public health research can be found as follows: The NIHR is one of the main funders of public health research in the UK. Public health research falls within the remit of a range of NIHR Directly Funded Research (Programmes and Training Awards), and NIHR Infrastructure Support.

    • NIHR School for Public Health here.
    • NIHR Public Health Policy Research Unit here.
    • NIHR Health Protection Research Units here.
    • NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC) here.
    • NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST) here.

  5. Statutory Infrastructure Provider (SIP) - NBN Co Limited - Dataset

    • researchdata.edu.au
    • data.gov.au
    Updated Jul 14, 2020
    + more versions
    Cite
    SIP Register - NBN Co Limited (2020). Statutory Infrastructure Provider (SIP) - NBN Co Limited - Dataset [Dataset]. https://researchdata.edu.au/statutory-infrastructure-provider-limited-dataset/2981854
    Explore at:
    Dataset updated
    Jul 14, 2020
    Dataset provided by
    Data.gov https://data.gov/
    Authors
    SIP Register - NBN Co Limited
    License

    Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    This data set describes the service areas where NBN Co Limited is the Statutory Infrastructure Provider (SIP).

    This data set forms part of the SIP register which is managed by the ACMA. The SIP register is located on the ACMA’s website at https://www.acma.gov.au/sip-register.

    The data represented here is provided by NBN Co to the ACMA as required under Part 19 of the Telecommunications Act 1997. The ACMA also publishes NBN Co’s geospatial data to the National Map. The copyright in the data is owned by NBN Co, and users must comply with the terms of use for the data as set out on this website. The ACMA does not guarantee, and accepts no legal liability for any loss whatsoever arising from or in connection with the accuracy, reliability, currency, completeness or fitness for purpose of the data.

    The technology planned or delivered for premises or areas by NBN Co, and the availability of the NBN Co network at a premise, may be subject to change over time. More up to date information may be available on https://www.nbnco.com.au/.

  6. MEDPAR Limited Data Set (LDS) - Hospital (National)

    • data.wu.ac.at
    Updated Apr 5, 2016
    Cite
    U.S. Department of Health & Human Services (2016). MEDPAR Limited Data Set (LDS) - Hospital (National) [Dataset]. https://data.wu.ac.at/schema/data_gov/NjRmOWQxNDItYjk4NS00MDI4LThkMTgtM2I1OTc3NmY2MTli
    Explore at:
    Dataset updated
    Apr 5, 2016
    Dataset provided by
    U.S. Department of Health & Human Services
    Description

    No description provided

  7. Federally Qualified Health Centers in the US

    • kaggle.com
    zip
    Updated Jan 22, 2023
    Cite
    The Devastator (2023). Federally Qualified Health Centers in the US [Dataset]. https://www.kaggle.com/datasets/thedevastator/fqhc-location-data
    Explore at:
    Available download formats: zip (2708847 bytes)
    Dataset updated
    Jan 22, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    FQHC Location Data

    Detailed Address Data on Federally Qualified Health Centers in the US

    By US Department of Health and Human Services [source]

    About this dataset

    This dataset provides comprehensive address-level information on Federally Qualified Health Centers (FQHCs) in the United States. FQHCs are community-driven, consumer-run organizations that serve populations with limited access to health care, including people who are low-income, uninsured or have a limited grasp of English, migrant and seasonal farm workers, individuals experiencing homelessness, and those living in public housing. In addition to detailed location data such as postal code and city name for each center, users can find optional information about an individual center, such as its operator description or the type of population it serves, along with back-office management data including grant number, grantee name and uniform resource locator (URL). Familiarize yourself with this dataset to help provide access to quality medical care for underserved communities across the US.

    How to use the dataset

    This dataset is an address-level dataset on the locations of Federally Qualified Health Centers (FQHCs). This dataset includes information on the FQHCs such as name, address, contact information, operating hours per week and grant number. It can be used to locate FQHCs in a particular area and to gain insights into the services they provide.

    In order to use this data set, it is important to understand what attributes are included. These are broken down into categories including basic site information (name, telephone number etc.), service description (what population is served etc.), region info (HHS region code etc.) and supplemental info including records for operator and grantee organization.

    Once you have identified what fields you are interested in, you can then use this data set for further analysis such as counting how many FQHCs exist within a certain area or determining which states have higher numbers of FQHCs than others. You can also filter by features such as services offered or population served to gain further insights into a particular segment of the FQHC market.

    It should also be noted that there may be discrepancies between different sources regarding different fields due to variations in data collection methods; however this dataset is sourced from reliable government datasets making it more accurate than other options. Additionally it contains multiple years of data which provides invaluable insight over time trends that would otherwise not be available through other sources
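The per-area counting described above can be sketched with the Python standard library alone. `Site Name` appears in the column listing for this dataset, but the state column name used here, `Site State Abbreviation`, is an assumption, and the sample rows are invented; check the header of the real file before relying on either.

```python
import csv
import io
from collections import Counter


def count_sites_by(rows, column):
    """Count rows per distinct non-empty value of `column`."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Invented rows; "Site State Abbreviation" is a hypothetical column name.
SAMPLE = """Site Name,Site State Abbreviation
Clinic A,TX
Clinic B,TX
Clinic C,VT
"""

state_counts = count_sites_by(
    csv.DictReader(io.StringIO(SAMPLE)), "Site State Abbreviation"
)
# States ordered from most to fewest sites.
print(state_counts.most_common())
```

The same function works for any categorical field, for example a population-served or HHS region column, once the actual column name is known.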

    Research Ideas

    • Monitoring health outcomes in a given region and comparing changes over time in terms of FQHC locations, services available, and populations served.
    • Analyzing the regional distribution of FQHCs and determining whether there are underserved areas based on population density and access to healthcare services.
    • Creating a geographic information system (GIS) map to visualize the FQHC locations across the United States, highlighting rural or underserved areas in need of additional support for healthcare access

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: SITE_HCC_FCT_DET.csv

    | Column name | Description |
    |:---|:---|
    | Site Name | Name of the FQHC. (String) |
    | UDS Number | Unique identifier assigned by the US Department of Human Services for each FQHC. (Integer) |
    | Site Telephone Number | Telephone number of the FQHC. (String) |
    | Site Facsimile Telephone Number | Facsimile telephone number of the FQHC. (String) |
    | **Administrati...

  8. Denominator File - Limited Data Set

    • data.wu.ac.at
    Updated Apr 5, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Department of Health & Human Services (2016). Denominator File - Limited Data Set [Dataset]. https://data.wu.ac.at/odso/data_gov/MDdhNjYxOGMtZWIwYi00N2FkLWFiNTUtY2M1Yjc0YWZjNDc5
    Explore at:
    Dataset updated
    Apr 5, 2016
    Dataset provided by
    U.S. Department of Health & Human Services
    Description

    The Denominator File combines Medicare beneficiary entitlement status information from administrative enrollment records with third-party payer information and GHP enrollment information. The Denominator File contains data on all Medicare beneficiaries enrolled and or entitled in a given year. It is an abbreviated version of the Enrollment Data Base (EDB) (selected data elements). It does not contain data on all beneficiaries ever entitled to Medicare. The file contains data only for beneficiaries who were entitled during the year of the data. These data are available annually in May of the current year for the prior year.

  9. Dataset: Hindustan Aeronautics Limited (HAL.NS)...

    • kaggle.com
    zip
    Updated May 30, 2024
    Cite
    Nitiraj Kulkarni (2024). Dataset: Hindustan Aeronautics Limited (HAL.NS)... [Dataset]. https://www.kaggle.com/datasets/nitirajkulkarni/hal-ns-stock-performance
    Explore at:
    Available download formats: zip (39729 bytes)
    Dataset updated
    May 30, 2024
    Authors
    Nitiraj Kulkarni
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.

  10. National COVID Cohort Collaborative Data Enclave

    • datacatalog.med.nyu.edu
    Updated Aug 6, 2025
    Cite
    United States - National Center for Advancing Translational Sciences (NCATS) (2025). National COVID Cohort Collaborative Data Enclave [Dataset]. https://datacatalog.med.nyu.edu/dataset/10384
    Explore at:
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    National Center for Advancing Translational Sciences https://ncats.nih.gov/
    Authors
    United States - National Center for Advancing Translational Sciences (NCATS)
    Time period covered
    Jan 1, 2020 - Present
    Area covered
    United States
    Description

    The National Center for Advancing Translational Sciences (NCATS) has systematically compiled clinical, laboratory and diagnostic data from electronic health records to support COVID-19 research efforts via the National COVID Cohort Collaborative (N3C) Data Enclave. As of August 2, 2022, the repository contains information from over 15 million patients (including 5.8 million COVID-19 positive patients) across the United States.

    The N3C Data Enclave is organized into 3 levels of data with varying access restrictions:

    • Synthetic dataset: Contains no protected health information (PHI). This is a statistically-comparable artificial dataset derived from the original dataset.
      • Can be requested by: Researchers from US-based or foreign institutions, and citizen scientists
    • De-identified dataset: Contains no PHI. This dataset consists of real patient data with shifted dates of service and truncated ZIP codes of patients residing in areas with populations above 20,000.
      • Can be requested by: Researchers from US-based or foreign institutions
    • Limited Data Set (LDS): Contains 2 PHI elements (dates of service and patient ZIP code). This dataset consists of real patient data.
      • Can be requested by: Researchers from US-based institutions only
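The three access levels above can be captured as a small lookup structure, which is handy when deciding which tier to request. This is only an illustrative sketch: the class, field and category names are hypothetical and are not part of any N3C interface.

```python
from dataclasses import dataclass

# Requester categories as described in the list above (labels are made up).
US = "us_institution"
FOREIGN = "foreign_institution"
CITIZEN = "citizen_scientist"


@dataclass(frozen=True)
class AccessTier:
    name: str
    phi_elements: tuple       # PHI fields present in this tier
    real_patient_data: bool   # False for the synthetic tier
    eligible: frozenset       # requester categories that may apply


TIERS = (
    AccessTier("Synthetic", (), False, frozenset({US, FOREIGN, CITIZEN})),
    AccessTier("De-identified", (), True, frozenset({US, FOREIGN})),
    AccessTier(
        "Limited Data Set",
        ("dates of service", "patient ZIP code"),
        True,
        frozenset({US}),
    ),
)


def tiers_open_to(requester):
    """Return the tier names a given requester category may request."""
    return [tier.name for tier in TIERS if requester in tier.eligible]


print(tiers_open_to(FOREIGN))
```

Eligibility here reflects only who may submit a request according to the description; the actual approval process is not modeled.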

  11. Medicare Current Beneficiary Survey - Limited Data Set

    • data.wu.ac.at
    Updated Apr 5, 2016
    Cite
    U.S. Department of Health & Human Services (2016). Medicare Current Beneficiary Survey - Limited Data Set [Dataset]. https://data.wu.ac.at/schema/data_gov/ZGNlMGNiZTAtZjFlYS00NDg5LWFhZDMtMzg5NGE0OGQ5NWY4
    Explore at:
    Dataset updated
    Apr 5, 2016
    Dataset provided by
    United States Department of Health and Human Services http://www.hhs.gov/
    Description

    The Medicare Current Beneficiary Survey (MCBS) is a continuous, multipurpose survey of a representative national sample of the Medicare population. There are two data files from the Medicare Current Beneficiary Survey (MCBS) that are released in annual Access to Care and Cost and Use files, which can be purchased directly from CMS.

  12. Helsinki Tomography Challenge 2022 (HTC2022) open tomographic dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 25, 2023
    Cite
    Alexander Meaney; Fernando Silva de Moura; Markus Juvonen; Samuli Siltanen (2023). Helsinki Tomography Challenge 2022 (HTC2022) open tomographic dataset [Dataset]. http://doi.org/10.5281/zenodo.8041800
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Alexander Meaney; Fernando Silva de Moura; Markus Juvonen; Samuli Siltanen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki
    Description

    This dataset was primarily designed for the Helsinki Tomography Challenge 2022 (HTC2022), but it can be used for generic algorithm research and development in 2D CT reconstruction.

    The dataset contains 2D tomographic measurements, i.e., sinograms and the affiliated metadata containing measurement geometry and other specifications. The sinograms have already been pre-processed with background and flat-field corrections, and compensated for a slightly misaligned center of rotation in the cone-beam computed tomography scanner. The log-transforms from intensity measurements to attenuation data have also been already computed. The data has been stored as MATLAB structs and saved in .mat file format.

    The purpose of HTC2022 was to develop algorithms for limited angle tomography. The challenge data consists of tomographic measurements of two sets of plastic phantoms with a diameter of 7 cm and with holes of differing shapes cut into them. The first set is the teaching data, containing five training phantoms. The second set consists of 21 test phantoms used in the challenge to test algorithm performance. The test phantom data was released after the competition period ended.

    The training phantoms were designed to facilitate algorithm development and benchmarking for the challenge itself. Four of the training phantoms contain holes. These are labeled ta, tb, tc, and td. A fifth training phantom is a solid disc with no holes. We encourage subsampling these datasets to create limited data sinograms and comparing the reconstruction results to the ground truth obtainable from the full-data sinograms. Note that the phantoms are not all identically centered.
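The subsampling suggested above can be sketched as follows. This is an illustration, not the challenge's own tooling: it assumes the sinogram has already been loaded from the .mat file into a list with one row per projection angle, and that the angles are available in degrees. The function name, the half-open angle interval and the toy data are all choices made for this example.

```python
def limited_angle_subset(angles_deg, sinogram_rows, start_deg, span_deg):
    """Keep only projections whose angle lies in [start_deg, start_deg + span_deg).

    No wrap-around past 360 degrees is handled in this sketch.
    """
    if len(angles_deg) != len(sinogram_rows):
        raise ValueError("expected one sinogram row per projection angle")
    stop_deg = start_deg + span_deg
    kept = [
        (angle, row)
        for angle, row in zip(angles_deg, sinogram_rows)
        if start_deg <= angle < stop_deg
    ]
    return [angle for angle, _ in kept], [row for _, row in kept]


# Toy data: 12 projections, 30 degrees apart, two detector pixels each.
toy_angles = list(range(0, 360, 30))
toy_sinogram = [[float(a), float(a) + 0.5] for a in toy_angles]

kept_angles, kept_rows = limited_angle_subset(
    toy_angles, toy_sinogram, start_deg=60, span_deg=90
)
print(kept_angles)
```

Reconstructing from `kept_rows` and comparing against the full-data reconstruction gives the kind of benchmark the paragraph above describes.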

    The teaching data includes the following files for each phantom:

    • The sinogram and all associated metadata (.MAT).
    • A pre-computed FBP reconstruction of the phantom (.MAT and .PNG).
    • A segmentation of the FBP reconstruction created with the procedure described below (.MAT and .PNG).

    Also included in the teaching dataset is a MATLAB example script for how to work with the CT data.

    The challenge test data is arranged into seven different difficulty levels, labeled 1-7, with each level containing three different phantoms, labeled A-C. As the difficulty level increases, the number of holes increases and their shapes become increasingly complex. Furthermore, the view angle is reduced as the difficulty level increases, starting with a 90 degree field of view at level 1, and reducing by 10 degrees at each increasing level of difficulty. The view-angles in the challenge data will not all begin from 0 degrees.
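The level schedule described above is simple arithmetic: the view angle starts at 90 degrees and drops by 10 degrees per level. A short sketch, with function and variable names chosen for this example only:

```python
LEVELS = range(1, 8)           # difficulty levels 1-7
PHANTOMS = ("A", "B", "C")     # three phantoms per level


def view_angle_deg(level):
    """View angle in degrees for a difficulty level: 90 at level 1, minus 10 per level."""
    if level not in LEVELS:
        raise ValueError("level must be between 1 and 7")
    return 90 - 10 * (level - 1)


schedule = {level: view_angle_deg(level) for level in LEVELS}
test_cases = [(level, phantom) for level in LEVELS for phantom in PHANTOMS]

print(schedule)
print(len(test_cases))
```

Seven levels with three phantoms each give the 21 test phantoms mentioned earlier, and the hardest level works out to a 30 degree view angle.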

    The test data includes the following files for each phantom:

    • The full sinogram and all associated metadata (.MAT).
    • The limited angle sinogram and all associated metadata, used to test the algorithms submitted to the challenge (.MAT).
    • A pre-computed FBP reconstruction of the phantom using the full data (.MAT and .PNG).
    • A pre-computed FBP reconstruction of the phantom using the limited angle data. These are of poor quality, and serve mainly as a demonstration of how FBP fails with limited angle data (.MAT and .PNG).
    • A segmentation of the FBP reconstruction using the full data, created with the procedure described below. This was used as the ground truth reference in the challenge (.MAT and .PNG).
    • A segmentation of the FBP reconstruction using the limited angle data, created with the procedure described below. These are of poor quality, and serve mainly as a demonstration of how FBP fails with limited angle data (.MAT and .PNG).
    • A photograph of the phantom, rotated and resized to match the ground truth segmentation (.PNG).

    Also included in the test dataset is a collage in .PNG format, showing all the ground truth segmentation images and the photographs of the phantoms together.

    As the orientation of CT reconstructions can depend on the tools used, we have included the example reconstructions for each of the phantoms to demonstrate how the reconstructions obtained from the sinograms and the specified geometry should be oriented. The reconstructions have been computed using the filtered back-projection algorithm (FBP) provided by the ASTRA Toolbox.

    We have also included segmentation examples of the reconstructions to demonstrate the desired format for the final competition entries. The segmentation images were obtained by the following steps:
    1) Set all negative pixel values in the reconstruction to zero.
    2) Determine a threshold level using Otsu's method.
    3) Globally threshold the image using the threshold level.
    4) Perform a morphological closing on the image using a disc with a radius of 3 pixels.
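
    The four steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the organizers' code: the 256-bin histogram, the disc defined as all offsets with dy² + dx² ≤ r², and the border handling are our choices, and MATLAB or other toolboxes may differ in these details. All function names are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, lo + (i + 1) * width  # upper edge of bin i
    return best_t


def disc_offsets(radius):
    """Pixel offsets of a disc-shaped structuring element (assumed definition)."""
    r = range(-radius, radius + 1)
    return [(dy, dx) for dy in r for dx in r if dy * dy + dx * dx <= radius * radius]


def dilate(mask, offsets):
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy, dx in offsets:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = True
    return out


def erode(mask, offsets):
    # Pixels outside the image are ignored, so the border is not eroded away.
    h, w = len(mask), len(mask[0])
    out = [[True] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and not mask[yy][xx]:
                    out[y][x] = False
                    break
    return out


def segment(recon, radius=3):
    clipped = [[max(v, 0.0) for v in row] for row in recon]      # step 1
    t = otsu_threshold([v for row in clipped for v in row])      # step 2
    mask = [[v >= t for v in row] for row in clipped]            # step 3
    offsets = disc_offsets(radius)
    return erode(dilate(mask, offsets), offsets)                 # step 4: closing


# Toy "reconstruction": a bright 10x10 square with a one-pixel hole
# on a slightly negative background.
recon = [[1.0 if 5 <= y < 15 and 5 <= x < 15 else -0.1 for x in range(20)]
         for y in range(20)]
recon[10][10] = 0.0
seg = segment(recon)
print("hole filled:", seg[10][10])
```

    On real reconstructions a library implementation of Otsu's method and binary closing would be much faster; the pure-Python version only makes each step explicit.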

    The competitors were not obliged to follow the above procedure, and were encouraged to explore various segmentation techniques for the limited angle reconstructions.

    For getting started with the data, we recommend the following MATLAB toolboxes:

    HelTomo - Helsinki Tomography Toolbox
    https://github.com/Diagonalizable/HelTomo/

    The ASTRA Toolbox
    https://www.astra-toolbox.com/

    Spot – A Linear-Operator Toolbox
    https://www.cs.ubc.ca/labs/scl/spot/

    Using the above toolboxes for the Challenge was by no means compulsory: the metadata for each dataset contains a full specification of the measurement geometry, and the competitors were free to use any computational tools they wanted for computing the reconstructions and segmentations.

    All measurements were conducted at the Industrial Mathematics Computed Tomography Laboratory at the University of Helsinki.

  13. CTF4Science: Kuramoto-Sivashinsky Official DS

    • kaggle.com
    zip
    Updated May 14, 2025
    Cite
    AI Institute in Dynamic Systems (2025). CTF4Science: Kuramoto-Sivashinsky Official DS [Dataset]. https://www.kaggle.com/datasets/dynamics-ai/ctf4science-kuramoto-sivashinsky-official-ds
    Explore at:
    Available download formats: zip (991463847 bytes)
    Dataset updated
    May 14, 2025
    Dataset authored and provided by
    AI Institute in Dynamic Systems
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Kuramoto-Sivashinsky (KS) Dataset - CTF4Science

    Dataset Description

    This dataset contains numerical simulations of the Kuramoto-Sivashinsky (KS) equation, a fourth-order nonlinear partial differential equation (PDE) that exhibits spatio-temporal chaos. The KS equation is a canonical example used in scientific machine learning to benchmark data-driven algorithms for dynamical systems modeling, forecasting, and reconstruction.

    The Kuramoto-Sivashinsky Equation

    The KS equation is defined as:

    u_t + uu_x + u_xx + μu_xxxx = 0
    

    where:

    • u(x,t) is the solution on a spatial domain x ∈ [0, 32π] with periodic boundary conditions
    • μ is a parameter controlling the fourth-order diffusion term
    • The equation exhibits spatio-temporal chaotic behavior, making it particularly challenging for forecasting algorithms

    Dataset Purpose

    This dataset is part of the Common Task Framework (CTF) for Science, designed to provide standardized, rigorous benchmarks for evaluating machine learning algorithms on scientific problems. The CTF addresses key challenges in scientific ML including:

    • Short-term forecasting (weather forecast): Predicting near-future states with trajectory accuracy
    • Long-term forecasting (climate forecast): Capturing statistical properties of long-time dynamics
    • Noisy data reconstruction: Denoising and forecasting from corrupted measurements
    • Limited data scenarios: Learning from sparse observations
    • Parametric generalization: Interpolation and extrapolation to new parameter regimes

    Key Dataset Characteristics

    • System Type: Spatio-temporal PDE (1D spatial + time)
    • Spatial Dimension: 1024 grid points across domain [0, 32π]
    • Time Step: Δt = 0.025
    • Behavior: Chaotic spatio-temporal dynamics
    • Data Format: Available in both MATLAB (.mat) and CSV formats
    • Evaluation Metrics:
      • Short-term: Root Mean Square Error (RMSE)
      • Long-term: Power Spectral Density matching with k=20, modes=100

    Evaluation Tasks

    The dataset supports 12 evaluation metrics (E1-E12) organized into 4 main task categories:

    Test 1: Forecasting (E1, E2)

    • Input: X1train (10000 × 1024)
    • Task: Forecast future 1000 timesteps
    • Metrics:
      • E1: Short-term RMSE on first k timesteps
      • E2: Long-term spectral matching on power spectral density

    Test 2: Noisy Data (E3, E4, E5, E6)

    • Medium Noise (E3, E4): Train on X2train, reconstruct and forecast
    • High Noise (E5, E6): Train on X3train, reconstruct and forecast
    • Metrics: Reconstruction accuracy (RMSE) + Long-term forecasting (spectral)

    Test 3: Limited Data (E7, E8, E9, E10)

    • Noise-Free Limited (E7, E8): 100 snapshots in X4train
    • Noisy Limited (E9, E10): 100 snapshots in X5train
    • Metrics: Short and long-term forecasting from sparse data

    Test 4: Parametric Generalization (E11, E12)

    • Input: Three training trajectories (X6, X7, X8) at different parameter values
    • Task: Interpolate (E11) and extrapolate (E12) to new parameters
    • Burn-in: X9train and X10train provide initialization
    • Metrics: Short-term RMSE on parameter generalization

    Usage Notes

    1. Hidden Test Sets: The actual test data (X1test through X9test) are hidden and used only for evaluation on the CTF leaderboard
    2. Baseline Scores: Use constant zero prediction as the baseline reference (E_i = 0)
    3. Score Range: All scores are clipped to [-100, 100], where 100 represents perfect prediction
    4. Data Continuity: Start indices in YAML indicate temporal relationship between train/test splits
    5. Chaotic Dynamics: Long-term exact trajectory matching is impossible due to Lyapunov divergence; hence spectral metrics for climate forecasting
    6. File Formats: Choose .mat for MATLAB/Python (scipy) workflows or .csv for language-agnostic access
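
    Usage notes 2 and 3 are consistent with a score of the form 100 × (1 − relative error), clipped to [-100, 100]: a perfect prediction scores 100 and an all-zero prediction scores 0. The sketch below illustrates that reading only. The formula and the function name `clipped_score` are our assumptions and are not stated in this description, so the CTF4Science documentation remains the reference for the official metric definitions.

```python
import math


def clipped_score(prediction, truth):
    """Hypothetical score: 100 * (1 - ||prediction - truth|| / ||truth||), clipped to [-100, 100]."""
    if len(prediction) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    err = math.sqrt(sum((p - t) ** 2 for p, t in zip(prediction, truth)))
    norm = math.sqrt(sum(t ** 2 for t in truth))
    if norm == 0.0:
        raise ValueError("truth must have non-zero norm")
    raw = 100.0 * (1.0 - err / norm)
    return max(-100.0, min(100.0, raw))


truth = [1.0, 2.0, 3.0]
print(clipped_score(truth, truth))               # perfect prediction
print(clipped_score([0.0, 0.0, 0.0], truth))     # constant zero baseline
print(clipped_score([10.0, 20.0, 30.0], truth))  # very poor prediction, clipped
```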
  14. A Dataset for Machine Learning Algorithm Development

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated May 1, 2024
    Cite
    (Point of Contact, Custodian) (2024). A Dataset for Machine Learning Algorithm Development [Dataset]. https://catalog.data.gov/dataset/a-dataset-for-machine-learning-algorithm-development2
    Explore at:
    Dataset updated
    May 1, 2024
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    This dataset consists of imagery, imagery footprints, associated ice seal detections and homography files associated with the KAMERA Test Flights conducted in 2019. This dataset was subset to include relevant data for detection algorithm development. This dataset is limited to data collected during flights 4, 5, 6 and 7 from our 2019 surveys.

  15. Spanish Speech Recognition Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2025
    + more versions
    Cite
    Unidata (2025). Spanish Speech Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/spanish-speech-recognition-dataset
    Explore at:
    Available download formats: zip (93217 bytes)
    Dataset updated
    Jun 25, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    Spanish Speech Dataset for recognition task

    Dataset comprises 488 hours of telephone dialogues in Spanish, collected from 600 native speakers across various topics and domains. This dataset boasts an impressive 98% word accuracy rate, making it a valuable resource for advancing speech recognition technology.

    By utilizing this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, transcribing audio, and natural language processing (NLP).

    The dataset includes high-quality audio recordings with text transcriptions, making it ideal for training and evaluating speech recognition models.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Metadata for the dataset

    • Audio files: High-quality recordings in WAV format
    • Text transcriptions: Accurate and detailed transcripts for each audio segment
    • Speaker information: Metadata on native speakers, including gender, etc.
    • Topics: Diverse domains such as general conversations, business, etc.

    This dataset is a valuable resource for researchers and developers working on speech recognition, language models, and speech technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  16. HCUP State Emergency Department Databases (SEDD) - Restricted Access File

    • catalog.data.gov
    • healthdata.gov
    • +3more
    Updated Jul 29, 2025
    Cite
    Agency for Healthcare Research and Quality, Department of Health & Human Services (2025). HCUP State Emergency Department Databases (SEDD) - Restricted Access File [Dataset]. https://catalog.data.gov/dataset/hcup-state-emergency-department-databases-sedd-restricted-access-file
    Explore at:
    Dataset updated
    Jul 29, 2025
    Description

    The Healthcare Cost and Utilization Project (HCUP) State Emergency Department Databases (SEDD) contain the universe of emergency department visits in participating States. The data are translated into a uniform format to facilitate multi-State comparisons and analyses. The SEDD consist of data from hospital-based emergency department visits that do not result in an admission. The SEDD include all patients, regardless of the expected payer including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as ‘no charge’. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ), HCUP data inform decision making at the national, State, and community levels. The SEDD contain clinical and resource use information included in a typical discharge abstract, with safeguards to protect the privacy of individual patients, physicians, and facilities (as required by data sources). Data elements include but are not limited to: diagnoses, procedures, admission and discharge status, patient demographics (e.g., sex, age, race), total charges, length of stay, and expected payment source, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as ‘no charge’. In addition to the core set of uniform data elements common to all SEDD, some include State-specific data elements. The SEDD exclude data elements that could directly or indirectly identify individuals. For some States, hospital and county identifiers are included that permit linkage to the American Hospital Association Annual Survey File and the Bureau of Health Professions' Area Resource File except in States that do not allow the release of hospital identifiers. Restricted access data files are available with a data use agreement and brief online security training.

  17. Data from: A Toolbox for Surfacing Health Equity Harms and Biases in Large...

    • springernature.figshare.com
    application/csv
    Updated Sep 24, 2024
    Cite
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal (2024). A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models [Dataset]. http://doi.org/10.6084/m9.figshare.26133973.v1
    Explore at:
    Available download formats: application/csv
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Stephen R. Pfohl; Heather Cole-Lewis; Rory Sayres; Darlene Neal; Mercy Asiedu; Awa Dieng; Nenad Tomasev; Qazi Mamunur Rashid; Shekoofeh Azizi; Negar Rostamzadeh; Liam G. McCoy; Leo Anthony Celi; Yun Liu; Mike Schaekermann; Alanna Walton; Alicia Parrish; Chirag Nagpal; Preeti Singh; Akeiylah Dewitt; Philip Mansfield; Sushant Prakash; Katherine Heller; Alan Karthikesalingam; Christopher Semturs; Joëlle K. Barral; Greg Corrado; Yossi Matias; Jamila Smith-Loud; Ivor B. Horn; Karan Singhal
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Supplementary material and data for Pfohl and Cole-Lewis et al., "A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models" (2024).

    We include the sets of adversarial questions for each of the seven EquityMedQA datasets (OMAQ, EHAI, FBRT-Manual, FBRT-LLM, TRINDS, CC-Manual, and CC-LLM), the three other non-EquityMedQA datasets used in this work (HealthSearchQA, Mixed MMQA-OMAQ, and Omiye et al.), as well as the data generated as a part of the empirical study, including the generated model outputs (Med-PaLM 2 [1] primarily, with Med-PaLM [2] answers for pairwise analyses) and ratings from human annotators (physicians, health equity experts, and consumers). See the paper for details on all datasets.

    We include other datasets evaluated in this work: HealthSearchQA [2], Mixed MMQA-OMAQ, and Omiye et al [3].

    • Mixed MMQA-OMAQ is composed of the 140 question subset of MultiMedQA questions described in [1,2] with an additional 100 questions from OMAQ (described below). The 140 MultiMedQA questions are composed of 100 from HealthSearchQA, 20 from LiveQA [4], and 20 from MedicationQA [5]. In the data presented here, we do not reproduce the text of the questions from LiveQA and MedicationQA. For LiveQA, we instead use identifiers that correspond to those presented in the original dataset. For MedicationQA, we designate "MedicationQA_N" to refer to the N-th row of MedicationQA (0-indexed).

    A limited number of data elements described in the paper are not included here. The following elements are excluded:

    1. The reference answers written by physicians to HealthSearchQA questions, introduced in [2], and the set of corresponding pairwise ratings. This accounts for 2,122 rated instances.

    2. The free-text comments written by raters during the ratings process.

    3. Demographic information associated with the consumer raters (only age group information is included).

    References

    1. Singhal, K., et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617 (2023).

    2. Singhal, K., Azizi, S., Tu, T. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023). https://doi.org/10.1038/s41586-023-06291-2

    3. Omiye, J.A., Lester, J.C., Spichak, S. et al. Large language models propagate race-based medicine. npj Digit. Med. 6, 195 (2023). https://doi.org/10.1038/s41746-023-00939-z

    4. Abacha, Asma Ben, et al. "Overview of the medical question answering task at TREC 2017 LiveQA." TREC. 2017.

    5. Abacha, Asma Ben, et al. "Bridging the gap between consumers’ medication questions and trusted answers." MEDINFO 2019: Health and Wellbeing e-Networks for All. IOS Press, 2019. 25-29.

    Description of files and sheets

    1. Independent Ratings [ratings_independent.csv]: Contains ratings of the presence of bias and its dimensions in Med-PaLM 2 outputs using the independent assessment rubric for each of the datasets studied. The primary response regarding the presence of bias is encoded in the column bias_presence with three possible values (No bias, Minor bias, Severe bias). Binary assessments of the dimensions of bias are encoded in separate columns (e.g., inaccuracy_for_some_axes). Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Ratings were missing for five instances in MMQA-OMAQ and two instances in CC-Manual. This file contains 7,519 rated instances.

    2. Paired Ratings [ratings_pairwise.csv]: Contains comparisons of the presence or degree of bias and its dimensions in Med-PaLM and Med-PaLM 2 outputs for each of the datasets studied. Pairwise responses are encoded in terms of two binary columns corresponding to which of the answers was judged to contain a greater degree of bias (e.g., Med-PaLM-2_answer_more_bias). Dimensions of bias are encoded in the same way as for ratings_independent.csv. Instances for the Mixed MMQA-OMAQ dataset are triple-rated for each rater group; other datasets are single-rated. Four ratings were missing (one for EHAI, two for FBRT-Manual, one for FBRT-LLM). This file contains 6,446 rated instances.

    3. Counterfactual Paired Ratings [ratings_counterfactual.csv]: Contains ratings under the counterfactual rubric for pairs of questions defined in the CC-Manual and CC-LLM datasets. Contains a binary assessment of the presence of bias (bias_presence), columns for each dimension of bias, and categorical columns corresponding to other elements of the rubric (ideal_answers_diff, how_answers_diff). Instances for the CC-Manual dataset are triple-rated; instances for CC-LLM are single-rated. Due to a data processing error, we removed questions that refer to "Natal" from the analysis of the counterfactual rubric on the CC-Manual dataset. This affects three questions (corresponding to 21 pairs) derived from one seed question based on the TRINDS dataset. This file contains 1,012 rated instances.

    4. Open-ended Medical Adversarial Queries (OMAQ) [equitymedqa_omaq.csv]: Contains questions that compose the OMAQ dataset. The OMAQ dataset was first described in [1].

    5. Equity in Health AI (EHAI) [equitymedqa_ehai.csv]: Contains questions that compose the EHAI dataset.

    6. Failure-Based Red Teaming - Manual (FBRT-Manual) [equitymedqa_fbrt_manual.csv]: Contains questions that compose the FBRT-Manual dataset.

    7. Failure-Based Red Teaming - LLM (FBRT-LLM); full [equitymedqa_fbrt_llm.csv]: Contains questions that compose the extended FBRT-LLM dataset.

    8. Failure-Based Red Teaming - LLM (FBRT-LLM) [equitymedqa_fbrt_llm_661_sampled.csv]: Contains questions that compose the sampled FBRT-LLM dataset used in the empirical study.

    9. TRopical and INfectious DiseaseS (TRINDS) [equitymedqa_trinds.csv]: Contains questions that compose the TRINDS dataset.

    10. Counterfactual Context - Manual (CC-Manual) [equitymedqa_cc_manual.csv]: Contains pairs of questions that compose the CC-Manual dataset.

    11. Counterfactual Context - LLM (CC-LLM) [equitymedqa_cc_llm.csv]: Contains pairs of questions that compose the CC-LLM dataset.

    12. HealthSearchQA [other_datasets_healthsearchqa.csv]: Contains questions sampled from the HealthSearchQA dataset [1,2].

    13. Mixed MMQA-OMAQ [other_datasets_mixed_mmqa_omaq]: Contains questions that compose the Mixed MMQA-OMAQ dataset.

    14. Omiye et al. [other datasets_omiye_et_al]: Contains questions proposed in Omiye et al. [3].
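
    As a starting point for the ratings files, the three-valued bias_presence column described in item 1 can be tallied with the Python standard library. This is a minimal sketch: the function name `count_bias_presence`, the `example_id` column, and the sample rows are made up for illustration; only the bias_presence column name and its three values come from the description above.

```python
import csv
import io
from collections import Counter


def count_bias_presence(csv_file):
    """Count how often each bias_presence value occurs in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["bias_presence"] for row in reader)


# Hypothetical stand-in for ratings_independent.csv (illustrative rows only).
sample = io.StringIO(
    "example_id,bias_presence\n"
    "0,No bias\n"
    "1,Minor bias\n"
    "2,No bias\n"
    "3,Severe bias\n"
)
counts = count_bias_presence(sample)
for value in ("No bias", "Minor bias", "Severe bias"):
    print(value, counts[value])
```

    With the downloaded file, the same function could be called on `open("ratings_independent.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.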

    Version history

    Version 2: Updated to include ratings and generated model outputs. Dataset files were updated to include unique ids associated with each question.

    Version 1: Contained datasets of questions without ratings. Consistent with v1 available as a preprint on arXiv (https://arxiv.org/abs/2403.12025).

    WARNING: These datasets contain adversarial questions designed specifically to probe biases in AI systems. They can include human-written and model-generated language and content that may be inaccurate, misleading, biased, disturbing, sensitive, or offensive.

    NOTE: the content of this research repository (i) is not intended to be a medical device; and (ii) is not intended for clinical use of any kind, including but not limited to diagnosis or prognosis.

  18. DeepFake Videos Dataset

    • kaggle.com
    Updated Jun 16, 2025
    + more versions
    Cite
    Unidata (2025). DeepFake Videos Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/deepfake-videos-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    DeepFake Videos for detection tasks

    Dataset consists of 10,000+ files featuring 7,000+ people, providing a comprehensive resource for research in deepfake detection and deepfake technology. It includes real videos of individuals with AI-generated faces overlaid, specifically designed to enhance liveness detection systems.

    By utilizing this dataset, researchers can advance their understanding of deepfake generation and improve the performance of detection methods.

    Metadata for the dataset

    The dataset was created by generating fake faces and overlaying them onto authentic video clips sourced from platforms such as aisaver.io, faceswapvideo.ai, and magichour.ai. The videos feature different individuals, backgrounds, and scenarios, making the dataset suitable for various research applications.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Researchers can leverage this dataset to enhance their understanding of deepfake detection and contribute to the development of more robust detection methods that can effectively combat the challenges posed by deepfake technology.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  19. Dataset with 30K Images in 20 Artistic Styles

    • kaggle.com
    zip
    Updated Sep 24, 2025
    + more versions
    Cite
    Unidata (2025). Dataset with 30K Images in 20 Artistic Styles [Dataset]. https://www.kaggle.com/datasets/unidpro/artistic-styles-dataset
    Explore at:
    Available download formats: zip (22377406 bytes)
    Dataset updated
    Sep 24, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Description

    Professional Illustrations Dataset - 30,000 images

    The dataset comprises 30,000 high-quality artistic images spanning 20 distinct artistic styles and movements. It is specifically designed for advancing research in artwork generation, style transfer, and the classification of visual arts.

    By leveraging this dataset, researchers and developers can push the boundaries of generating images, creating new artistic creations, and conducting aesthetic evaluations.

    The dataset features illustrations across 20 distinct artistic styles, such as 3D, Cartoon, Comics, Graffiti, Character, Fantasy, Dark, Engraving, and Children's book art.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Researchers can utilize this dataset to explore advanced techniques in generating images and improve the capabilities of machines in understanding and creating visual art. The inclusion of major art movements and genres ensures robust training and evaluation methods for tasks like artistic style classification and visual feature extraction.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  20. Medicare Current Beneficiary Survey - Survey File

    • datalumos.org
    • data.virginia.gov
    • +1more
    Updated Apr 8, 2025
    + more versions
    Cite
    United States Department of Health and Human Services. Centers for Medicare and Medicaid Services (2025). Medicare Current Beneficiary Survey - Survey File [Dataset]. http://doi.org/10.3886/E226004V1
    Explore at:
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Centers for Medicare & Medicaid Services
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Authors
    United States Department of Health and Human Services. Centers for Medicare and Medicaid Services
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2017 - 2022
    Description

    The Medicare Current Beneficiary Survey (MCBS) - Survey File Microdata Public Use File (PUF) dataset provides information on topics such as Medicare beneficiaries' access to care, health status, other information regarding beneficiaries’ knowledge of, attitudes toward, and satisfaction with their health care, as well as demographic data and information on all types of health insurance coverage.

    Resources for Using and Understanding the Data

    This dataset is based on information from the MCBS and administrative data. The MCBS is a continuous, multi-purpose longitudinal survey covering a representative national sample of the Medicare population, including the population of beneficiaries aged 65 and over and beneficiaries aged 64 and below with certain disabling conditions. The MCBS collects this information in three data collection periods, or rounds, per year. Disclosure protections have been applied to the file, including de-identification and other methods. As a result, the MCBS Survey File Microdata file does not require a Data Use Agreement (DUA). In contrast, the MCBS Limited Data Set (LDS) releases contain beneficiary-level protected health information (PHI) and therefore require a DUA. The MCBS - Survey File Microdata file is not intended to replace the more detailed LDS files but, rather, it makes available a general-use publicly-available alternative that provides the highest degree of protection to the Medicare beneficiaries’ PHI.

    The main benefits of using the MCBS - Survey File Microdata file are:

    • Increased data access for researchers of the MCBS through a free file download that is consistent with other U.S. Department of Health and Human Services (HHS) public-use survey files.
    • Enhanced potential for policy-relevant analyses, by attracting new researchers and policymakers.

    Accessing the MCBS LDS can be a significant deterrent due to the associated costs and time but the MCBS - Survey File Microdata file mitigates these barriers to encourage broader utilization. A link to the more detailed MCBS LDS files is provided in the Resources section on this page. MCBS LDS data are also presented in the MCBS Chartbook linked in the Visualization section on this page.

Cite
The Devastator (2023). Electronic Health Legal Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/electronic-health-legal-data

Electronic Health Legal Data

Exploring Laws and Regulations

Explore at:
Available download formats: zip (192951 bytes)
Dataset updated
Jan 29, 2023
Authors
The Devastator
License

Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically

Description

Electronic Health Legal Data

Exploring Laws and Regulations

By US Open Data Portal, data.gov [source]

About this dataset

This Electronic Health Information Legal Epidemiology dataset offers an extensive collection of legal and epidemiological data that can be used to understand the complexities of electronic health information. It contains a detailed balance of variables, including legal requirements, enforcement mechanisms, proprietary tools, access restrictions, privacy and security implications, data rights and responsibilities, user accounts and authentication systems. This powerful set provides researchers with real-world insights into the functioning of EHI law in order to assess its impact on patient safety and public health outcomes. With such data it is possible to gain a better understanding of current policies regarding the regulation of electronic health information as well as their potential for improvement in safeguarding patient confidentiality. Use this dataset to explore how these laws impact our healthcare system by exploring patterns across different groups over time or analyze changes leading up to new versions or updates. Make exciting discoveries with this comprehensive dataset!

How to use the dataset

  • Start by familiarizing yourself with the different columns of the dataset. Examine each column closely and look up any unfamiliar terminology to get a better understanding of what the columns are referencing.

  • Once you understand the data and what it is intended to represent, think about how you might want to use it in your analysis. You may want to create a research question, or narrower focus for your project surrounding legal epidemiology of electronic health information that can be answered with this data set.

  • After creating your research plan, clean and transform the data as needed to prepare it for the analysis or visualization specified in your project plan or research question.

  • Next, perform exploratory data analysis (EDA) on the relevant subsets of the data, for example specific countries or target groups of interest (e.g., gender). Filter out irrelevant information, analyze the patterns and trends in the filtered data, and compare areas with differing e-health rules and regulations, relating them to the factors that drive decisions by elected officials, such as demographics, socioeconomic factors, and ideology. Look for correlations at every stage, from filtering out uninformative subgroups to generating visualizations (graphs or diagrams). Validate descriptive and predictive models against reference datasets when available, and cross-check multiple external sources when you need to establish the accuracy of a statement. Localize supporting material to local languages and cultures where appropriate, and keep sensitive datasets private, visible only to duly authorized users.

  • Finally, summarize and share your findings, preferably as infographics that present the evidence, an overall assessment, and the main conclusions, so that the broader community and interested professionals can freely adapt the results locally and encourage wider adoption in the e-health legal domain.

Research Ideas

  • Studying how technology affects public health policies and practice - Using the data, researchers can look at the various types of legal regulations related to electronic health information to examine any relations between technology and public health decisions in certain areas or regions.
  • Evaluating trends in legal epidemiology – With this data, policymakers can identify patterns that help measure the evolution of electronic health information regulations over time and investigate why such rules are changing within different states or countries.
  • Analysing possible impacts on healthcare costs – Looking at changes in laws, regulations, and standards relate...