100+ datasets found
  1. Heart Attack Risk Dataset of China

    • kaggle.com
    zip
    Updated Mar 4, 2025
    Cite
    Ankush Panday (2025). Heart Attack Risk Dataset of China [Dataset]. https://www.kaggle.com/datasets/ankushpanday2/heart-attack-risk-dataset-of-china/code
    Available download formats: zip (5267720 bytes)
    Dataset updated
    Mar 4, 2025
    Authors
    Ankush Panday
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    China
    Description

    This dataset provides an in-depth look at heart attack risk factors among individuals in China, reflecting variations in healthcare access, lifestyle choices, air pollution exposure, and regional disparities. The data includes key variables such as age, gender, smoking habits, blood pressure, cholesterol levels, and previous heart attack history.

    Given the urban-rural healthcare divide and the impact of environmental factors, this dataset is ideal for predictive modeling, risk assessment, and epidemiological studies related to cardiovascular disease.

    Key highlights:

    • Regional Variability: Provinces across China are represented, considering different healthcare infrastructures.
    • Major Risk Factors: Smoking, air pollution, diet, and stress levels are included.
    • Healthcare Accessibility: Differentiates urban vs. rural healthcare conditions.
    • Heart Attack Prediction: Can be used to develop predictive models for heart disease.

    Columns to Include:

    • Patient_ID (Unique Identifier)
    • Age (Numerical)
    • Gender (Male/Female)
    • Smoking_Status (Smoker/Non-Smoker)
    • Hypertension (Yes/No)
    • Diabetes (Yes/No)
    • Obesity (Yes/No)
    • Cholesterol_Level (High/Normal/Low)
    • Air_Pollution_Exposure (Low/Medium/High)
    • Physical_Activity (Low/Medium/High)
    • Diet_Score (Healthy/Moderate/Poor)
    • Stress_Level (Low/Medium/High)
    • Alcohol_Consumption (Yes/No)
    • Family_History_CVD (Yes/No)
    • Healthcare_Access (Good/Moderate/Poor)
    • Rural_or_Urban (Rural/Urban)
    • Region (Eastern/Western/Northern/Southern/Central)
    • Province (e.g., Beijing, Shanghai, Gansu, etc.)
    • Hospital_Availability (High/Medium/Low)
    • TCM_Use (Yes/No)
    • Employment_Status (Employed/Unemployed/Retired)
    • Education_Level (None/Primary/Secondary/Higher)
    • Income_Level (Low/Middle/High)
    • Blood_Pressure (Numerical)
    • Chronic_Kidney_Disease (Yes/No)
    • Previous_Heart_Attack (Yes/No)
    • CVD_Risk_Score (0-100)
    • Heart_Attack (Yes/No - Target Variable)
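
    For predictive modeling, the text-valued columns listed above usually need numeric codes first. The following is a minimal Python sketch, assuming the values are spelled exactly as in the column list; the `encode` helper and the sample row are hypothetical and not taken from the dataset.

```python
# Level-to-code mappings; spellings are assumed to match the column list.
ORDINAL = {"Low": 0, "Medium": 1, "High": 2}
BINARY = {"No": 0, "Yes": 1}


def encode(record):
    """Turn a few text-valued columns of one row into integers."""
    return {
        "Age": int(record["Age"]),
        "Hypertension": BINARY[record["Hypertension"]],
        "Air_Pollution_Exposure": ORDINAL[record["Air_Pollution_Exposure"]],
        "Stress_Level": ORDINAL[record["Stress_Level"]],
        "Heart_Attack": BINARY[record["Heart_Attack"]],  # target variable
    }


# Hypothetical row for illustration only; not a real patient record.
sample = {
    "Age": "54",
    "Hypertension": "Yes",
    "Air_Pollution_Exposure": "High",
    "Stress_Level": "Medium",
    "Heart_Attack": "No",
}
encoded = encode(sample)
```

    An unexpected spelling raises a KeyError, which may be preferable to silently coding an unknown level.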

  2. Dataset of health expenditure per capita and individuals using the Internet...

    • workwithdata.com
    Updated Apr 9, 2025
    + more versions
    Cite
    Work With Data (2025). Dataset of health expenditure per capita and individuals using the Internet of countries per year in China and in 2021 (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Cdate%2Chealth_expenditure_capita%2Cinternet_pct&f=2&fcol0=country&fcol1=date&fop0=%3D&fop1=%3D&fval0=China&fval1=2021
    Dataset updated
    Apr 9, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    China
    Description

    This dataset is about countries per year in China. It has 1 row and is filtered where the date is 2021. It features 4 columns: country, date, health expenditure per capita, and individuals using the Internet.

  3. Counts of Dengue without warning signs reported in CHINA: 1979-2009

    • tycho.pitt.edu
    • data.niaid.nih.gov
    Updated Apr 1, 2018
    Cite
    Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Dengue without warning signs reported in CHINA: 1979-2009 [Dataset]. https://www.tycho.pitt.edu/dataset/CN.722862003
    Dataset updated
    Apr 1, 2018
    Dataset provided by
    Project Tycho, University of Pittsburgh
    Authors
    Willem G Van Panhuis; Anne L Cross; Donald S Burke
    Time period covered
    1979 - 2009
    Area covered
    China
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
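
    The separation of cumulative from fixed-interval series described above can be sketched in Python as follows. Only the attribute name "PartOfCumulativeCountSeries" comes from the description; its 1/0 coding, the other keys (`start`, `end`, `count`), and the rows themselves are assumptions made for illustration, not real Project Tycho counts.

```python
from datetime import date

# Synthetic rows for illustration only.
rows = [
    {"start": date(2000, 1, 1), "end": date(2000, 1, 7), "count": 3,
     "PartOfCumulativeCountSeries": 1},
    {"start": date(2000, 1, 1), "end": date(2000, 1, 14), "count": 5,
     "PartOfCumulativeCountSeries": 1},
    {"start": date(2000, 1, 1), "end": date(2000, 1, 21), "count": 9,
     "PartOfCumulativeCountSeries": 1},
    {"start": date(2000, 2, 1), "end": date(2000, 2, 29), "count": 4,
     "PartOfCumulativeCountSeries": 0},
]


def split_series(all_rows):
    """Return (cumulative_rows, fixed_interval_rows)."""
    cumulative = [r for r in all_rows if r["PartOfCumulativeCountSeries"] == 1]
    fixed = [r for r in all_rows if r["PartOfCumulativeCountSeries"] == 0]
    return cumulative, fixed


def cumulative_to_increments(cumulative):
    """Difference consecutive cumulative counts that share a start date."""
    ordered = sorted(cumulative, key=lambda r: (r["start"], r["end"]))
    increments = []
    prev_start, prev_count = None, 0
    for r in ordered:
        if r["start"] != prev_start:
            prev_count = 0  # a new cumulative run begins
        increments.append((r["end"], r["count"] - prev_count))
        prev_start, prev_count = r["start"], r["count"]
    return increments


cumulative_rows, fixed_rows = split_series(rows)
new_cases = [n for _, n in cumulative_to_increments(cumulative_rows)]
```

    Keeping the two kinds of series apart avoids double counting, since summing overlapping cumulative intervals would count the same cases repeatedly.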

  4. COVID-19 China

    • kaggle.com
    zip
    Updated Sep 18, 2020
    + more versions
    Cite
    Charlie Craine (2020). COVID-19 China [Dataset]. https://www.kaggle.com/crained/covid19-china
    Available download formats: zip (89966331 bytes)
    Dataset updated
    Sep 18, 2020
    Authors
    Charlie Craine
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    China
    Description

    The utility of this dataset has been confirmed by a senior radiologist in Tongji Hospital, Wuhan, China, who has performed diagnosis and treatment of a large number of COVID-19 patients during the outbreak of this disease between January and April. After releasing this dataset, we received feedback expressing several concerns about its usability. The major concerns are summarized as follows. First, when the original CT images are put into papers, the quality of these images is degraded, which may render the diagnosis decisions less accurate. The quality degradation includes the following: the Hounsfield unit (HU) values are lost; the number of bits per pixel is reduced; the resolution of images is reduced. Second, the original CT scan contains a sequence of CT slices, but when put into papers, only a few key slices are selected, which may have a negative impact on diagnosis as well.

    We consulted the aforementioned radiologist at Tongji Hospital regarding these two concerns. According to the radiologist, the issues raised in these concerns do not significantly affect the accuracy of diagnostic decision-making. First, experienced radiologists are able to make an accurate diagnosis from low-quality CT images. For example, given a smartphone photo of the original CT image, experienced radiologists can make an accurate diagnosis by just looking at the photo, though the CT image in the photo has much lower quality than the original CT image. Likewise, the quality gap between CT images in papers and original CT images will not greatly hurt the accuracy of diagnosis. Second, while it is preferable to read a sequence of CT slices, oftentimes a single CT slice contains enough clinical information for accurate decision-making.

    This came from the team here: https://github.com/UCSD-AI4H/COVID-CT

  5. Counts of Dengue reported in CHINA: 1979-2010

    • zenodo.org
    • tycho.pitt.edu
    • +1more
    json, xml, zip
    Updated Jun 3, 2024
    + more versions
    Cite
    Willem Van Panhuis; Anne Cross; Donald Burke (2024). Counts of Dengue reported in CHINA: 1979-2010 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/cn.38362002
    Available download formats: json, zip, xml
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Anne Cross; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1979 - Dec 31, 2010
    Area covered
    China
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
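
    The missing-data step above distinguishes an interval that is absent (nothing reported) from one present with a count of zero. A minimal Python sketch of adding the absent intervals to a weekly fixed-interval series follows; the `fill_missing_weeks` helper, the weekly spacing, and the counts are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def fill_missing_weeks(reported, first_week, last_week):
    """Return every weekly interval start from first_week to last_week.

    `reported` maps an interval start date to its case count. Weeks with
    no entry get None (nothing reported); reported zeros stay as 0.
    """
    filled = {}
    current = first_week
    while current <= last_week:
        filled[current] = reported.get(current)
        current += timedelta(days=7)
    return filled


# Synthetic counts: two weeks have no report, one has a reported zero.
reported_counts = {date(2000, 1, 3): 2, date(2000, 1, 17): 0}
filled_counts = fill_missing_weeks(
    reported_counts, date(2000, 1, 3), date(2000, 1, 24)
)
```

    Using None rather than 0 for the added intervals keeps "no report" separate from "zero cases reported" in later analysis.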

  6. COVID-19 Combined Data-set with Improved Measurement Errors

    • data.mendeley.com
    Updated May 13, 2020
    Cite
    Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
    Dataset updated
    May 13, 2020
    Authors
    Afshin Ashofteh
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. 
    This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
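
    The paired comparison mentioned in the description relies on the root mean square error between two sources' values for the same attribute. The following is a minimal Python sketch of that measure; the two series are invented for illustration and are not figures from the Chinese CDC, WHO, or ECDC.

```python
import math


def rmse(series_a, series_b):
    """Root mean square error between two equally long series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same reporting dates")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily counts for the same dates from two sources.
source_a = [10, 12, 15, 20]
source_b = [12, 14, 17, 22]
error = rmse(source_a, source_b)
```

    A larger value for one attribute than for others would flag that attribute as a place where the sources disagree most.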

  7. Counts of COVID-19 reported in CHINA: 2019-2021

    • data.niaid.nih.gov
    • catalog.midasnetwork.us
    • +1more
    csv
    Updated Aug 12, 2022
    + more versions
    Cite
    Harry Hochheiser; Willem Van Panhuis; Bruce Childers; Mark Roberts; Kim Wong; J Espino; William Hogan; M Halloran; Nicholas Reich; Lauren Meyers (2022). Counts of COVID-19 reported in CHINA: 2019-2021 [Dataset]. http://doi.org/10.25337/T7/ptycho.v2.0/CN.840539006
    Available download formats: csv
    Dataset updated
    Aug 12, 2022
    Dataset provided by
    MIDAS Coordination Center
    Authors
    Harry Hochheiser; Willem Van Panhuis; Bruce Childers; Mark Roberts; Kim Wong; J Espino; William Hogan; M Halloran; Nicholas Reich; Lauren Meyers
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    CN, China
    Variables measured
    Case, Dead, Complete recovery, Cumulative incidence, Count of disease cases, Infectious disease incidence
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team, except for aggregation of individual case count data into daily counts when that was the best data available for a disease and location. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. All geographic locations at the country and admin1 level have been represented at the same geographic level as in the data source, provided an ISO code or codes could be identified, unless the data source specifies that the location is listed at an inaccurate geographical level. For more information about decisions made by the curation team, recommended data processing steps, and the data sources used, please see the README that is included in the dataset download ZIP file.

  8. Dataset of premature deaths avoided due to PM2.5 pollution control policies...

    • scidb.cn
    Updated Jan 13, 2023
    Cite
    Haimeng Liu; Jian Liu (2023). Dataset of premature deaths avoided due to PM2.5 pollution control policies in Chinese cities [Dataset]. http://doi.org/10.57760/sciencedb.07110
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Haimeng Liu; Jian Liu
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Area covered
    China
    Description

    Air pollution is one of China's most serious environmental issues, taking a significant toll on residents' physical and mental health. Since the implementation of policies such as the Action Plan on Air Pollution Prevention and Control in 2013, air quality in most Chinese cities has improved significantly. This dataset uses a counterfactual research paradigm to measure the actual number of premature deaths due to PM2.5 pollution in 2019 and the number of premature deaths due to PM2.5 pollution in 2019 under a scenario with no policy in place. Subtracting the former from the latter gives the number of premature deaths avoided due to PM2.5 pollution control policies in Chinese cities in 2019. The dataset includes: (1) The actual number of premature deaths due to PM2.5 pollution in 2019; (2) The number of premature deaths in 2019 under the no-policy scenario; (3) The number of premature deaths avoided in 2019 as a result of environmental policies. The dataset covers 343 cities and is archived in .shp and .xls formats (30.4 MB in total). It can support research on air pollution control and urban environmental health in China, and can also serve as a reference for assessing local governments' environmental performance.
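
    The counterfactual subtraction described above can be written out as a short Python sketch. The city labels and counts are hypothetical placeholders, not values from the dataset, and the `deaths_avoided` helper is introduced only for illustration.

```python
def deaths_avoided(actual, no_policy):
    """Avoided deaths per city: no-policy scenario minus actual."""
    return {city: no_policy[city] - actual[city] for city in actual}


# Hypothetical premature-death counts for 2019.
actual_2019 = {"City A": 1200, "City B": 800}
no_policy_2019 = {"City A": 1500, "City B": 950}
avoided = deaths_avoided(actual_2019, no_policy_2019)
```

    A positive result means fewer deaths occurred than the no-policy scenario predicts for that city.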

  9. Chinese Food Market Insights

    • kaggle.com
    zip
    Updated May 18, 2024
    Cite
    Daniil Krasnoproshin (2024). Chinese Food Market Insights [Dataset]. https://www.kaggle.com/datasets/daniilkrasnoproshin/chinese-food-market-insights
    Available download formats: zip (11359 bytes)
    Dataset updated
    May 18, 2024
    Authors
    Daniil Krasnoproshin
    Description

    Delve into the dynamics of food prices in China with this dataset sourced from the World Food Programme Price Database. Covering essential food items like maize, rice, beans, fish, and sugar across various markets in China, this dataset provides a valuable resource for understanding food price trends over time. Whether you're an economist, policymaker, or researcher, explore how factors such as supply, demand, and market dynamics influence food pricing in one of the world's largest economies. With data updated weekly and spanning back to 1992, this dataset offers rich insights into the evolving landscape of food prices in China.

    Headers description:

    • date: The date of data collection or reporting.
    • admin1: Refers to the primary administrative division within the country, such as provinces or states.
    • admin2: Further subdivision within the primary administrative division, such as districts or counties.
    • market: Specifies the market or location where the food prices were recorded.
    • latitude: The geographic latitude coordinates of the market location.
    • longitude: The geographic longitude coordinates of the market location.
    • category: Describes the broad category or type of food commodity.
    • commodity: Specifies the specific food item or product within the category.
    • unit: Indicates the unit of measurement for the price (e.g., kilograms, pounds).
    • priceflag: Flags indicating any special conditions or notes related to the price.
    • pricetype: Specifies the type of price recorded (e.g., retail price, wholesale price).
    • currency: Denotes the currency in which the price is expressed.
    • price: The recorded price of the commodity in the local currency.
    • usdprice: The equivalent price of the commodity converted to US dollars for standardized comparison.

    Source: https://data.humdata.org/dataset/wfp-food-prices-for-china
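
    As a sketch of how the headers above might be used, the following Python snippet averages `usdprice` per `commodity`. The column names come from the header list; the rows, place names, and values are invented for illustration, and the actual file may need extra cleaning before it parses this way.

```python
import csv
import io
from collections import defaultdict

# Invented rows that only mimic the header layout described above.
SAMPLE_CSV = """date,admin1,admin2,market,latitude,longitude,category,commodity,unit,priceflag,pricetype,currency,price,usdprice
2020-01-15,ProvinceA,CountyA,MarketA,30.0,110.0,cereals and tubers,Rice,KG,actual,Retail,CNY,7.0,1.0
2020-02-15,ProvinceA,CountyA,MarketA,30.0,110.0,cereals and tubers,Rice,KG,actual,Retail,CNY,10.5,1.5
2020-01-15,ProvinceA,CountyA,MarketA,30.0,110.0,miscellaneous food,Sugar,KG,actual,Retail,CNY,3.5,0.5
"""


def mean_usd_price_by_commodity(csv_text):
    """Average the usdprice column for each commodity."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["commodity"]] += float(row["usdprice"])
        counts[row["commodity"]] += 1
    return {name: totals[name] / counts[name] for name in totals}


averages = mean_usd_price_by_commodity(SAMPLE_CSV)
```

    Averaging `usdprice` rather than `price` keeps the comparison in a single currency, which is the stated purpose of that column.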

  10. Dataset for a hybrid model approach for estimating health burden from NO2 in...

    • dtechtive.com
    • find.data.gov.scot
    csv, tif, txt
    Updated Sep 30, 2019
    Cite
    University of Edinburgh (2019). Dataset for a hybrid model approach for estimating health burden from NO2 in megacities in China: a case study in Guangzhou [Dataset]. http://doi.org/10.7488/ds/2624
    Available download formats: txt (0.0008 MB), csv (0.0001 MB), txt (0.0166 MB), tif (105 MB), csv (0.0043 MB)
    Dataset updated
    Sep 30, 2019
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Guangzhou, China
    Description

    These files contain summary statistics for meta-analysis GWAS of six traits in European, African and Trans-Ancestry. There are 18 files in total - 6 for each ancestry by 3 for each trait. The traits are DNA methylation proxies for granulocyte proportions (gran) and plasminogen activator inhibitor-1 (PAI1), and four epigenetic age acceleration measures of: PhenoAge, GrimAge, HannumAge, and Intrinsic Epigenetic Age Acceleration (IEAA).

  11. Supporting dataset of the aritcle :Underneath Social Media Texts: Sentiment...

    • scidb.cn
    Updated Mar 4, 2024
    Cite
    Bingyao Jia; Meifang Xie; Jing Wu; Junyi Zhao (2024). Supporting dataset of the aritcle :Underneath Social Media Texts: Sentiment Responses to Public Health Emergency During 2022 COVID-19 Pandemic in China [Dataset]. http://doi.org/10.57760/sciencedb.16527
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Bingyao Jia; Meifang Xie; Jing Wu; Junyi Zhao
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
    License information was derived automatically

    Area covered
    China
    Description

    This dataset is the supporting data for the paper Underneath Social Media Texts: Sentiment Responses to Public Health Emergency During 2022 COVID-19 Pandemic in China. It is mainly used to analyze Weibo text data and perform sentiment analysis. The data were obtained from Weibo, and the texts were crawled using a Python tool (Weibo crawler tool). The data contain time, text content, user address, etc. The file Cleaned weibo data was then obtained after a cleaning operation in Excel. Using the improved Chinese sentiment lexicon, the sentiment analysis tool was applied to the text to derive the main sentiment and sentiment scores; the result file is Sentiment analysis results. Finally, ADF and KPSS analysis tools were used to analyze the stability of sentiment scores in different cities. The Weibo text and sentiment analysis results in the dataset are in .xlsx format, and the rest of the tools are Python code. The crawled data are limited by time, specific search terms, and other restrictions; different run times and search terms may lead to differences in the data.

  12. Mandarin Call Center Data for Healthcare AI

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Mandarin Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-mandarin-china
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    This Mandarin Chinese Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Mandarin speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.

    Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.

    Speech Data

    The dataset features 30 Hours of dual-channel call center conversations between native Mandarin Chinese speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.

    Participant Diversity:
    Speakers: 60 verified native Mandarin Chinese speakers from our contributor community.
    Regions: Diverse provinces across China to ensure broad dialectal representation.
    Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
    Recording Details:
    Conversation Nature: Naturally flowing, unscripted conversations.
    Call Duration: Each session ranges between 5 to 15 minutes.
    Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
    Recording Environment: Captured in clear conditions without background noise or echo.
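
    The audio format stated above (WAV, stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file with Python's standard `wave` module. The sketch below is illustrative only: the `describe_wav` and `matches_listing` helpers are hypothetical, and a tiny silent file is built in memory as a stand-in because no real recording is assumed to be available.

```python
import io
import wave


def describe_wav(file_obj):
    """Read channel count, bit depth, and sample rate from a WAV file."""
    with wave.open(file_obj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate": wav.getframerate(),
        }


def matches_listing(info):
    """True if the file is stereo, 16-bit, at 8 kHz or 16 kHz."""
    return (
        info["channels"] == 2
        and info["bit_depth"] == 16
        and info["sample_rate"] in (8000, 16000)
    )


# Stand-in file: 80 frames of stereo 16-bit silence at 8 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00" * 4 * 80)
buffer.seek(0)
info = describe_wav(buffer)
```

    For a downloaded recording, a file path can be passed to `describe_wav` in place of the in-memory buffer.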

    Topic Diversity

    The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).

    Inbound Calls:
    Appointment Scheduling
    New Patient Registration
    Surgical Consultation
    Dietary Advice and Consultations
    Insurance Coverage Inquiries
    Follow-up Treatment Requests, and more
    Outbound Calls:
    Appointment Reminders
    Preventive Care Campaigns
    Test Results & Lab Reports
    Health Risk Assessment Calls
    Vaccination Updates
    Wellness Subscription Outreach, and more

    These real-world interactions help build speech models that understand healthcare domain nuances and user intent.

    Transcription

    Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.

    Transcription Includes:
    Speaker-identified Dialogues
    Time-coded Segments
    Non-speech Annotations (e.g., silence, cough)
    High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
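
    Word error rate is conventionally the number of substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference tokens. The Python sketch below computes it with a standard edit-distance table; how the provider tokenizes Mandarin text is not stated, so the pre-segmented token lists here are an assumption made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Edit distance between token lists divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference token
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis token
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1] / len(reference)


# Hypothetical tokens: the transcript drops the final reference token.
reference_tokens = ["我", "想", "预约", "体检"]
hypothesis_tokens = ["我", "想", "预约"]
wer = word_error_rate(reference_tokens, hypothesis_tokens)
```

    Under this definition, a rate below 5% corresponds to fewer than one error per twenty reference tokens.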

    Metadata

    Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.

    Participant Metadata: ID, gender, age, region, accent, and dialect.
    Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.

    Usage and Applications

    This dataset can be used across a range of healthcare and voice AI use cases:

  13. Table_1_Health Care Utilization and Costs of Patients With Prostate Cancer...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 3, 2023
    Cite
    Lin Bai; Haishaerjiang Wushouer; Cong Huang; Zhenhuan Luo; Xiaodong Guan; Luwen Shi (2023). Table_1_Health Care Utilization and Costs of Patients With Prostate Cancer in China Based on National Health Insurance Database From 2015 to 2017.docx [Dataset]. http://doi.org/10.3389/fphar.2020.00719.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Lin Bai; Haishaerjiang Wushouer; Cong Huang; Zhenhuan Luo; Xiaodong Guan; Luwen Shi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Background: In terms of medical costs, prostate cancer is on the increase as one of the most costly cancers, posing a tremendous economic burden, but evidence on the health care utilization and medical expenditure of prostate cancer has been absent in China. Objective: This study aimed to analyze health care utilization and direct medical costs of patients with prostate cancer in China. Methods: Health care service data with a national representative sample of basic medical insurance beneficiaries between 2015 and 2017 were obtained from the China Health Insurance Association database. We conducted descriptive and statistical analyses of health care utilization, annual direct medical costs, and composition based on cancer-related medical records. Health care utilization was measured by the number of hospital visits and the length of stay. Results: A total of 3,936 patients with prostate cancer and 24,686 cancer-related visits between 2015 and 2017 were identified in the database. The number of annual outpatient and inpatient visits per patient differed significantly from 2015 to 2017. There was no obvious change in length of stay and annual direct medical costs from 2015 to 2017. The number of annual visits per patient (outpatient: 3.0 vs. 4.0, P < 0.01; inpatient: 1.5 vs. 2.0, P < 0.001) and the annual medical direct costs per patient (US$2,300.1 vs. US$3,543.3, P < 0.001) of patients covered by the Urban Rural Resident Basic Medical Insurance (URRBMI) were both lower than those of patients covered by the Urban Employee Basic Medical Insurance (UEBMI), and the median out-of-pocket expense of URRBMI was higher than that of UEBMI (US$926.6 vs. US$594.0, P < 0.001). The annual direct medical costs of patients with prostate cancer in Western regions were significantly lower than those of patients in Eastern and Central regions (East: US$4011.9; Central: US$3458.6; West: US$2115.5) (P < 0.001). Conclusions: There was an imbalanced distribution of health care utilization among regions in China. The direct medical costs of Chinese patients with prostate cancer remained stable, but the gap in health care utilization and medical costs between two different insurance schemes and among regions still needed to be further addressed.

  14. 2000–2020 Monthly Air Quality Index (AQI) Dataset of China

    • figshare.com
    bin
    Updated Nov 11, 2025
    Cite
    Chaohao Ling; 浩 吴 (2025). 2000–2020 Monthly Air Quality Index (AQI) Dataset of China [Dataset]. http://doi.org/10.6084/m9.figshare.29975356.v3
    Explore at:
    Available download formats: bin
    Dataset updated
    Nov 11, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chaohao Ling; 浩 吴
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    This dataset provides monthly gridded Air Quality Index (AQI) data covering the entire territory of China from 2000 to 2020, with a spatial resolution of 1 km. The data were generated to support research on the associations between long-term/seasonal air pollution exposure and cardiovascular disease (CVD) risk in Chinese older adults (aged ≥65 years), as part of a study using the China Health and Retirement Longitudinal Study (CHARLS, 2011–2020) cohort. It captures fine-scale spatial and temporal variations in air quality across China, enabling precise linking of environmental exposure to individual health outcomes. The AQI follows China’s national standard (GB 3095–2018), taken as the maximum index among six criteria pollutants (PM₂.₅, PM₁₀, SO₂, CO, NO₂, O₃). Eighteen predictors were integrated to ensure accuracy, including meteorological variables (e.g., 2-m air temperature, 10-m wind speed from the China Meteorological Forcing Dataset), vegetation metrics (Normalized Difference Vegetation Index [NDVI], Net Primary Productivity [NPP]), anthropogenic factors (downscaled GDP, population density, Human Footprint Index), and soil properties (pH, soil organic carbon from China’s High-Resolution National Soil Information Grid). Four tree-based ensemble algorithms (Random Forest [RF], Gradient Boosting Machine [GBM], CatBoost, XGBoost) were compared, with the RF model selected as optimal (test set: R² = 0.83, Root Mean Square Error [RMSE] = 10.25, Mean Absolute Error [MAE] = 9.03) after validation via 10-fold geographic stratified cross-validation and 100 bootstrap iterations; Recursive Feature Elimination (RFE) further refined 14 core predictors to minimize overfitting. The dataset is provided as NCnet files (252 total, one per month) covering China (80°E–135°E, 15°N–53°N).
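
    The "maximum index among six criteria pollutants" rule in the description can be sketched as below. This is a minimal illustration only, assuming the per-pollutant sub-indices have already been computed; the function name `overall_aqi`, the dictionary keys, and the sample values are hypothetical and are not taken from the dataset. The last lines check the file count implied by one file per month from 2000 to 2020.

```python
# Hypothetical pollutant keys; the dataset's own variable names are not stated.
POLLUTANTS = ("PM2.5", "PM10", "SO2", "CO", "NO2", "O3")


def overall_aqi(sub_indices):
    """Return the overall AQI as the largest of the six pollutant sub-indices."""
    missing = [p for p in POLLUTANTS if p not in sub_indices]
    if missing:
        raise ValueError(f"missing sub-indices: {missing}")
    return max(sub_indices[p] for p in POLLUTANTS)


# Made-up sub-index values for a single grid cell and month.
sample = {"PM2.5": 82, "PM10": 64, "SO2": 12, "CO": 20, "NO2": 45, "O3": 71}
print(overall_aqi(sample))  # 82

# One file per month over 2000-2020 inclusive: 21 years x 12 months.
n_files = (2020 - 2000 + 1) * 12
print(n_files)  # 252
```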

  15. China Reform Health Management and Services Group Co Ltd - Ebitda

    • macro-rankings.com
    csv, excel
    Updated Oct 15, 2025
    Cite
    macro-rankings (2025). China Reform Health Management and Services Group Co Ltd - Ebitda [Dataset]. https://www.macro-rankings.com/markets/stocks/000503-she/income-statement/ebitda
    Explore at:
    Available download formats: excel, csv
    Dataset updated
    Oct 15, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Ebitda Time Series for China Reform Health Management and Services Group Co Ltd. China Reform Health Management and Services Group Co., Ltd. offers medical insurance management services in China. The company is involved in business that covers 177 medical insurance in 25 provinces. It is also involved in the pharmaceutical and medical business. The company was formerly known as SeaRainbow Holding Corp. and changed its name to China Reform Health Management and Services Group Co., Ltd. in May 2018. China Reform Health Management and Services Group Co., Ltd. was founded in 1987 and is based in Beijing, China.

  16. Replication Data for: Does Housing Prices really Reduce Physical health?:...

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Xiong, Feng (2023). Replication Data for: Does Housing Prices really Reduce Physical health?: Empirical Evidence from Chinese General Social Survey [Dataset]. http://doi.org/10.7910/DVN/ZI8FV1
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Xiong, Feng
    Description

    The empirical datasets in this paper were obtained from two databases: the Chinese General Social Survey (CGSS) and the China premium database of CEIC. The CGSS was initiated by the National Survey Research Center of Renmin University of China and has been conducted every one to two years since 2003, with the most recent year being 2015. The empirical study in this paper uses survey data for three years (2012, 2013, and 2015), which capture the period of rapid house price increase in China. The CGSS provides high-quality cross-sectional data that contain rich information on demographics, income (individual and household), and housing and marriage perceptions, as well as on individual health status, such as self-rated physical health and height and weight (used to calculate BMI), which is also of interest in our paper. In addition, it includes subjective social status, mental health status, and health-related behaviors for the mechanistic analysis in this paper.
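
    The BMI calculation mentioned in the description uses the standard formula, weight in kilograms divided by the square of height in metres. The sketch below is illustrative only: the function name `bmi` and the choice of kilograms and centimetres as input units are assumptions, since the description does not state how the CGSS records height and weight.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # assumed input unit: centimetres
    return weight_kg / height_m ** 2


# Made-up respondent: 70 kg, 175 cm.
print(round(bmi(70, 175), 1))  # 22.9
```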

  17. Data from: Travel burden increases the risk of advanced stage at diagnosis...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Oct 29, 2025
    Cite
    Xuwei Tian (2025). Travel burden increases the risk of advanced stage at diagnosis of Breast Cancer in Kashgar, China [Dataset]. http://doi.org/10.7910/DVN/XI5GHT
    Explore at:
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Xuwei Tian
    Area covered
    Kashgar, China
    Description

    This file provides a minimal, anonymized dataset for the replication of the primary statistical analyses in the manuscript titled, “Travel burden increases the risk of advanced stage at diagnosis of Breast Cancer in Kashgar, China.” The data were sourced from a retrospective study cohort at the Breast Cancer Center at the First People's Hospital of Kashgar (FPHK), Xinjiang, China. To protect patient confidentiality, this dataset has been fully anonymized. All direct identifiers have been removed. Each row in this dataset represents a single, anonymized patient.

  18. A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17...

    • scidb.cn
    Updated Jul 18, 2025
    Cite
    Liu Kang; Cao Pengfei; Zhang Chenxiang (2025). A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments, Parallel Data of Depression Consultation and Hamilton Depression Rating Scale (PDCH) [Dataset]. http://doi.org/10.57760/sciencedb.27818
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Liu Kang; Cao Pengfei; Zhang Chenxiang
    Description

    The global surge in depression rates, notably severe in China with over 95 million affected, underscores a dire public health issue. This is exacerbated by a critical shortfall in mental health professionals, highlighting an urgent call for innovative approaches. The advancement of Artificial Intelligence (AI), particularly Large Language Models, offers a promising solution by improving mental health diagnostics. However, there is a lack of real data for reliable training and accurate evaluation of AI models. To this end, this paper presents a high-quality multimodal depression consultation dataset, namely Parallel Data of Depression Consultation and Hamilton Depression Rating Scale (PDCH). The dataset is constructed from clinical consultations at Beijing Anding Hospital and provides audio recordings and transcribed text, as well as corresponding HAMD-17 scales annotated by professionals. The dataset contains 100 consultations, and the audio exceeds 2,937 minutes. Each consultation is about 30 minutes long, with more than 150 dialogue turns. It helps fill the gap in mental health services and supports the creation of more accurate AI models.

  19. Data_Sheet_1_I Know Some People: The Association of Social Capital With...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 1, 2023
    Cite
    Weiwei Zhang; Yuankai Huang; Mengqing Lu; Guohua Lin; Tian Wo; Xiaoyu Xi (2023). Data_Sheet_1_I Know Some People: The Association of Social Capital With Primary Health Care Utilization of Residents in China.docx [Dataset]. http://doi.org/10.3389/fpubh.2021.689765.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Weiwei Zhang; Yuankai Huang; Mengqing Lu; Guohua Lin; Tian Wo; Xiaoyu Xi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Primary health care (PHC) services are underused due to the unbalanced distribution of medical resources. This is especially true in developing countries where the construction of PHC systems has begun to take effect. Social capital is one of the important factors affecting primary health care utilization. Method: This study investigated the utilization of PHC services by Chinese community residents in the past year. Social capital, PHC utilization, age, health care insurance, etc., were measured. A multilevel negative binomial model was adopted to analyze the association of social capital with PHC utilization. Results: Data of 5,471 residents from 283 communities in China were collected through a questionnaire survey in 2018. The results showed that community social capital (CSC) is significantly associated with PHC utilization in China, but individual social capital (ISC) had no significant association with PHC utilization. A one-standard deviation increase in the CSC leads to a 1.9% increase in PHC utilization. Other factors like gender, education, income, health insurance, health status, etc., are significantly associated with PHC utilization in China. Conclusions: Community social capital plays a more important role in promoting PHC utilization, while ISC plays an unclear role in PHC utilization by the residents of China.

  20. Table_1_Mental Health Help-Seeking and Associated Factors Among Public...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    Cite
    Rui She; Xiaohui Wang; Zhoubin Zhang; Jinghua Li; Jingdong Xu; Hua You; Yan Li; Yuan Liang; Shan Li; Lina Ma; Xinran Wang; Xiuyuan Chen; Peien Zhou; Joseph Lau; Yuantao Hao; Huan Zhou; Jing Gu (2023). Table_1_Mental Health Help-Seeking and Associated Factors Among Public Health Workers During the COVID-19 Outbreak in China.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2021.622677.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Rui She; Xiaohui Wang; Zhoubin Zhang; Jinghua Li; Jingdong Xu; Hua You; Yan Li; Yuan Liang; Shan Li; Lina Ma; Xinran Wang; Xiuyuan Chen; Peien Zhou; Joseph Lau; Yuantao Hao; Huan Zhou; Jing Gu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The COVID-19 outbreak in China has created multiple stressors that threaten individuals' mental health, especially among public health workers (PHW) who are devoted to COVID-19 control and prevention work. This study aimed to investigate the prevalence of mental help-seeking and associated factors among PHW using Andersen's Behavioral Model of Health Services Use (BMHSU). Methods: A cross-sectional survey was conducted among 9,475 PHW in five provinces across China between February 18 and March 1, 2020. The subsample data of those who reported probable mental health problems were analyzed for this report (n = 3,417). Logistic and hierarchical regression analyses were conducted to examine the associations of predisposing, enabling, need, and COVID-19 contextual factors with mental health help-seeking. Results: Only 12.7% of PHW reported professional mental help-seeking during the COVID-19 outbreak. PHW who were older, had more days of overnight work, received psychological training, perceived a higher level of support from the society, or had depression and anxiety were more likely to report mental help-seeking (ORm range: 1.02–1.73, all p < 0.05), while those who worked in Centers for Disease Control and Prevention were less likely to seek help (ORm = 0.57, p < 0.01). The belief that mental health issues were not the priority (64.4%), lack of time (56.4%), and shortage of psychologists (32.7%) were the most frequently endorsed reasons for not seeking help. Conclusions: The application of BMHSU confirmed associations between some factors and PHW's mental health help-seeking. Effective interventions are warranted to promote mental health help-seeking of PHW to ameliorate the negative impact of mental illness and facilitate personal recovery and routine work.

