100+ datasets found

Heart Attack Risk Dataset of China
kaggle.com
zip
Updated Mar 4, 2025
Cite
Ankush Panday (2025). Heart Attack Risk Dataset of China [Dataset]. https://www.kaggle.com/datasets/ankushpanday2/heart-attack-risk-dataset-of-china/code
Available download formats: zip (5267720 bytes)
Dataset updated
Mar 4, 2025
Authors
Ankush Panday
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
China
Description
This dataset provides an in-depth look at heart attack risk factors among individuals in China, reflecting variations in healthcare access, lifestyle choices, air pollution exposure, and regional disparities. The data includes key variables such as age, gender, smoking habits, blood pressure, cholesterol levels, and previous heart attack history.

Given the urban-rural healthcare divide and the impact of environmental factors, this dataset is ideal for predictive modeling, risk assessment, and epidemiological studies related to cardiovascular disease.

Key highlights:
✅ Regional Variability: Provinces across China are represented, considering different healthcare infrastructures.
✅ Major Risk Factors: Smoking, air pollution, diet, and stress levels are included.
✅ Healthcare Accessibility: Differentiates urban vs. rural healthcare conditions.
✅ Heart Attack Prediction: Can be used to develop predictive models for heart disease.

Columns to Include:
Patient_ID (Unique Identifier)
Age (Numerical)
Gender (Male/Female)
Smoking_Status (Smoker/Non-Smoker)
Hypertension (Yes/No)
Diabetes (Yes/No)
Obesity (Yes/No)
Cholesterol_Level (High/Normal/Low)
Air_Pollution_Exposure (Low/Medium/High)
Physical_Activity (Low/Medium/High)
Diet_Score (Healthy/Moderate/Poor)
Stress_Level (Low/Medium/High)
Alcohol_Consumption (Yes/No)
Family_History_CVD (Yes/No)
Healthcare_Access (Good/Moderate/Poor)
Rural_or_Urban (Rural/Urban)
Region (Eastern/Western/Northern/Southern/Central)
Province (e.g., Beijing, Shanghai, Gansu, etc.)
Hospital_Availability (High/Medium/Low)
TCM_Use (Yes/No)
Employment_Status (Employed/Unemployed/Retired)
Education_Level (None/Primary/Secondary/Higher)
Income_Level (Low/Middle/High)
Blood_Pressure (Numerical)
Chronic_Kidney_Disease (Yes/No)
Previous_Heart_Attack (Yes/No)
CVD_Risk_Score (0-100)
Heart_Attack (Yes/No - Target Variable)
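
As a rough illustration of how the column list above might be consumed, the sketch below parses a tiny in-memory CSV and converts a few fields to typed values. The two sample rows are invented for illustration and are not taken from the dataset; the column subset and the Yes/No-to-boolean mapping are assumptions based only on the value ranges listed above.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the column names listed above.
# The values are made up for illustration and do not come from the dataset.
SAMPLE_CSV = """Patient_ID,Age,Smoking_Status,Hypertension,CVD_Risk_Score,Heart_Attack
1,54,Smoker,Yes,71,Yes
2,37,Non-Smoker,No,18,No
"""

# Columns described as Yes/No in the listing (subset used in this sketch).
YES_NO_COLUMNS = ("Hypertension", "Heart_Attack")


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    parsed = dict(row)
    parsed["Age"] = int(row["Age"])
    parsed["CVD_Risk_Score"] = float(row["CVD_Risk_Score"])
    for column in YES_NO_COLUMNS:
        # Map the Yes/No labels to booleans.
        parsed[column] = row[column].strip().lower() == "yes"
    return parsed


def load_rows(text):
    """Parse CSV text into a list of typed row dictionaries."""
    return [parse_row(row) for row in csv.DictReader(io.StringIO(text))]


rows = load_rows(SAMPLE_CSV)
positives = sum(row["Heart_Attack"] for row in rows)
print(f"{positives} of {len(rows)} sample rows have Heart_Attack == Yes")
```

For the real file, the same `load_rows` helper could be fed the text of the downloaded CSV, provided its header matches the names above.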
Dataset of health expenditure per capita and individuals using the Internet...
workwithdata.com
Updated Apr 9, 2025
Cite
Work With Data (2025). Dataset of health expenditure per capita and individuals using the Internet of countries per year in China and in 2021 (Historical) [Dataset]. https://www.workwithdata.com/datasets/countries-yearly?col=country%2Cdate%2Chealth_expenditure_capita%2Cinternet_pct&f=2&fcol0=country&fcol1=date&fop0=%3D&fop1=%3D&fval0=China&fval1=2021
Dataset updated
Apr 9, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
This dataset is about countries per year in China. It has 1 row and is filtered where the date is 2021. It features 4 columns: country, date, health expenditure per capita, and individuals using the Internet.
Counts of Dengue without warning signs reported in CHINA: 1979-2009
tycho.pitt.edu
data.niaid.nih.gov
Updated Apr 1, 2018
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Dengue without warning signs reported in CHINA: 1979-2009 [Dataset]. https://www.tycho.pitt.edu/dataset/CN.722862003
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1979 - 2009
Area covered
China
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
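
The separation step described above could be sketched roughly as follows. Only the attribute name "PartOfCumulativeCountSeries" comes from the description; the other keys (`start`, `end`, `count`), the 1/0 encoding of the flag, and all values are assumptions made for illustration.

```python
# Hypothetical records: only "PartOfCumulativeCountSeries" is named in the
# description above. The other keys and all values are invented.
records = [
    {"start": "1990-01-01", "end": "1990-01-07", "count": 3, "PartOfCumulativeCountSeries": 1},
    {"start": "1990-01-01", "end": "1990-01-14", "count": 5, "PartOfCumulativeCountSeries": 1},
    {"start": "1990-01-01", "end": "1990-01-07", "count": 3, "PartOfCumulativeCountSeries": 0},
    {"start": "1990-01-08", "end": "1990-01-14", "count": 2, "PartOfCumulativeCountSeries": 0},
]


def split_by_series_type(rows):
    """Return (cumulative_rows, fixed_interval_rows)."""
    cumulative, fixed = [], []
    for row in rows:
        # Assumption: the flag is encoded as 1 (cumulative) or 0 (fixed interval).
        if int(row["PartOfCumulativeCountSeries"]) == 1:
            cumulative.append(row)
        else:
            fixed.append(row)
    return cumulative, fixed


def cumulative_to_increments(rows):
    """Convert cumulative counts that share one start date into per-interval increments."""
    ordered = sorted(rows, key=lambda r: r["end"])  # ISO dates sort as strings
    increments = []
    previous = 0
    for row in ordered:
        increments.append(row["count"] - previous)
        previous = row["count"]
    return increments


cumulative, fixed = split_by_series_type(records)
increments = cumulative_to_increments(cumulative)
fixed_total = sum(row["count"] for row in fixed)
print(increments, fixed_total)
```

Keeping the two series apart avoids double counting: summing cumulative rows directly would count the first week's cases twice in this toy example.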
COVID-19 China
kaggle.com
zip
Updated Sep 18, 2020
Cite
Charlie Craine (2020). COVID-19 China [Dataset]. https://www.kaggle.com/crained/covid19-china
Available download formats: zip (89966331 bytes)
Dataset updated
Sep 18, 2020
Authors
Charlie Craine
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
China
Description
The utility of this dataset has been confirmed by a senior radiologist in Tongji Hospital, Wuhan, China, who has performed diagnosis and treatment of a large number of COVID-19 patients during the outbreak of this disease between January and April. After releasing this dataset, we received several pieces of feedback expressing concerns about its usability. The major concerns are summarized as follows. First, when the original CT images are put into papers, the quality of these images is degraded, which may render the diagnosis decisions less accurate. The quality degradation includes: the Hounsfield unit (HU) values are lost; the number of bits per pixel is reduced; the resolution of images is reduced. Second, the original CT scan contains a sequence of CT slices, but when put into papers, only a few key slices are selected, which may have a negative impact on diagnosis as well.

We consulted the aforementioned radiologist at Tongji Hospital regarding these two concerns. According to the radiologist, the issues raised in these concerns do not significantly affect the accuracy of diagnosis decision-making. First, experienced radiologists are able to make an accurate diagnosis from low-quality CT images. For example, given a photo taken by smartphone of the original CT image, experienced radiologists can make an accurate diagnosis by just looking at the photo, though the CT image in the photo has much lower quality than the original CT image. Likewise, the quality gap between CT images in papers and original CT images will not largely hurt the accuracy of diagnosis. Second, while it is preferable to read a sequence of CT slices, oftentimes a single slice of CT contains enough clinical information for accurate decision-making.

This came from the team here: https://github.com/UCSD-AI4H/COVID-CT
Counts of Dengue reported in CHINA: 1979-2010
zenodo.org
tycho.pitt.edu
json, xml, zip
Updated Jun 3, 2024
Cite
Willem Van Panhuis; Anne Cross; Donald Burke (2024). Counts of Dengue reported in CHINA: 1979-2010 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/cn.38362002
Available download formats: json, zip, xml
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/cn.38362002
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Anne Cross; Donald Burke
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1979 - Dec 31, 2010
Area covered
China
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
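
The missing-data step above (adding intervals for which no count was reported, while keeping reported zeros) might look like the following minimal sketch. It assumes yearly fixed intervals keyed by year; the counts are invented, and `None` is chosen here as the marker for "no value reported".

```python
def fill_missing_years(counts_by_year, first_year, last_year):
    """Return a dict with one entry per year; unreported years map to None.

    A reported zero stays 0, so it remains distinguishable from a missing year.
    """
    return {year: counts_by_year.get(year) for year in range(first_year, last_year + 1)}


# Hypothetical yearly counts: 1981 and 1982 were never reported, 1980 was reported as zero.
reported = {1979: 12, 1980: 0, 1983: 7}

filled = fill_missing_years(reported, 1979, 1983)
missing_years = [year for year, count in filled.items() if count is None]
print(filled)
print("Years without a reported count:", missing_years)
```

Using `None` rather than 0 for the gaps keeps later analysis from treating an unreported year as a year with no cases.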
COVID-19 Combined Data-set with Improved Measurement Errors
data.mendeley.com
Updated May 13, 2020
Cite
Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
Unique identifier
https://doi.org/10.17632/nw5m4hs3jr.3
Dataset updated
May 13, 2020
Authors
Afshin Ashofteh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. 
This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
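
To make the paired comparison mentioned in the description more concrete, here is a minimal sketch of a root mean square error between two sources reporting the same attribute, plus a simple check for negative values. The series names and numbers are invented; this is not the authors' code, only an illustration of the two ideas under those assumptions.

```python
import math


def rmse(series_a, series_b):
    """Root mean square error between two equally long numeric series."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def negative_indices(values):
    """Positions of negative entries, which are implausible for daily counts."""
    return [index for index, value in enumerate(values) if value < 0]


# Hypothetical daily counts for the same attribute from two official sources.
source_a = [10, 12, 15, 20]
source_b = [10, 14, 15, 16]

error = rmse(source_a, source_b)
flagged = negative_indices([5, -2, 7])
print(f"RMSE between sources: {error:.3f}; negative entries at {flagged}")
```

A large RMSE for one attribute relative to the others would point to where two sources disagree most and deserve a closer look.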
Counts of COVID-19 reported in CHINA: 2019-2021
data.niaid.nih.gov
catalog.midasnetwork.us
csv
Updated Aug 12, 2022
Cite
Harry Hochheiser; Willem Van Panhuis; Bruce Childers; Mark Roberts; Kim Wong; J Espino; William Hogan; M Halloran; Nicholas Reich; Lauren Meyers (2022). Counts of COVID-19 reported in CHINA: 2019-2021 [Dataset]. http://doi.org/10.25337/T7/ptycho.v2.0/CN.840539006
Available download formats: csv
Unique identifier
https://doi.org/10.25337/T7/ptycho.v2.0/CN.840539006
Dataset updated
Aug 12, 2022
Dataset provided by
MIDAS Coordination Center
Authors
Harry Hochheiser; Willem Van Panhuis; Bruce Childers; Mark Roberts; Kim Wong; J Espino; William Hogan; M Halloran; Nicholas Reich; Lauren Meyers
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
CN, China
Variables measured
Case, Dead, Complete recovery, Cumulative incidence, Count of disease cases, Infectious disease incidence
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team, except for aggregation of individual case count data into daily counts when that was the best data available for a disease and location. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. All geographic locations at the country and admin1 level have been represented at the same geographic level as in the data source, provided an ISO code or codes could be identified, unless the data source specifies that the location is listed at an inaccurate geographical level. For more information about decisions made by the curation team, recommended data processing steps, and the data sources used, please see the README that is included in the dataset download ZIP file.
Dataset of premature deaths avoided due to PM2.5 pollution control policies...
scidb.cn
Updated Jan 13, 2023
Cite
Haimeng Liu; Jian Liu (2023). Dataset of premature deaths avoided due to PM2.5 pollution control policies in Chinese cities [Dataset]. http://doi.org/10.57760/sciencedb.07110
Unique identifier
https://doi.org/10.57760/sciencedb.07110
Dataset updated
Jan 13, 2023
Dataset provided by
Science Data Bank
Authors
Haimeng Liu; Jian Liu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
China
Description
Air pollution is one of China's most serious environmental issues, taking a significant toll on residents' physical and mental health. Since the implementation of policies such as the Action Plan on Air Pollution Prevention and Control in 2013, air quality in most Chinese cities has improved significantly. This dataset uses a counterfactual research paradigm: it measures the actual number of premature deaths due to PM2.5 pollution in 2019 and the number of premature deaths due to PM2.5 pollution in 2019 under a scenario with no policy in place, then subtracts the former from the latter to obtain the number of premature deaths avoided due to PM2.5 pollution control policies in Chinese cities in 2019. The dataset includes: (1) The actual number of premature deaths due to PM2.5 pollution in 2019; (2) The number of premature deaths in 2019 under the no-policy scenario; (3) The number of premature deaths reduced in 2019 as a result of environmental policies. The dataset covers 343 cities and is archived in .shp and .xls formats (30.4 MB). It can support research on air pollution control and urban environmental health in China, and can also provide a reference for assessing local governments' environmental performance.
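
The counterfactual subtraction described above reduces to a per-city difference between the no-policy and actual figures. The sketch below illustrates only that arithmetic; the city names, field names, and numbers are hypothetical and are not taken from the dataset.

```python
# Hypothetical per-city figures for 2019 (names and values are invented).
cities = {
    "City A": {"actual": 1200.0, "no_policy": 1500.0},
    "City B": {"actual": 800.0, "no_policy": 950.0},
}


def avoided_deaths(actual, no_policy):
    """Premature deaths avoided = no-policy scenario minus actual deaths."""
    return no_policy - actual


avoided = {
    name: avoided_deaths(values["actual"], values["no_policy"])
    for name, values in cities.items()
}
total_avoided = sum(avoided.values())
print(avoided, total_avoided)
```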
Chinese Food Market Insights
kaggle.com
zip
Updated May 18, 2024
Cite
Daniil Krasnoproshin (2024). Chinese Food Market Insights [Dataset]. https://www.kaggle.com/datasets/daniilkrasnoproshin/chinese-food-market-insights
Available download formats: zip (11359 bytes)
Dataset updated
May 18, 2024
Authors
Daniil Krasnoproshin
Description
Delve into the dynamics of food prices in China with this dataset sourced from the World Food Programme Price Database. Covering essential food items like maize, rice, beans, fish, and sugar across various markets in China, this dataset provides a valuable resource for understanding food price trends over time. Whether you're an economist, policymaker, or researcher, explore how factors such as supply, demand, and market dynamics influence food pricing in one of the world's largest economies. With data updated weekly and spanning back to 1992, this dataset offers rich insights into the evolving landscape of food prices in China.

Headers description:

date: The date of data collection or reporting.

admin1: Refers to the primary administrative division within the country, such as provinces or states.

admin2: Further subdivision within the primary administrative division, such as districts or counties.

market: Specifies the market or location where the food prices were recorded.

latitude: The geographic latitude coordinates of the market location.

longitude: The geographic longitude coordinates of the market location.

category: Describes the broad category or type of food commodity.

commodity: Specifies the specific food item or product within the category.

unit: Indicates the unit of measurement for the price (e.g., kilograms, pounds).

priceflag: Flags indicating any special conditions or notes related to the price.

pricetype: Specifies the type of price recorded (e.g., retail price, wholesale price).

currency: Denotes the currency in which the price is expressed.

price: The recorded price of the commodity in the local currency.

usdprice: The equivalent price of the commodity converted to US dollars for standardized comparison.
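
As an illustration of how the headers above might be used, the following sketch averages `usdprice` per commodity from a small in-memory CSV. The header names follow the list above; the three rows (places, dates, prices) are invented placeholders, and the real file may contain additional rows or formatting not handled here.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows that follow the header names described above.
SAMPLE_CSV = """date,admin1,admin2,market,latitude,longitude,category,commodity,unit,priceflag,pricetype,currency,price,usdprice
2020-01-15,Province A,County A,Market A,30.0,110.0,cereals and tubers,Rice,KG,actual,Retail,CNY,6.0,0.90
2020-02-15,Province A,County A,Market A,30.0,110.0,cereals and tubers,Rice,KG,actual,Retail,CNY,7.0,1.10
2020-01-15,Province A,County A,Market A,30.0,110.0,cereals and tubers,Maize,KG,actual,Retail,CNY,3.0,0.40
"""


def mean_usd_price_by_commodity(text):
    """Average the usdprice column for each commodity in the CSV text."""
    prices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        prices[row["commodity"]].append(float(row["usdprice"]))
    return {commodity: sum(values) / len(values) for commodity, values in prices.items()}


averages = mean_usd_price_by_commodity(SAMPLE_CSV)
for commodity, value in sorted(averages.items()):
    print(f"{commodity}: {value:.2f} USD")
```

Averaging on `usdprice` rather than `price` keeps the comparison in one currency, which is the stated purpose of that column.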

Source: https://data.humdata.org/dataset/wfp-food-prices-for-china
Dataset for a hybrid model approach for estimating health burden from NO2 in...
dtechtive.com
find.data.gov.scot
csv, tif, txt
Updated Sep 30, 2019
Cite
University of Edinburgh (2019). Dataset for a hybrid model approach for estimating health burden from NO2 in megacities in China: a case study in Guangzhou [Dataset]. http://doi.org/10.7488/ds/2624
Available download formats: txt (0.0008 MB), csv (0.0001 MB), txt (0.0166 MB), tif (105 MB), csv (0.0043 MB)
Unique identifier
https://doi.org/10.7488/ds/2624
Dataset updated
Sep 30, 2019
Dataset provided by
University of Edinburgh
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Guangzhou, China
Description
These files contain summary statistics for meta-analysis GWAS of six traits in European, African and Trans-Ancestry. There are 18 files in total - 6 for each ancestry by 3 for each trait. The traits are DNA methylation proxies for granulocyte proportions (gran) and plasminogen activator inhibitor-1 (PAI1), and four epigenetic age acceleration measures of: PhenoAge, GrimAge, HannumAge, and Intrinsic Epigenetic Age Acceleration (IEAA).
Supporting dataset of the aritcle :Underneath Social Media Texts: Sentiment...
scidb.cn
Updated Mar 4, 2024
Cite
Bingyao Jia; Meifang Xie; Jing Wu; Junyi Zhao (2024). Supporting dataset of the aritcle :Underneath Social Media Texts: Sentiment Responses to Public Health Emergency During 2022 COVID-19 Pandemic in China [Dataset]. http://doi.org/10.57760/sciencedb.16527
Unique identifier
https://doi.org/10.57760/sciencedb.16527
Dataset updated
Mar 4, 2024
Dataset provided by
Science Data Bank
Authors
Bingyao Jia; Meifang Xie; Jing Wu; Junyi Zhao
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Area covered
China
Description
This dataset is the supporting data for the paper Underneath Social Media Texts: Sentiment Responses to Public Health Emergency During 2022 COVID-19 Pandemic in China. It is mainly used to analyze Weibo text and perform sentiment analysis. The data were obtained from Weibo, and the texts were crawled using a Python tool: Weibo crawler tool. The data contain time, text content, user address, etc. Subsequently, the file Cleaned weibo data was obtained after a cleaning operation in Excel. Using the improved Chinese sentiment lexicon, the sentiment analysis tool was applied to the text to derive the main sentiment and sentiment scores; the result file is Sentiment analysis results. Finally, ADF and KPSS analysis tools were used to analyze the stability of sentiment scores in different cities. The Weibo text and sentiment analysis results in the dataset are in .xlsx format, and the rest of the tools are Python code. Crawled data is limited by time, specific search terms, and other restrictions; different run times and search terms may lead to differences in the data.
Mandarin Call Center Data for Healthcare AI
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Mandarin Call Center Data for Healthcare AI [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/healthcare-call-center-conversation-mandarin-china
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
This Mandarin Chinese Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Mandarin speech recognition, spoken language understanding, and conversational AI systems. With 30 Hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
Speech Data
The dataset features 30 Hours of dual-channel call center conversations between native Mandarin Chinese speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
•Participant Diversity:
•Speakers: 60 verified native Mandarin Chinese speakers from our contributor community.
•Regions: Diverse provinces across China to ensure broad dialectal representation.
•Participant Profile: Age range of 18–70 with a gender mix of 60% male and 40% female.
•Recording Details:
•Conversation Nature: Naturally flowing, unscripted conversations.
•Call Duration: Each session ranges from 5 to 15 minutes.
•Audio Format: WAV format, stereo, 16-bit depth at 8kHz and 16kHz sample rates.
•Recording Environment: Captured in clear conditions without background noise or echo.
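
The audio format stated above (WAV, stereo, 16-bit, 8 kHz or 16 kHz) can be checked per file with Python's standard `wave` module. The sketch below generates a one-second silent stereo file in memory as a stand-in, since no real recording is available here; the helper names `describe_wav` and `make_silent_wav` are hypothetical.

```python
import io
import wave


def describe_wav(file_obj):
    """Read basic format properties from a WAV file or file-like object."""
    with wave.open(file_obj, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def make_silent_wav(sample_rate, seconds):
    """Build an in-memory stereo 16-bit WAV of silence (stand-in for a real call)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)   # stereo, one channel per speaker
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (2 * 2 * sample_rate * seconds))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(8000, 1))
print(info)
```

For a downloaded recording, `describe_wav` could be given the file path instead of the in-memory buffer, and the result compared against the expected channel count, bit depth, and sample rate.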

Topic Diversity
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
•Inbound Calls:
•Appointment Scheduling
•New Patient Registration
•Surgical Consultation
•Dietary Advice and Consultations
•Insurance Coverage Inquiries
•Follow-up Treatment Requests, and more
•Outbound Calls:
•Appointment Reminders
•Preventive Care Campaigns
•Test Results & Lab Reports
•Health Risk Assessment Calls
•Vaccination Updates
•Wellness Subscription Outreach, and more
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Transcription
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
•Transcription Includes:
•Speaker-identified Dialogues
•Time-coded Segments
•Non-speech Annotations (e.g., silence, cough)
•High transcription accuracy, with a word error rate below 5%, backed by dual-layer QA checks.
Metadata
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
•Participant Metadata: ID, gender, age, region, accent, and dialect.
•Conversation Metadata: Topic, sentiment, call type, sample rate, and technical specs.
Usage and Applications
This dataset can be used across a range of healthcare and voice AI use cases:
Table_1_Health Care Utilization and Costs of Patients With Prostate Cancer...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 3, 2023
Cite
Lin Bai; Haishaerjiang Wushouer; Cong Huang; Zhenhuan Luo; Xiaodong Guan; Luwen Shi (2023). Table_1_Health Care Utilization and Costs of Patients With Prostate Cancer in China Based on National Health Insurance Database From 2015 to 2017.docx [Dataset]. http://doi.org/10.3389/fphar.2020.00719.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fphar.2020.00719.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers Media http://www.frontiersin.org/
Authors
Lin Bai; Haishaerjiang Wushouer; Cong Huang; Zhenhuan Luo; Xiaodong Guan; Luwen Shi
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
Background: In terms of medical costs, prostate cancer is on the increase as one of the most costly cancers, posing a tremendous economic burden, but evidence on the health care utilization and medical expenditure of prostate cancer has been absent in China.
Objective: This study aimed to analyze health care utilization and direct medical costs of patients with prostate cancer in China.
Methods: Health care service data with a national representative sample of basic medical insurance beneficiaries between 2015 and 2017 were obtained from the China Health Insurance Association database. We conducted descriptive and statistical analyses of health care utilization, annual direct medical costs, and composition based on cancer-related medical records. Health care utilization was measured by the number of hospital visits and the length of stay.
Results: A total of 3,936 patients with prostate cancer and 24,686 cancer-related visits between 2015 and 2017 were identified in the database. The number of annual outpatient and inpatient visits per patient differed significantly from 2015 to 2017. There was no obvious change in length of stay and annual direct medical costs from 2015 to 2017. The number of annual visits per patient (outpatient: 3.0 vs. 4.0, P < 0.01; inpatient: 1.5 vs. 2.0, P < 0.001) and the annual medical direct costs per patient (US$2,300.1 vs. US$3,543.3, P < 0.001) of patients covered by the Urban Rural Resident Basic Medical Insurance (URRBMI) were both lower than those of patients covered by the Urban Employee Basic Medical Insurance (UEBMI), and the median out-of-pocket expense of URRBMI was higher than that of UEBMI (US$926.6 vs. US$594.0, P < 0.001). The annual direct medical costs of patients with prostate cancer in Western regions were significantly lower than those of patients in Eastern and Central regions (East: US$4011.9; Central: US$3458.6; West: US$2115.5) (P < 0.001).
Conclusions: There was an imbalanced distribution of health care utilization among regions in China. The direct medical costs of Chinese patients with prostate cancer remained stable, but the gap in health care utilization and medical costs between two different insurance schemes and among regions still needed to be further addressed.
2000–2020 Monthly Air Quality Index (AQI) Dataset of China
figshare.com
bin
Updated Nov 11, 2025
Cite
Chaohao Ling; 浩吴 (2025). 2000–2020 Monthly Air Quality Index (AQI) Dataset of China [Dataset]. http://doi.org/10.6084/m9.figshare.29975356.v3
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.29975356.v3
Dataset updated
Nov 11, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Chaohao Ling; 浩吴
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
This dataset provides monthly gridded Air Quality Index (AQI) data covering the entire territory of China from 2000 to 2020, with a spatial resolution of 1 km. The data were generated to support research on the associations between long-term/seasonal air pollution exposure and cardiovascular disease (CVD) risk in Chinese older adults (aged ≥65 years), as part of a study using the China Health and Retirement Longitudinal Study (CHARLS, 2011–2020) cohort. It captures fine-scale spatial and temporal variations in air quality across China, enabling precise linking of environmental exposure to individual health outcomes. The AQI follows China’s national standard (GB 3095–2018) and is taken as the maximum index among six criteria pollutants (PM₂.₅, PM₁₀, SO₂, CO, NO₂, O₃). Eighteen predictors were integrated to ensure accuracy, including meteorological variables (e.g., 2-m air temperature, 10-m wind speed from the China Meteorological Forcing Dataset), vegetation metrics (Normalized Difference Vegetation Index [NDVI], Net Primary Productivity [NPP]), anthropogenic factors (downscaled GDP, population density, Human Footprint Index), and soil properties (pH, soil organic carbon from China’s High-Resolution National Soil Information Grid). Four tree-based ensemble algorithms (Random Forest [RF], Gradient Boosting Machine [GBM], CatBoost, XGBoost) were compared, with the RF model selected as optimal (test set: R² = 0.83, Root Mean Square Error [RMSE] = 10.25, Mean Absolute Error [MAE] = 9.03) after validation via 10-fold geographic stratified cross-validation and 100 bootstrap iterations; Recursive Feature Elimination (RFE) further reduced the inputs to 14 core predictors to minimize overfitting. The dataset is provided as NCnet files (252 total, one per month) covering China (80°E–135°E, 15°N–53°N).
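As a rough illustration of the "maximum index among six criteria pollutants" rule and of the file count quoted above, the sketch below assumes that per-pollutant sub-index values have already been computed. The function names and the example values are hypothetical and are not taken from the dataset or its documentation.

```python
from typing import Mapping

# The six criteria pollutants named in the description.
POLLUTANTS = ("PM2.5", "PM10", "SO2", "CO", "NO2", "O3")


def overall_aqi(sub_indices: Mapping[str, float]) -> float:
    """Return the overall AQI as the largest per-pollutant sub-index."""
    missing = [p for p in POLLUTANTS if p not in sub_indices]
    if missing:
        raise ValueError(f"missing sub-indices: {missing}")
    return max(sub_indices[p] for p in POLLUTANTS)


def monthly_file_count(first_year: int, last_year: int) -> int:
    """Number of monthly files for an inclusive range of years."""
    return (last_year - first_year + 1) * 12


# Hypothetical sub-index values for one grid cell and one month.
example = {"PM2.5": 120.0, "PM10": 95.0, "SO2": 20.0,
           "CO": 15.0, "NO2": 60.0, "O3": 80.0}

print(overall_aqi(example))            # 120.0
print(monthly_file_count(2000, 2020))  # 252
```

The second function only checks the arithmetic behind "252 total, one per month": 21 years times 12 months.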
China Reform Health Management and Services Group Co Ltd - Ebitda
macro-rankings.com
csv, excel
Updated Oct 15, 2025
Cite
macro-rankings (2025). China Reform Health Management and Services Group Co Ltd - Ebitda [Dataset]. https://www.macro-rankings.com/markets/stocks/000503-she/income-statement/ebitda
Explore at:
Available download formats: excel, csv
Dataset updated
Oct 15, 2025
Dataset authored and provided by
macro-rankings
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
EBITDA time series for China Reform Health Management and Services Group Co Ltd. China Reform Health Management and Services Group Co., Ltd. offers medical insurance management services in China. The company is involved in business that covers 177 medical insurance in 25 provinces. It is also involved in the pharmaceutical and medical business. The company was formerly known as SeaRainbow Holding Corp. and changed its name to China Reform Health Management and Services Group Co., Ltd. in May 2018. It was founded in 1987 and is based in Beijing, China.
Replication Data for: Does Housing Prices really Reduce Physical health?:...
dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Xiong, Feng (2023). Replication Data for: Does Housing Prices really Reduce Physical health?: Empirical Evidence from Chinese General Social Survey [Dataset]. http://doi.org/10.7910/DVN/ZI8FV1
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ZI8FV1
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Xiong, Feng
Description
The empirical datasets in this paper were obtained from two databases: the Chinese General Social Survey (CGSS) and the China Premium Database of CEIC. The CGSS was initiated by the National Survey Research Center of Renmin University of China and has been conducted every one to two years since 2003, with the most recent year being 2015. The empirical study in this paper uses survey data for three years (2012, 2013, and 2015), which capture the period of rapid house price increases in China. The CGSS datasets are high-quality cross-sectional data that contain rich information on demographics, income (individual and household), and housing and marriage perceptions, as well as on individual health status, such as self-rated physical health, height, and weight (used to calculate BMI), which are also of interest in our paper. In addition, they include subjective social status, mental health status, and health-related behaviors for the mechanism analysis in this paper.
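The description mentions deriving BMI from height and weight. A minimal sketch of that calculation follows, assuming height is recorded in centimetres and weight in kilograms. The units actually used in the CGSS files are not stated here, so check them and convert before applying this; the function name is hypothetical.

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    height_m = height_cm / 100.0  # assumed unit: centimetres
    return weight_kg / height_m ** 2


# Hypothetical respondent: 170 cm, 65 kg.
print(round(bmi(170, 65), 2))
```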
Data from: Travel burden increases the risk of advanced stage at diagnosis...
search.dataone.org
dataverse.harvard.edu
Updated Oct 29, 2025
Cite
Xuwei Tian (2025). Travel burden increases the risk of advanced stage at diagnosis of Breast Cancer in Kashgar, China [Dataset]. http://doi.org/10.7910/DVN/XI5GHT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/XI5GHT
Dataset updated
Oct 29, 2025
Dataset provided by
Harvard Dataverse
Authors
Xuwei Tian
Area covered
Kashgar, China
Description
This file provides a minimal, anonymized dataset for the replication of the primary statistical analyses in the manuscript titled, “Travel burden increases the risk of advanced stage at diagnosis of Breast Cancer in Kashgar, China.” The data were sourced from a retrospective study cohort at the Breast Cancer Center at the First People's Hospital of Kashgar (FPHK), Xinjiang, China. To protect patient confidentiality, this dataset has been fully anonymized. All direct identifiers have been removed. Each row in this dataset represents a single, anonymized patient.
A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17...
scidb.cn
Updated Jul 18, 2025
Cite
Liu Kang; Cao Pengfei; Zhang Chenxiang (2025). A Multimodal Depression Consultation Dataset of Speech and Text with HAMD-17 Assessments, Parallel Data of Depression Consultation and Hamilton Depression Rating Scale (PDCH) [Dataset]. http://doi.org/10.57760/sciencedb.27818
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.27818
Dataset updated
Jul 18, 2025
Dataset provided by
Science Data Bank
Authors
Liu Kang; Cao Pengfei; Zhang Chenxiang
Description
The global surge in depression rates, notably severe in China with over 95 million affected, underscores a dire public health issue. This is exacerbated by a critical shortfall in mental health professionals, highlighting an urgent call for innovative approaches. The advancement of Artificial Intelligence (AI), particularly Large Language Models, offers a promising solution by improving mental health diagnostics. However, there is a lack of real data for reliable training and accurate evaluation of AI models. To this end, this paper presents a high-quality multimodal depression consultation dataset, namely Parallel Data of Depression Consultation and Hamilton Depression Rating Scale (PDCH). The dataset is constructed from clinical consultations at Beijing Anding Hospital and provides audio recordings and transcribed text, as well as the corresponding HAMD-17 scales annotated by professionals. The dataset contains 100 consultations, and the audio exceeds 2,937 minutes. Each consultation is about 30 minutes long with more than 150 dialogue turns. The dataset is intended to help fill the gap in mental health services and to support the creation of more accurate AI models.
Data_Sheet_1_I Know Some People: The Association of Social Capital With...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated Jun 1, 2023
Cite
Weiwei Zhang; Yuankai Huang; Mengqing Lu; Guohua Lin; Tian Wo; Xiaoyu Xi (2023). Data_Sheet_1_I Know Some People: The Association of Social Capital With Primary Health Care Utilization of Residents in China.docx [Dataset]. http://doi.org/10.3389/fpubh.2021.689765.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2021.689765.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Weiwei Zhang; Yuankai Huang; Mengqing Lu; Guohua Lin; Tian Wo; Xiaoyu Xi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Primary health care (PHC) services are underused due to the unbalanced distribution of medical resources. This is especially true in developing countries where the construction of PHC systems has begun to take effect. Social capital is one of the important factors affecting primary health care utilization.
Method: This study investigated the utilization of PHC services by Chinese community residents in the past year. Social capital, PHC utilization, age, health care insurance, etc., were measured. A multilevel negative binomial model was adopted to analyze the association of social capital with PHC utilization.
Results: Data of 5,471 residents from 283 communities in China were collected through a questionnaire survey in 2018. The results showed that community social capital (CSC) is significantly associated with PHC utilization in China, but individual social capital (ISC) had no significant association with PHC utilization. A one-standard deviation increase in the CSC leads to a 1.9% increase in PHC utilization. Other factors like gender, education, income, health insurance, health status, etc., are significantly associated with PHC utilization in China.
Conclusions: Community social capital plays a more important role in promoting PHC utilization, while ISC plays an unclear role in PHC utilization by the residents of China.
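The 1.9% figure can be read as a rate ratio from the count model. The sketch below assumes the negative binomial model uses the usual log link, so that a coefficient beta on standardized CSC corresponds to a multiplicative change of exp(beta) in expected PHC visits. The abstract does not report the coefficient itself, so it is back-calculated here purely for illustration; the function names are hypothetical.

```python
import math


def percent_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in the expected count for a `delta`-unit increase,
    under a log-link count model with coefficient `beta`."""
    return (math.exp(beta * delta) - 1.0) * 100.0


def beta_from_percent(pct: float) -> float:
    """Log-link coefficient implied by a percent change per one-unit increase."""
    return math.log(1.0 + pct / 100.0)


# Coefficient implied by the reported 1.9% increase per one standard deviation.
beta_csc = beta_from_percent(1.9)

print(round(beta_csc, 4))
print(round(percent_change(beta_csc, delta=2.0), 2))  # effect of a two-SD increase
```

Under this assumption, effects compound multiplicatively: a two-standard-deviation increase corresponds to slightly more than twice 1.9%.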
Table_1_Mental Health Help-Seeking and Associated Factors Among Public...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
docx
Updated May 30, 2023
Cite
Rui She; Xiaohui Wang; Zhoubin Zhang; Jinghua Li; Jingdong Xu; Hua You; Yan Li; Yuan Liang; Shan Li; Lina Ma; Xinran Wang; Xiuyuan Chen; Peien Zhou; Joseph Lau; Yuantao Hao; Huan Zhou; Jing Gu (2023). Table_1_Mental Health Help-Seeking and Associated Factors Among Public Health Workers During the COVID-19 Outbreak in China.DOCX [Dataset]. http://doi.org/10.3389/fpubh.2021.622677.s001
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fpubh.2021.622677.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers Media (http://www.frontiersin.org/)
Authors
Rui She; Xiaohui Wang; Zhoubin Zhang; Jinghua Li; Jingdong Xu; Hua You; Yan Li; Yuan Liang; Shan Li; Lina Ma; Xinran Wang; Xiuyuan Chen; Peien Zhou; Joseph Lau; Yuantao Hao; Huan Zhou; Jing Gu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: The COVID-19 outbreak in China has created multiple stressors that threaten individuals' mental health, especially among public health workers (PHW) who are devoted to COVID-19 control and prevention work. This study aimed to investigate the prevalence of mental help-seeking and associated factors among PHW using Andersen's Behavioral Model of Health Services Use (BMHSU).
Methods: A cross-sectional survey was conducted among 9,475 PHW in five provinces across China between February 18 and March 1, 2020. The subsample data of those who reported probable mental health problems were analyzed for this report (n = 3,417). Logistic and hierarchical regression analyses were conducted to examine the associations of predisposing, enabling, need, and COVID-19 contextual factors with mental health help-seeking.
Results: Only 12.7% of PHW reported professional mental help-seeking during the COVID-19 outbreak. PHW who were older, had more days of overnight work, received psychological training, perceived a higher level of support from society, or had depression and anxiety were more likely to report mental help-seeking (ORm range: 1.02–1.73, all p < 0.05), while those who worked in Centers for Disease Control and Prevention were less likely to seek help (ORm = 0.57, p < 0.01). The belief that mental health issues were not the priority (64.4%), lack of time (56.4%), and shortage of psychologists (32.7%) were the most frequently endorsed reasons for not seeking help.
Conclusions: The application of BMHSU confirmed associations between some factors and PHW's mental health help-seeking. Effective interventions are warranted to promote mental health help-seeking of PHW to ameliorate the negative impact of mental illness and facilitate personal recovery and routine work.

Heart Attack Risk Dataset of China

Comprehensive Cardiovascular Health Insights Across China

