MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides an in-depth look at heart attack risk factors among individuals in China, reflecting variations in healthcare access, lifestyle choices, air pollution exposure, and regional disparities. The data includes key variables such as age, gender, smoking habits, blood pressure, cholesterol levels, and previous heart attack history.
Given the urban-rural healthcare divide and the impact of environmental factors, this dataset is ideal for predictive modeling, risk assessment, and epidemiological studies related to cardiovascular disease.
Key highlights:
✅ Regional Variability: Provinces across China are represented, considering different healthcare infrastructures.
✅ Major Risk Factors: Smoking, air pollution, diet, and stress levels are included.
✅ Healthcare Accessibility: Differentiates urban vs. rural healthcare conditions.
✅ Heart Attack Prediction: Can be used to develop predictive models for heart disease.
Columns to Include:
Patient_ID (Unique Identifier)
Age (Numerical)
Gender (Male/Female)
Smoking_Status (Smoker/Non-Smoker)
Hypertension (Yes/No)
Diabetes (Yes/No)
Obesity (Yes/No)
Cholesterol_Level (High/Normal/Low)
Air_Pollution_Exposure (Low/Medium/High)
Physical_Activity (Low/Medium/High)
Diet_Score (Healthy/Moderate/Poor)
Stress_Level (Low/Medium/High)
Alcohol_Consumption (Yes/No)
Family_History_CVD (Yes/No)
Healthcare_Access (Good/Moderate/Poor)
Rural_or_Urban (Rural/Urban)
Region (Eastern/Western/Northern/Southern/Central)
Province (e.g., Beijing, Shanghai, Gansu, etc.)
Hospital_Availability (High/Medium/Low)
TCM_Use (Yes/No)
Employment_Status (Employed/Unemployed/Retired)
Education_Level (None/Primary/Secondary/Higher)
Income_Level (Low/Middle/High)
Blood_Pressure (Numerical)
Chronic_Kidney_Disease (Yes/No)
Previous_Heart_Attack (Yes/No)
CVD_Risk_Score (0-100)
Heart_Attack (Yes/No - Target Variable)
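As a rough illustration of how the column list above might be used before modeling, the following Python sketch validates a row against a subset of the listed categories and encodes it into numeric features. It assumes the value spellings in the file match the description exactly (for example "Yes"/"No"); the helper names `validate_row` and `encode_row` and the example row are hypothetical and not part of the dataset.

```python
# Subset of the schema from the column list; spellings are assumed to match
# the dataset description and may differ in the actual file.
ALLOWED = {
    "Gender": {"Male", "Female"},
    "Smoking_Status": {"Smoker", "Non-Smoker"},
    "Hypertension": {"Yes", "No"},
    "Air_Pollution_Exposure": {"Low", "Medium", "High"},
    "Rural_or_Urban": {"Rural", "Urban"},
    "Heart_Attack": {"Yes", "No"},
}
# Ordinal encoding for Low/Medium/High columns (an illustrative choice).
ORDINAL = {"Low": 0, "Medium": 1, "High": 2}


def validate_row(row):
    """Return (column, value) pairs whose value is not in the allowed set."""
    return [
        (column, row.get(column))
        for column, allowed in ALLOWED.items()
        if row.get(column) not in allowed
    ]


def encode_row(row):
    """Encode a validated row into numeric features and a binary target."""
    features = {
        "Age": float(row["Age"]),
        "Blood_Pressure": float(row["Blood_Pressure"]),
        "Hypertension": 1 if row["Hypertension"] == "Yes" else 0,
        "Smoker": 1 if row["Smoking_Status"] == "Smoker" else 0,
        "Air_Pollution_Exposure": ORDINAL[row["Air_Pollution_Exposure"]],
        "Urban": 1 if row["Rural_or_Urban"] == "Urban" else 0,
    }
    target = 1 if row["Heart_Attack"] == "Yes" else 0
    return features, target


# Made-up example row, for illustration only.
example_row = {
    "Age": "58", "Gender": "Male", "Smoking_Status": "Smoker",
    "Hypertension": "Yes", "Air_Pollution_Exposure": "High",
    "Rural_or_Urban": "Rural", "Blood_Pressure": "142", "Heart_Attack": "No",
}
problems = validate_row(example_row)
features, target = encode_row(example_row)
```

Validating before encoding keeps unexpected category spellings from silently turning into wrong feature values.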
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset covers country-level yearly indicators for China. It has 1 row, filtered to the year 2021. It features 4 columns, including country, health expenditure per capita, and individuals using the Internet.
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
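The two processing steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: only the attribute name "PartOfCumulativeCountSeries" comes from the description; the field names `PeriodStartDate`, `PeriodEndDate`, and `CountValue`, the 0/1 flag encoding, and the helper names are assumptions for illustration. It also assumes all rows belong to a single series whose fixed intervals share one length and are aligned on the first start date.

```python
from datetime import date, timedelta


def make_row(start, end, count, cumulative):
    """Build one illustrative case-count record (field names are assumed)."""
    return {
        "PeriodStartDate": start,
        "PeriodEndDate": end,
        "CountValue": count,
        "PartOfCumulativeCountSeries": cumulative,
    }


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows."""
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"] == 1]
    fixed = [r for r in rows if r["PartOfCumulativeCountSeries"] == 0]
    return cumulative, fixed


def fill_missing_intervals(fixed_rows, interval_days=7):
    """Add rows with CountValue=None for intervals that were not reported.

    Reported zeros are kept as zeros; only absent intervals become None.
    """
    if not fixed_rows:
        return []
    by_start = {r["PeriodStartDate"]: r for r in fixed_rows}
    step = timedelta(days=interval_days)
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        row = by_start.get(current)
        if row is None:
            row = make_row(current, current + step - timedelta(days=1), None, 0)
        filled.append(row)
        current += step
    return filled


# Made-up records: three weekly counts (one week unreported) and two
# cumulative counts that both start on January 1st.
rows = [
    make_row(date(2020, 1, 1), date(2020, 1, 7), 3, 0),
    make_row(date(2020, 1, 8), date(2020, 1, 14), 0, 0),
    make_row(date(2020, 1, 22), date(2020, 1, 28), 5, 0),
    make_row(date(2020, 1, 1), date(2020, 1, 7), 3, 1),
    make_row(date(2020, 1, 1), date(2020, 1, 14), 3, 1),
]
cumulative, fixed = split_series(rows)
filled = fill_missing_intervals(fixed)
```

Using `None` for unreported intervals keeps them distinguishable from intervals with a reported count of zero.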
https://creativecommons.org/publicdomain/zero/1.0/
The utility of this dataset has been confirmed by a senior radiologist in Tongji Hospital, Wuhan, China, who has performed diagnosis and treatment of a large number of COVID-19 patients during the outbreak of this disease between January and April. After releasing this dataset, we received feedback expressing several concerns about its usability. The major concerns are summarized as follows. First, when the original CT images are put into papers, the quality of these images is degraded, which may render the diagnosis decisions less accurate. The quality degradation includes: the Hounsfield unit (HU) values are lost; the number of bits per pixel is reduced; the resolution of images is reduced. Second, the original CT scan contains a sequence of CT slices, but when put into papers, only a few key slices are selected, which may have a negative impact on diagnosis as well.
We consulted the aforementioned radiologist at Tongji Hospital regarding these two concerns. According to the radiologist, the issues raised in these concerns do not significantly affect the accuracy of diagnosis decision-making. First, experienced radiologists are able to make an accurate diagnosis from low-quality CT images. For example, given a smartphone photo of the original CT image, experienced radiologists can make an accurate diagnosis by just looking at the photo, though the CT image in the photo has much lower quality than the original CT image. Likewise, the quality gap between CT images in papers and original CT images will not largely hurt the accuracy of diagnosis. Second, while it is preferable to read a sequence of CT slices, a single CT slice often contains enough clinical information for accurate decision-making.
This came from the team here: https://github.com/UCSD-AI4H/COVID-CT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that need to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources, with reduced systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data were collected using text mining techniques and by reviewing PDF reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections to the referenced datasets and official reports, such as adjustments to the reporting dates, which suffered from a one- to two-day lag, removal of negative values, detection of unreasonable changes to historical data in new reports, and corrections of systematic measurement errors, which have been increasing as the pandemic spreads and more countries contribute data to the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website.
This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
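To make the paired comparison and reporting-lag checks described above more concrete, here is a small Python sketch. It is not the authors' procedure: it only shows how a root mean square error between two sources' daily series can be computed and how a reporting lag of zero to two days could be spotted by comparing shifted series. The function names and the sample numbers are hypothetical.

```python
import math


def rmse(a, b):
    """Root mean square error between two equally long, non-empty series."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def best_lag(a, b, max_lag=2):
    """Find the lag (in days) of series a behind series b that minimizes RMSE.

    For lag k, a[k:] is compared with b[:len(b) - k], i.e. a is assumed to
    report each value k days later than b.
    """
    scores = {
        lag: rmse(a[lag:], b[: len(b) - lag]) for lag in range(max_lag + 1)
    }
    return min(scores, key=scores.get), scores


# Made-up daily counts: source_a repeats source_b's values one day later.
source_b = [5, 8, 12, 20, 31]
source_a = [0, 5, 8, 12, 20]
lag, scores = best_lag(source_a, source_b)
```

A large RMSE at lag 0 that drops sharply at lag 1 or 2 suggests a date-alignment problem rather than a disagreement in the counts themselves.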
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team, except for aggregation of individual case count data into daily counts when that was the best data available for a disease and location. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. All geographic locations at the country and admin1 level have been represented at the same geographic level as in the data source, provided an ISO code or codes could be identified, unless the data source specifies that the location is listed at an inaccurate geographical level. For more information about decisions made by the curation team, recommended data processing steps, and the data sources used, please see the README that is included in the dataset download ZIP file.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Air pollution is one of China's most serious environmental issues, taking a significant toll on residents' physical and mental health. Since the implementation of policies such as the Action Plan on Air Pollution Prevention and Control in 2013, air quality in most Chinese cities has improved significantly. This dataset uses a counterfactual research paradigm to estimate the actual number of premature deaths due to PM2.5 pollution in 2019 and the number of premature deaths due to PM2.5 pollution in 2019 under a scenario with no policy in place. The former is then subtracted from the latter to obtain the number of premature deaths avoided due to PM2.5 pollution control policies in Chinese cities in 2019. The dataset includes: (1) the actual number of premature deaths due to PM2.5 pollution in 2019; (2) the number of premature deaths in 2019 under the no-policy scenario; (3) the number of premature deaths avoided in 2019 as a result of environmental policies. The dataset covers 343 cities and is archived in .shp and .xls formats (30.4 MB). It can support research on air pollution control and urban environmental health in China, and can also serve as a reference for assessing local governments' environmental performance.
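The subtraction described above (no-policy deaths minus actual deaths) can be written as a short Python sketch. The city names and numbers below are placeholders for illustration only, not values from the dataset, and `avoided_deaths` is a hypothetical helper name.

```python
def avoided_deaths(actual, no_policy):
    """Per-city premature deaths avoided: no-policy scenario minus actual."""
    missing = set(actual) ^ set(no_policy)
    if missing:
        raise ValueError(f"cities missing from one input: {sorted(missing)}")
    return {city: no_policy[city] - actual[city] for city in actual}


# Placeholder values, not taken from the dataset.
actual_2019 = {"City A": 1200, "City B": 800}
no_policy_2019 = {"City A": 1500, "City B": 950}
avoided = avoided_deaths(actual_2019, no_policy_2019)
```

A positive value means fewer premature deaths occurred than the no-policy scenario would predict.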
Delve into the dynamics of food prices in China with this dataset sourced from the World Food Programme Price Database. Covering essential food items like maize, rice, beans, fish, and sugar across various markets in China, this dataset provides a valuable resource for understanding food price trends over time. Whether you're an economist, policymaker, or researcher, explore how factors such as supply, demand, and market dynamics influence food pricing in one of the world's largest economies. With data updated weekly and spanning back to 1992, this dataset offers rich insights into the evolving landscape of food prices in China.
Headers description:
Source: https://data.humdata.org/dataset/wfp-food-prices-for-china
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain summary statistics for meta-analysis GWAS of six traits in European, African, and trans-ancestry groups. There are 18 files in total: one per trait for each of the three ancestry groups (6 files per ancestry, 3 files per trait). The traits are DNA methylation proxies for granulocyte proportions (gran) and plasminogen activator inhibitor-1 (PAI1), and four epigenetic age acceleration measures: PhenoAge, GrimAge, HannumAge, and Intrinsic Epigenetic Age Acceleration (IEAA).
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset is the supporting data for the paper Underneath Social Media Texts: Sentiment Responses to Public Health Emergency During 2022 COVID-19 Pandemic in China. It is mainly used to analyze Weibo text and perform sentiment analysis. The data were obtained from Weibo, and the texts were crawled using a Python tool: Weibo crawler tool. The data contain time, text content, user address, etc. The file Cleaned weibo data was then produced by cleaning the crawled data in Excel. A sentiment analysis tool based on the improved Chinese sentiment lexicon was used to derive the main sentiment and sentiment scores of each text; the result file is Sentiment analysis results. Finally, ADF and KPSS analysis tools were used to analyze the stationarity of sentiment scores in different cities. The Weibo text and sentiment analysis results in the dataset are in .xlsx format, and the tools are Python code. The crawled data are limited by time, specific search terms, and other restrictions; different crawl times and search terms may lead to differences in the data.
https://www.futurebeeai.com/policies/ai-data-license-agreement
This Mandarin Chinese Call Center Speech Dataset for the Healthcare industry is purpose-built to accelerate the development of Mandarin speech recognition, spoken language understanding, and conversational AI systems. With 30 hours of unscripted, real-world conversations, it delivers the linguistic and contextual depth needed to build high-performance ASR models for medical and wellness-related customer service.
Created by FutureBeeAI, this dataset empowers voice AI teams, NLP researchers, and data scientists to develop domain-specific models for hospitals, clinics, insurance providers, and telemedicine platforms.
The dataset features 30 hours of dual-channel call center conversations between native Mandarin Chinese speakers. These recordings cover a variety of healthcare support topics, enabling the development of speech technologies that are contextually aware and linguistically rich.
The dataset spans inbound and outbound calls, capturing a broad range of healthcare-specific interactions and sentiment types (positive, neutral, negative).
These real-world interactions help build speech models that understand healthcare domain nuances and user intent.
Every audio file is accompanied by high-quality, manually created transcriptions in JSON format.
Each conversation and speaker includes detailed metadata to support fine-tuned training and analysis.
This dataset can be used across a range of healthcare and voice AI use cases:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In terms of medical costs, prostate cancer is on the increase as one of the most costly cancers, posing a tremendous economic burden, but evidence on the health care utilization and medical expenditure of prostate cancer has been absent in China.
Objective: This study aimed to analyze health care utilization and direct medical costs of patients with prostate cancer in China.
Methods: Health care service data with a national representative sample of basic medical insurance beneficiaries between 2015 and 2017 were obtained from the China Health Insurance Association database. We conducted descriptive and statistical analyses of health care utilization, annual direct medical costs, and composition based on cancer-related medical records. Health care utilization was measured by the number of hospital visits and the length of stay.
Results: A total of 3,936 patients with prostate cancer and 24,686 cancer-related visits between 2015 and 2017 were identified in the database. The number of annual outpatient and inpatient visits per patient differed significantly from 2015 to 2017. There was no obvious change in length of stay and annual direct medical costs from 2015 to 2017. The number of annual visits per patient (outpatient: 3.0 vs. 4.0, P < 0.01; inpatient: 1.5 vs. 2.0, P < 0.001) and the annual medical direct costs per patient (US$2,300.1 vs. US$3,543.3, P < 0.001) of patients covered by the Urban Rural Resident Basic Medical Insurance (URRBMI) were both lower than those of patients covered by the Urban Employee Basic Medical Insurance (UEBMI), and the median out-of-pocket expense of URRBMI was higher than that of UEBMI (US$926.6 vs. US$594.0, P < 0.001). The annual direct medical costs of patients with prostate cancer in Western regions were significantly lower than those of patients in Eastern and Central regions (East: US$4011.9; Central: US$3458.6; West: US$2115.5) (P < 0.001).
Conclusions: There was an imbalanced distribution of health care utilization among regions in China. The direct medical costs of Chinese patients with prostate cancer remained stable, but the gap in health care utilization and medical costs between two different insurance schemes and among regions still needed to be further addressed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides monthly gridded Air Quality Index (AQI) data covering the entire territory of China from 2000 to 2020, with a spatial resolution of 1 km. The data were generated to support research on the associations between long-term/seasonal air pollution exposure and cardiovascular disease (CVD) risk in Chinese older adults (aged ≥65 years), as part of a study using the China Health and Retirement Longitudinal Study (CHARLS, 2011–2020) cohort. It captures fine-scale spatial and temporal variations in air quality across China, enabling precise linking of environmental exposure to individual health outcomes. The AQI follows China’s national standard (GB 3095–2018), defined as the maximum index among six criteria pollutants (PM₂.₅, PM₁₀, SO₂, CO, NO₂, O₃). Eighteen predictors were integrated to ensure accuracy, including meteorological variables (e.g., 2-m air temperature, 10-m wind speed from the China Meteorological Forcing Dataset), vegetation metrics (Normalized Difference Vegetation Index [NDVI], Net Primary Productivity [NPP]), anthropogenic factors (downscaled GDP, population density, Human Footprint Index), and soil properties (pH, soil organic carbon from China’s High-Resolution National Soil Information Grid). Four tree-based ensemble algorithms (Random Forest [RF], Gradient Boosting Machine [GBM], CatBoost, XGBoost) were compared, with the RF model selected as optimal (test set: R² = 0.83, Root Mean Square Error [RMSE] = 10.25, Mean Absolute Error [MAE] = 9.03) after validation via 10-fold geographic stratified cross-validation and 100 bootstrap iterations; Recursive Feature Elimination (RFE) further refined the set to 14 core predictors to minimize overfitting. The dataset is provided as NCnet files (252 total, one per month) covering China (80°E–135°E, 15°N–53°N).
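The "maximum index among six criteria pollutants" rule mentioned above can be illustrated with a short Python sketch. It assumes the per-pollutant sub-indices have already been computed from concentrations (that conversion is not shown and is not described in this entry); the function name and the sample values are hypothetical.

```python
POLLUTANTS = ("PM2.5", "PM10", "SO2", "CO", "NO2", "O3")


def aqi_from_subindices(sub_indices):
    """Return (AQI, primary pollutant) as the maximum of six sub-indices.

    sub_indices maps each pollutant name to its already-computed index.
    """
    missing = [p for p in POLLUTANTS if p not in sub_indices]
    if missing:
        raise ValueError(f"missing sub-indices: {missing}")
    primary = max(POLLUTANTS, key=lambda p: sub_indices[p])
    return sub_indices[primary], primary


# Placeholder sub-index values for one grid cell and month.
cell = {"PM2.5": 88, "PM10": 74, "SO2": 12, "CO": 9, "NO2": 41, "O3": 63}
aqi, primary = aqi_from_subindices(cell)
```

Returning the pollutant that sets the maximum alongside the AQI makes it easy to see which pollutant drives air quality in a given cell.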
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EBITDA Time Series for China Reform Health Management and Services Group Co Ltd. China Reform Health Management and Services Group Co., Ltd. offers medical insurance management services in China. The company is involved in business that covers 177 medical insurance in 25 provinces. It is also involved in the pharmaceutical and medical business. The company was formerly known as SeaRainbow Holding Corp. and changed its name to China Reform Health Management and Services Group Co., Ltd. in May 2018. China Reform Health Management and Services Group Co., Ltd. was founded in 1987 and is based in Beijing, China.
The empirical datasets in this paper were obtained from two databases, the Chinese General Social Survey (CGSS) and the China Premium Database of CEIC. The CGSS was initiated by the National Survey Research Center of Renmin University of China and has been conducted every one to two years since 2003, with the most recent year being 2015. The empirical study in this paper uses survey data for three years, 2012, 2013, and 2015, which capture the period of rapid house price increase in China. The CGSS datasets are high-quality cross-sectional data, which not only contain rich information on demographics, income (individual and household), housing and marriage perceptions, but also cover rich information on individual health status, such as self-rated physical health, height and weight (used to calculate BMI), which is also of interest in our paper. In addition, they include subjective social status, mental health status, and health-related behaviors for the mechanism analysis in this paper.
This file provides a minimal, anonymized dataset for the replication of the primary statistical analyses in the manuscript titled, “Travel burden increases the risk of advanced stage at diagnosis of Breast Cancer in Kashgar, China.” The data were sourced from a retrospective study cohort at the Breast Cancer Center at the First People's Hospital of Kashgar (FPHK), Xinjiang, China. To protect patient confidentiality, this dataset has been fully anonymized. All direct identifiers have been removed. Each row in this dataset represents a single, anonymized patient.
The global surge in depression rates, notably severe in China with over 95 million affected, underscores a dire public health issue. This is exacerbated by a critical shortfall in mental health professionals, highlighting an urgent call for innovative approaches. The advancement of Artificial Intelligence (AI), particularly Large Language Models, offers a promising solution by improving mental health diagnostics. However, there is a lack of real data for reliable training and accurate evaluation of AI models. To this end, this paper presents a high-quality multimodal depression consultation dataset, namely Parallel Data of Depression Consultation and Hamilton Depression Rating Scale (PDCH). The dataset is constructed from clinical consultations at Beijing Anding Hospital and provides audio recordings and transcribed text, as well as corresponding HAMD-17 scales annotated by professionals. The dataset contains 100 consultations, and the audio exceeds 2,937 minutes. Each consultation is about 30 minutes long with more than 150 dialogue turns. It helps fill the gap in mental health services and supports the creation of more accurate AI models.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Primary health care (PHC) services are underused due to the unbalanced distribution of medical resources. This is especially true in developing countries where the construction of PHC systems has begun to take effect. Social capital is one of the important factors affecting primary health care utilization.
Method: This study investigated the utilization of PHC services by Chinese community residents in the past year. Social capital, PHC utilization, age, health care insurance, etc., were measured. A multilevel negative binomial model was adopted to analyze the association of social capital with PHC utilization.
Results: Data of 5,471 residents from 283 communities in China were collected through a questionnaire survey in 2018. The results showed that community social capital (CSC) is significantly associated with PHC utilization in China, but individual social capital (ISC) had no significant association with PHC utilization. A one-standard deviation increase in the CSC leads to a 1.9% increase in PHC utilization. Other factors like gender, education, income, health insurance, health status, etc., are significantly associated with PHC utilization in China.
Conclusions: Community social capital plays a more important role in promoting PHC utilization, while ISC plays an unclear role in PHC utilization by the residents of China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The COVID-19 outbreak in China has created multiple stressors that threaten individuals' mental health, especially among public health workers (PHW) who are devoted to COVID-19 control and prevention work. This study aimed to investigate the prevalence of mental help-seeking and associated factors among PHW using Andersen's Behavioral Model of Health Services Use (BMHSU).
Methods: A cross-sectional survey was conducted among 9,475 PHW in five provinces across China between February 18 and March 1, 2020. The subsample data of those who reported probable mental health problems were analyzed for this report (n = 3,417). Logistic and hierarchical regression analyses were conducted to examine the associations of predisposing, enabling, need, and COVID-19 contextual factors with mental health help-seeking.
Results: Only 12.7% of PHW reported professional mental help-seeking during the COVID-19 outbreak. PHW who were older, had more days of overnight work, received psychological training, perceived a higher level of support from society, and had depression and anxiety were more likely to report mental help-seeking (ORm range: 1.02–1.73, all p < 0.05), while those who worked in Centers for Disease Control and Prevention were less likely to seek help (ORm = 0.57, p < 0.01). The belief that mental health issues were not the priority (64.4%), lack of time (56.4%), and shortage of psychologists (32.7%) were the most frequently endorsed reasons for not seeking help.
Conclusions: The application of BMHSU confirmed associations between some factors and PHW's mental health help-seeking. Effective interventions are warranted to promote mental health help-seeking of PHW to ameliorate the negative impact of mental illness and facilitate personal recovery and routine work.