16 datasets found
  1. Data from: Classification of Heart Failure Using Machine Learning: A...

    • data.mendeley.com
    Updated Oct 29, 2024
    Cite
    Bryan Chulde (2024). Classification of Heart Failure Using Machine Learning: A Comparative Study [Dataset]. http://doi.org/10.17632/959dxmgj8d.1
    Dataset updated
    Oct 29, 2024
    Authors
    Bryan Chulde
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Our research demonstrates that machine learning algorithms can effectively predict heart failure, and it highlights high-accuracy models that improve detection and treatment. The Kaggle “Heart Failure” dataset, with 918 instances and 12 key features, was preprocessed to remove outliers. It contains 508 cases with heart disease and 410 without. Five models were evaluated. Random forest achieved the highest accuracy (92%) and proved the most effective at classifying cases. Logistic regression and the multilayer perceptron were also quite accurate (89%). The decision tree and k-nearest neighbors performed worse, which suggests that k-nearest neighbors is less suitable for this data. F1 scores confirmed random forest as the best model, and it benefited from preprocessing and hyperparameter tuning. The data analysis revealed that age, blood pressure, and cholesterol correlate with disease risk, so these models may help prioritize at-risk patients and improve their preventive management. The research underscores the potential of these models in clinical practice to improve diagnostic accuracy and reduce costs, supporting informed medical decisions and better health outcomes.
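    The comparison above ranks the models by accuracy and F1 score. The minimal plain-Python sketch below shows how both metrics follow from binary labels, with 1 marking heart disease. It is not the authors' code, and the label lists are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly flagged heart disease
        elif p == positive:
            fp += 1  # flagged, but no heart disease
        elif t == positive:
            fn += 1  # missed heart disease
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = heart disease, 0 = no heart disease.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

    F1 combines precision and recall, so it penalizes a model that buys accuracy by missing positive cases. That is why it is a useful check alongside accuracy.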

  2. Data from: Anonymous Data

    • kaggle.com
    Updated Sep 19, 2021
    Cite
    Ali Mohammed Bakhiet (2021). Anonymous Data [Dataset]. https://www.kaggle.com/alimohammedbakhiet/anonymous-data/code
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ali Mohammed Bakhiet
    License

    Open Data Commons Attribution License (ODC-By) v1.0, https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The meaning of the data is unknown and the columns are not clearly labeled. It does contain a target column named “Y”, which you can use for binary classification. The data may also contain missing values, and you can choose either to remove the outliers or to leave them as they are. You can use it to practice machine learning or deep learning algorithms.

    Deal with the data as you see fit. The important thing here is to compete for the highest accuracy with the model you build while avoiding overfitting.

    Good luck with the data.
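    One way to start is sketched below in plain Python under stated assumptions. The sketch parses the CSV and drops rows with missing values. It can optionally drop rows whose features fall outside Tukey's 1.5×IQR fences. It then holds out a validation split, so that overfitting shows up as a gap between training and validation scores. Only the target name “Y” comes from the description. The file contents, the 1.5 multiplier, and the 20% split are illustrative choices.

```python
import csv
import io
import random
import statistics


def load_rows(text, target="Y"):
    """Parse CSV text, skip rows with any missing cell, and split off the target."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if any(v is None or v.strip() == "" for v in row.values()):
            continue  # row has a missing value
        y.append(int(row.pop(target)))
        X.append([float(v) for v in row.values()])
    return X, y


def iqr_mask(X, k=1.5):
    """True for rows whose features all lie inside [Q1 - k*IQR, Q3 + k*IQR]."""
    fences = []
    for j in range(len(X[0])):
        q1, _, q3 = statistics.quantiles([row[j] for row in X], n=4)
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [all(lo <= v <= hi for v, (lo, hi) in zip(row, fences)) for row in X]


def train_valid_split(X, y, valid_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and hold out a validation fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_valid = int(len(idx) * valid_fraction)
    valid, train = idx[:n_valid], idx[n_valid:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in valid], [y[i] for i in valid])


# Hypothetical file contents; only the target name "Y" comes from the description.
sample = """a,b,Y
1.0,2.0,0
1.2,2.1,1
1.1,1.9,1
0.9,,0
1.0,2.2,0
0.9,2.0,1
1.05,2.05,0
1.15,1.95,1
0.95,2.1,0
50.0,2.0,1
"""
X, y = load_rows(sample)
keep = iqr_mask(X)  # optional: skip this step to leave outliers in
X_clean = [row for row, k in zip(X, keep) if k]
y_clean = [label for label, k in zip(y, keep) if k]
X_tr, y_tr, X_va, y_va = train_valid_split(X_clean, y_clean)
print(len(X), len(X_clean), len(X_tr), len(X_va))  # 9 8 7 1
```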

  3. Malaria disease and grading system dataset from public hospitals reflecting...

    • data.niaid.nih.gov
    • datadryad.org
    • +1 more
    zip
    Updated Nov 10, 2023
    Cite
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie (2023). Malaria disease and grading system dataset from public hospitals reflecting complicated and uncomplicated conditions [Dataset]. http://doi.org/10.5061/dryad.4xgxd25gn
    Available download formats: zip
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Nasarawa State University
    Authors
    Temitope Olufunmi Atoyebi; Rashidah Funke Olanrewaju; N. V. Blamah; Emmanuel Chinanu Uwazie
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Malaria is the leading cause of death in the African region. Data mining can help extract valuable knowledge from available healthcare data, which makes it possible to train models that predict patient health faster than clinical trials can. Various machine learning algorithms have been applied to malaria datasets from public hospitals. These include K-Nearest Neighbors, Bayes' Theorem, Logistic Regression, Support Vector Machines, and Multinomial Naïve Bayes (MNB). However, modeling with the multinomial Naive Bayes algorithm still has limitations. This study applies the MNB model to explore the relationships among 15 relevant attributes of public hospital data. The goal is to examine how dependency between attributes affects classifier performance. MNB creates a transparent and reliable graphical representation of the relationships between attributes and can predict new situations. The MNB model has 97% accuracy. The authors conclude that it outperforms the GNB classifier and the random forest (RF), although both of those are reported with 100% accuracy.

    Methods: Before data collection, the researcher was guided by ethical training certification covering data collection and the rights to confidentiality and privacy, as required by the Institutional Review Board (IRB). Data were collected from the manual archives of hospitals purposively selected using a stratified sampling technique. The data were then transformed into electronic form and stored in a MySQL database called malaria. Each patient file was extracted and reviewed for signs and symptoms of malaria, and then checked for the laboratory result confirming the diagnosis. The data were divided into two tables. The first, data1, contains the data for phase 1 of the classification. The second, data2, contains the data for phase 2.

    Data source and collection: The malaria incidence dataset was obtained from public hospitals for 2017 to 2021, and these are the data used for modeling and analysis. Geographical location and socio-economic factors available for patients living in those areas were also taken into account. Multinomial Naive Bayes is the model used to analyze the collected data for malaria prediction and grading.

    Data preprocessing: Preprocessing was done to remove noise and outliers.

    Transformation: The data were transformed from analog to electronic records.

    Data partitioning: The collected data were divided into two portions. One portion was extracted as a training set, and the other was used for testing. The training portion taken from one database table is called training set 1, and the training portion taken from another table is called training set 2. The dataset was split into two parts, 70% for training and 30% for testing. MNB classifiers implemented in Python were trained on the training sample and then tested on the remaining 30%. The results were compared with other machine learning models using standard metrics.

    Classification and prediction: Based on the nature of the variables in the dataset, the study uses Multinomial Naïve Bayes classification in two phases, classification phase 1 and classification phase 2. The framework operates as follows:
    i. Data are collected and preprocessed.
    ii. The preprocessed data are stored in training set 1 and training set 2, which are used during classification.
    iii. The test data are stored in the test data set in the database.
    iv. Part of the test data set is classified with classifier 1, and the remaining part is classified with classifier 2, as follows:

    Classifier phase 1: classifies records into positive or negative classes. A patient who has malaria is classified as positive (P), and a patient who does not have malaria is classified as negative (N).

    Classifier phase 2: classifies only the records that classifier 1 labeled positive and further assigns each one the label complicated or uncomplicated. The classifier also captures data on environmental factors, genetics, gender and age, and cultural and socio-economic variables. The system will be designed so that values must be supplied for the core parameters that act as determining factors.
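    The two-phase framework can be sketched with a small multinomial Naive Bayes written in plain Python. This is an illustrative sketch, not the study's code. The count-valued symptom features, the toy records, the grade labels, and the Laplace smoothing parameter `alpha` are all assumptions. Phase 1 separates positive (P) from negative (N) cases. Phase 2 is trained only on positive cases and assigns complicated or uncomplicated.

```python
import math
from collections import defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over non-negative count features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_features = len(X[0])
        counts = defaultdict(lambda: [0.0] * n_features)  # per-class feature totals
        class_totals = defaultdict(int)
        for row, label in zip(X, y):
            class_totals[label] += 1
            for j, v in enumerate(row):
                counts[label][j] += v
        self.classes = sorted(class_totals)
        self.log_prior = {c: math.log(class_totals[c] / len(y)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c]) + self.alpha * n_features
            self.log_lik[c] = [math.log((counts[c][j] + self.alpha) / total)
                               for j in range(n_features)]
        return self

    def predict_one(self, row):
        def score(c):
            return self.log_prior[c] + sum(v * lp for v, lp in zip(row, self.log_lik[c]))
        return max(self.classes, key=score)

    def predict(self, X):
        return [self.predict_one(row) for row in X]


def two_phase_predict(phase1, phase2, X):
    """Phase 1 labels P/N; phase 2 grades only the rows predicted P."""
    return [phase2.predict_one(row) if phase1.predict_one(row) == "P" else "N" for row in X]


# Hypothetical symptom counts: [fever, headache, vomiting, convulsions].
X1 = [[3, 2, 0, 0], [4, 3, 1, 0], [5, 2, 2, 1], [4, 1, 3, 2],
      [0, 1, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0], [1, 1, 0, 0]]
y1 = ["P", "P", "P", "P", "N", "N", "N", "N"]
phase1 = MultinomialNB().fit(X1, y1)

# Phase 2 sees only positive cases (hypothetical grades).
X2 = [row for row, label in zip(X1, y1) if label == "P"]
y2 = ["uncomplicated", "uncomplicated", "complicated", "complicated"]
phase2 = MultinomialNB().fit(X2, y2)

X_new = [[4, 2, 0, 0], [0, 1, 0, 0], [4, 1, 3, 2]]
print(two_phase_predict(phase1, phase2, X_new))  # ['uncomplicated', 'N', 'complicated']
```

    Training phase 2 only on positive cases mirrors the description, where classifier 2 sees only the records that classifier 1 labeled positive.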

  4. Data from: AI based 1D P & S-wave Velocity Models for the Greater Alpine...

    • service.tib.eu
    Updated Nov 28, 2024
    + more versions
    Cite
    (2024). AI based 1D P & S-wave Velocity Models for the Greater Alpine Region from Local Earthquake Data - (outdated version) [Dataset]. https://service.tib.eu/ldmservice/dataset/rdr-doi-10-35097-1942
    Dataset updated
    Nov 28, 2024
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Abstract: The recent rapid improvement of machine learning techniques has had a large impact on the way seismological data can be processed. In recent years, several machine learning algorithms for determining seismic onset times have been published, which facilitates the automatic picking of large datasets. Here we apply the deep neural network PhaseNet to a network of over 900 permanent and temporary broadband stations. These stations were deployed as part of the AlpArray research initiative in the Greater Alpine Region (GAR) during 2016-2020. We selected 384 well-distributed earthquakes with M_L >= 2.5 for our study and developed a purely data-driven pre-inversion pick selection method to consistently remove outliers from the automatic pick catalog. This allows us to include observations throughout the crustal triplication zone, resulting in 39,599 P and 13,188 S observations. Using the established VELEST and the recently developed McMC codes, we invert for the 1D P- and S-wave velocity structure, including station correction terms, while simultaneously relocating the events. As a result, we present two separate models that differ in the maximum included observation distance and therefore in their suggested usage. The model AlpsLocPS is based on arrivals from = 10 observations per phase are included.

    AlpsLocPS_sta_cors.csv - File listing station data and P- and S-phase station correction terms for the "AlpsLocPS_VELEST" and "AlpsLocPS_McMC" models after relocating all events (see Table 2, 'run2', in Braszus et al., 2024).

    GAR1D_sta_cors.csv - File listing station data and P- and S-phase station correction terms for the final "GAR1D_PS_VELEST" and "GAR1D_PS_McMC" models.
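    The abstract does not spell out how its pick selection works. The sketch below is a generic illustration only, a swapped-in technique and not the authors' procedure. It compares each automatic pick's travel time with a simple distance-based prediction. It then rejects picks whose residual lies more than k scaled median absolute deviations (MAD) from the median residual. The constant apparent-velocity prediction, the threshold k = 3, and the pick values are assumptions.

```python
import statistics


def mad_filter_picks(picks, k=3.0):
    """Keep picks whose travel-time residual is within k robust standard deviations.

    picks: list of (epicentral_distance_km, travel_time_s) tuples.
    The prediction is t = d / v_app, where v_app is the median apparent velocity.
    """
    v_app = statistics.median(d / t for d, t in picks)
    residuals = [t - d / v_app for d, t in picks]
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    sigma = 1.4826 * mad  # MAD scaled to a Gaussian standard deviation
    if sigma == 0:
        return list(picks)  # no spread to judge against; keep everything
    return [p for p, r in zip(picks, residuals) if abs(r - med) <= k * sigma]


# Hypothetical P picks near 6 km/s, plus one mis-pick at 150 km.
picks = [(60, 10.05), (120, 19.95), (180, 30.1), (240, 39.9),
         (300, 50.05), (90, 14.95), (150, 40.0)]
kept = mad_filter_picks(picks)
print(len(kept))
```

    A real workflow would predict travel times from a layered velocity model and screen P and S picks separately. The straight-line prediction here only keeps the example short.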

  5. Data from: AI based 1D P & S-wave Velocity Models for the Greater Alpine...

    • b2find.eudat.eu
    Updated Jul 20, 2024
    Cite
    (2024). AI based 1D P & S-wave Velocity Models for the Greater Alpine Region from Local Earthquake Data - (outdated version) [Dataset]. https://b2find.eudat.eu/dataset/270fabb0-e5e5-5562-a18f-47acc58379dc
    Dataset updated
    Jul 20, 2024
    Description

    The recent rapid improvement of machine learning techniques has had a large impact on the way seismological data can be processed. In recent years, several machine learning algorithms for determining seismic onset times have been published, which facilitates the automatic picking of large datasets. Here we apply the deep neural network PhaseNet to a network of over 900 permanent and temporary broadband stations. These stations were deployed as part of the AlpArray research initiative in the Greater Alpine Region (GAR) during 2016-2020. We selected 384 well-distributed earthquakes with M_L >= 2.5 for our study and developed a purely data-driven pre-inversion pick selection method to consistently remove outliers from the automatic pick catalog. This allows us to include observations throughout the crustal triplication zone, resulting in 39,599 P and 13,188 S observations. Using the established VELEST and the recently developed McMC codes, we invert for the 1D P- and S-wave velocity structure, including station correction terms, while simultaneously relocating the events. As a result, we present two separate models that differ in the maximum included observation distance and therefore in their suggested usage. The model AlpsLocPS is based on arrivals from = 10 observations per phase are included.

  6. Cdd Dataset

    • universe.roboflow.com
    zip
    Updated Sep 5, 2023
    Cite
    hakuna matata (2023). Cdd Dataset [Dataset]. https://universe.roboflow.com/hakuna-matata/cdd-g8a6g/3
    Available download formats: zip
    Dataset updated
    Sep 5, 2023
    Dataset authored and provided by
    hakuna matata
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Cucumber Disease Detection Bounding Boxes
    Description

    Project Documentation: Cucumber Disease Detection

    1. Title and Introduction

    Title: Cucumber Disease Detection

    Introduction: A machine learning model for the automatic detection of diseases in cucumber plants is to be developed as part of the "Cucumber Disease Detection" project. This research is important because it tackles early disease identification in agriculture, which can increase crop yield and cut financial losses. To train and test the model, we use a dataset of images of cucumber plants.

    2. Problem Statement

    Problem Definition: The research uses image analysis methods to automate the identification of diseases, including Downy Mildew, in cucumber plants. Effective disease management in agriculture depends on early disease identification.

    Importance: Early disease diagnosis helps minimize crop losses, stop the spread of diseases, and allocate farming resources better. Agriculture is a real-world application of this concept.

    Goals and Objectives: Develop a machine learning model to classify cucumber plant images into healthy and diseased categories. Achieve a high level of accuracy in disease detection. Provide a tool for farmers to detect diseases early and take appropriate action.

    3. Data Collection and Preprocessing

    Data Sources: The dataset consists of images of cucumber plants from various sources, including both healthy and diseased specimens.

    Data Collection: Images were gathered from agricultural areas using cameras and smartphones.

    Data Preprocessing: Data cleaning to remove irrelevant or corrupted images. Handling missing values, if any, in the dataset. Removing outliers that may negatively impact model training. Applying data augmentation techniques to increase dataset diversity.

    4. Exploratory Data Analysis (EDA)

    The dataset was examined using visualizations such as scatter plots and histograms to look for patterns, trends, and correlations. EDA made it easier to understand the distribution of images of healthy and diseased plants.

    5. Methodology

    Machine Learning Algorithms: Convolutional Neural Networks (CNNs) were chosen for image classification because of their effectiveness with image data. Transfer learning using pre-trained models such as ResNet or MobileNet may be considered.

    Train-Test Split: The dataset was split into training and testing sets with a suitable ratio. Cross-validation may be used to assess model performance robustly.

    6. Model Development

    The CNN model's architecture is defined by its layers, units, and activation functions. Hyperparameters, including learning rate, batch size, and optimizer, were chosen through experimentation. Regularization methods such as dropout and L2 regularization were used to avoid overfitting.

    7. Model Training

    During training, the model was fed the prepared dataset over a number of epochs, and an optimization method minimized the loss function. Early stopping and model checkpoints were used to ensure convergence.
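    Early stopping with model checkpoints needs only a little bookkeeping, independent of the deep learning framework. The minimal sketch below tracks the best validation loss and remembers the epoch and model state at which it occurred. Training stops once the loss has not improved for `patience` epochs. The class name, `patience`, `min_delta`, the dict standing in for model weights, and the loss values are assumptions, not the project's settings.

```python
class EarlyStopping:
    """Stop when validation loss stops improving; remember the best checkpoint."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_state = None  # stand-in for saved model weights
        self.bad_epochs = 0

    def step(self, epoch, val_loss, state=None):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_state = state  # "checkpoint" the best model so far
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses from a CNN training run.
val_losses = [0.90, 0.71, 0.60, 0.58, 0.59, 0.61, 0.60, 0.62]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss, state={"epoch": epoch}):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}; best epoch {stopper.best_epoch}")
```

    After stopping, the model would be restored from `best_state` rather than from the last epoch, which is what makes the checkpoint useful.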

    8. Model Evaluation

    Evaluation Metrics: Accuracy, precision, recall, F1-score, and the confusion matrix were used to assess model performance. Results were computed for both the training and test datasets.

    Performance Discussion: The model's performance was analyzed in the context of disease detection in cucumber plants, and its strengths and weaknesses were identified.

    9. Results and Discussion

    Key project findings include model performance and disease detection precision. They also include a comparison of the models employed, showing the benefits and drawbacks of each, and the challenges faced during the project together with the methods used to solve them.

    10. Conclusion

    A recap of the project's key learnings, highlighting the project's importance to early disease detection in agriculture, with suggested future enhancements and potential research directions.

    11. References

    Libraries: Pillow, Roboflow, YOLO, Sklearn, matplotlib
    Datasets: https://data.mendeley.com/datasets/y6d3z6f8z9/1

    12. Code Repository

    https://universe.roboflow.com/hakuna-matata/cdd-g8a6g

    Rafiur Rahman Rafit EWU 2018-3-60-111

  7. Data from: Toward Chemical Accuracy in Predicting Enthalpies of Formation...

    • acs.figshare.com
    xlsx
    Updated Jun 15, 2023
    Cite
    Peikun Zheng; Wudi Yang; Wei Wu; Olexandr Isayev; Pavlo O. Dral (2023). Toward Chemical Accuracy in Predicting Enthalpies of Formation with General-Purpose Data-Driven Methods [Dataset]. http://doi.org/10.1021/acs.jpclett.2c00734.s001
    Available download formats: xlsx
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    ACS Publications
    Authors
    Peikun Zheng; Wudi Yang; Wei Wu; Olexandr Isayev; Pavlo O. Dral
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Enthalpies of formation and reaction are important thermodynamic properties that have a crucial impact on the outcome of chemical transformations. Here we implement the calculation of enthalpies of formation with the general-purpose ANI-1ccx neural network atomistic potential. We demonstrate on a wide range of benchmark sets that both ANI-1ccx and our other general-purpose data-driven method, AIQM1, approach the coveted chemical accuracy of 1 kcal/mol. They do so with the speed of semiempirical quantum mechanical methods (AIQM1) or faster (ANI-1ccx). Remarkably, this is achieved without specifically training the machine learning parts of ANI-1ccx or AIQM1 on formation enthalpies. Importantly, we show that these data-driven methods provide statistical means for quantifying the uncertainty of their predictions. We use this to detect and eliminate outliers and to revise reference experimental data. Uncertainty quantification may also help in the systematic improvement of such data-driven methods.
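    The minimal sketch below illustrates uncertainty-based outlier screening. It assumes, rather than takes from this dataset, a common approach: run an ensemble of neural networks and use the spread (standard deviation) of their predictions as the uncertainty. Molecules whose spread exceeds a threshold are flagged. The 1 kcal/mol threshold is chosen only for illustration, and the molecule ids and energies are hypothetical.

```python
import statistics


def screen_predictions(ensemble_preds, threshold=1.0):
    """Split molecules into trusted and flagged by ensemble standard deviation.

    ensemble_preds maps a molecule id to a list of predictions (kcal/mol),
    one per ensemble member. Returns (trusted, flagged) dicts of id -> (mean, std).
    """
    trusted, flagged = {}, {}
    for mol_id, preds in ensemble_preds.items():
        mean = statistics.fmean(preds)
        std = statistics.stdev(preds)  # sample standard deviation across members
        target = flagged if std > threshold else trusted
        target[mol_id] = (mean, std)
    return trusted, flagged


# Hypothetical ensemble predictions of formation enthalpies.
preds = {
    "mol_a": [-20.1, -20.3, -19.9, -20.0],
    "mol_b": [15.2, 18.9, 12.4, 16.8],
}
trusted, flagged = screen_predictions(preds)
print(sorted(trusted), sorted(flagged))  # ['mol_a'] ['mol_b']
```

    Flagged molecules are candidates for a closer look, for example when the reference value itself may be wrong. High spread alone does not prove a prediction is wrong.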

  8. Tsetse fly wing landmark data for morphometrics (Vol 20, 21)

    • search.dataone.org
    • datadryad.org
    • +1 more
    Updated Mar 8, 2024
    Cite
    Dylan Geldenhuys (2024). Tsetse fly wing landmark data for morphometrics (Vol 20, 21) [Dataset]. http://doi.org/10.5061/dryad.qz612jmh1
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Dylan Geldenhuys
    Time period covered
    Dec 2, 2022
    Description

    Single-wing images were captured from 14,354 pairs of field-collected tsetse wings of the species Glossina pallidipes and G. m. morsitans and analysed together with relevant biological data. To answer research questions about these flies, we need to locate 11 anatomical landmark coordinates on each wing. Manually locating landmarks is time-consuming, error-prone, and simply infeasible given the number of images. Automatic landmark detection has been proposed to locate these coordinates. We developed a two-tier method using deep learning architectures to classify images and make accurate landmark predictions. The first tier used a classification convolutional neural network to remove most wings that were missing landmarks. The second tier provided landmark coordinates for the remaining wings. For the second tier, we compared direct coordinate regression using a convolutional neural network with segmentation using a fully convolutional network. For the resulting landmark pred...

    This data was collected via field traps designed to catch tsetse flies. The wings were processed from the flies and laminated on A4 sheets of paper, along with various biological recordings from a lab dissection of each fly. The data were subsequently digitised by recording each fly's data in Excel spreadsheets. A microscope camera was used to capture a digital image of the fly wings at a resolution of 1024×1280. A subset of images was annotated and used to train machine learning models. The wing images were then given as inputs to the machine learning models, which located and recorded various landmarks in each wing image. These landmarks were appended to the dataset of biological recordings taken during the lab dissection. The data were processed to remove outliers and other erroneous instances. The files in the dataset are described below.

    tetse_data.csv: column names and descriptions

    vpn: vpn is the filename, identified by the volume (v), page (p), ...,
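    In the segmentation route mentioned above, each landmark is predicted as a map of per-pixel scores, which must then be turned back into coordinates. The minimal sketch below assumes one score map per landmark and takes the arg-max pixel, a common choice but not necessarily the authors'. It also computes the mean Euclidean error against annotated coordinates. The tiny maps and coordinates are hypothetical.

```python
import math


def heatmap_to_coord(heatmap):
    """Return (row, col) of the highest-scoring pixel in a 2D list of scores."""
    return max(
        ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0]))),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )


def mean_landmark_error(pred, true):
    """Mean Euclidean distance (in pixels) between paired landmark coordinates."""
    if len(pred) != len(true):
        raise ValueError("pred and true must list the same landmarks")
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(pred)


# Two hypothetical 3x3 score maps, one per landmark.
hm1 = [[0.0, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.1, 0.0]]
hm2 = [[0.0, 0.0, 0.1], [0.0, 0.2, 0.8], [0.0, 0.1, 0.3]]
pred = [heatmap_to_coord(h) for h in (hm1, hm2)]
true = [(1, 1), (4, 6)]  # hypothetical annotated coordinates
print(pred, mean_landmark_error(pred, true))  # [(1, 1), (1, 2)] 2.5
```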

  9. Lipidomics LC-MS analysis support tools for outlier detection

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 28, 2024
    Cite
    Spick, Matt (2024). Lipidomics LC-MS analysis support tools for outlier detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10889320
    Dataset updated
    Mar 28, 2024
    Dataset authored and provided by
    Spick, Matt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Identifying features with high levels of confidence in liquid chromatography-mass spectrometry (LC-MS) lipidomics research is an essential part of biomarker discovery. However, existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in bioinformatics work. It also highlights the importance of data-driven outlier detection in assessing spectral outputs, as well as manual curation, to identify software-driven errors caused by closely related lipids and by co-elution issues. Here the outlier detection is demonstrated using a machine learning approach based on support vector machine regression combined with leave-one-out cross validation.

    The lipidomics case-study dataset used in this work comes from a lipid extraction of a human pancreatic adenocarcinoma cell line (PANC-1, Merck, UK, cat no. 87092802). It was analysed using an Acquity M-Class UPLC system (Waters, UK) coupled to a ZenoTOF 7600 mass spectrometer (Sciex, UK). Raw output files are included alongside data processed with MS DIAL (v4.9.221218) and Lipostar (v2.1.4), together with a Jupyter notebook containing Python code to analyse the outputs for outlier detection.
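    The description pairs support vector machine regression with leave-one-out cross validation. The sketch below keeps the leave-one-out logic but, to stay dependency-free, swaps SVR for a one-variable least-squares line. The predictor (for example a lipid descriptor), the response (for example retention time), the data, and the 3×MAD cut-off are all assumptions. Each feature is predicted from a model fitted without it. Features with unusually large held-out errors are then flagged for manual curation.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_outliers(xs, ys, k=3.0):
    """Flag indices whose leave-one-out residual is > k scaled MADs from the median."""
    residuals = []
    for i in range(len(xs)):
        # Fit without point i, then predict point i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    scale = 1.4826 * mad
    return [i for i, r in enumerate(residuals) if scale and abs(r - med) > k * scale]


# Hypothetical descriptor/response pairs with one inconsistent feature (index 5).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.05, 9.0, 10.95, 20.0, 15.1, 16.9]
print(loo_outliers(xs, ys))  # [5]
```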

  10. Fast and robust deconvolution of tumor infiltrating lymphocyte from...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Yuning Hao; Ming Yan; Blake R. Heath; Yu L. Lei; Yuying Xie (2023). Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares [Dataset]. http://doi.org/10.1371/journal.pcbi.1006976
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yuning Hao; Ming Yan; Blake R. Heath; Yu L. Lei; Yuying Xie
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gene-expression deconvolution is used to quantify different types of cells in a mixed population. It provides a highly promising solution to rapidly characterize the tumor-infiltrating immune landscape and identify cold cancers. However, a major challenge is that gene-expression data are frequently contaminated by many outliers that decrease the estimation accuracy. Thus, it is imperative to develop a robust deconvolution method that automatically decontaminates data by reliably detecting and removing outliers. We developed a new machine learning tool, Fast And Robust DEconvolution of Expression Profiles (FARDEEP), to enumerate immune cell subsets from whole tumor tissue samples. To reduce noise in the tumor gene expression datasets, FARDEEP uses adaptive least trimmed squares to automatically detect and remove outliers before estimating the cell compositions. We show with both numerical simulations and real datasets that FARDEEP is less susceptible to outliers and returns better coefficient estimates than existing methods. FARDEEP provides an estimate related to the absolute quantity of each immune cell subset in addition to relative percentages. Hence, FARDEEP represents a novel robust algorithm to complement the existing toolkit for the characterization of tissue-infiltrating immune cell landscape. The source code for FARDEEP is implemented in R and available for download at https://github.com/YuningHao/FARDEEP.git.
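    The core idea of least trimmed squares can be sketched with concentration steps (C-steps). Fit the model, keep the h observations with the smallest squared residuals, refit on those, and repeat until the kept set stops changing. This simplified sketch is not FARDEEP. It uses a fixed trimming fraction instead of adaptive trimming and places no sign constraint on the coefficients, even though cell abundances cannot be negative in practice. It also starts from a single full-data fit, so severe contamination can trap it in a poor solution. The signature matrix and mixture values are synthetic.

```python
def solve_normal_equations(X, y):
    """Least-squares coefficients via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] +
         [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def lts_fit(X, y, trim_fraction=0.25, max_iter=50):
    """Least trimmed squares via C-steps: refit on the h best-fitting rows until stable."""
    n = len(y)
    h = n - int(n * trim_fraction)
    kept = list(range(n))
    beta = solve_normal_equations(X, y)
    for _ in range(max_iter):
        resid2 = [(y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n)]
        new_kept = sorted(range(n), key=lambda i: resid2[i])[:h]
        if set(new_kept) == set(kept):
            break  # kept set is stable, so beta matches it
        kept = new_kept
        beta = solve_normal_equations([X[i] for i in kept], [y[i] for i in kept])
    return beta, sorted(kept)


# Synthetic signature matrix (genes x 2 cell types) and mixture with true abundances [2, 3];
# genes 5 and 6 are contaminated.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 0], [0, 3], [2, 2]]
y = [2, 3, 5, 7, 8, 60, 40, 10]
beta, kept = lts_fit(X, y)
print([round(b, 6) for b in beta], kept)  # [2.0, 3.0] [0, 1, 2, 3, 4, 7]
```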

  11. Model performance for different ships.

    • plos.figshare.com
    xls
    Updated Oct 24, 2024
    + more versions
    Cite
    Umar Zaman; Junaid Khan; Eunkyu Lee; Awatef Salim Balobaid; R. Y. Aburasain; Kyungsup Kim (2024). Model performance for different ships. [Dataset]. http://doi.org/10.1371/journal.pone.0310385.t003
    Available download formats: xls
    Dataset updated
    Oct 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Umar Zaman; Junaid Khan; Eunkyu Lee; Awatef Salim Balobaid; R. Y. Aburasain; Kyungsup Kim
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predicting ship trajectories can effectively forecast navigation trends and enable the orderly management of ships, which holds immense significance for maritime traffic safety. This paper introduces a novel ship trajectory prediction method using Convolutional Neural Network (CNN), Deep Neural Network (DNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models. Our research comprises two main parts. The first involves preprocessing the large raw AIS dataset to extract features, and the second focuses on trajectory prediction. We emphasize a specialized preprocessing approach tailored to AIS data. It includes advanced filtering techniques to remove outliers and erroneous data points, and it incorporates contextual information such as environmental conditions and ship-specific characteristics. Our deep learning models use trajectory data from the Automatic Identification System (AIS) to learn regular patterns in ship trajectories, enabling them to predict trajectories for the next hour. Experimental results show that CNN substantially reduced the Mean Absolute Error (MAE) and Mean Square Error (MSE) of ship trajectory prediction, outperforming the other deep learning algorithms. We also compared our method with other models, namely Recurrent Neural Network (RNN), GRU, LSTM, and DBS-LSTM. This comparison used metrics such as Average Displacement Error (ADE), Final Displacement Error (FDE), and Non-Linear ADE (NL-ADE), and it demonstrates our method's robustness and accuracy. Our approach not only cleans the data but also enriches it, providing a robust foundation for subsequent deep learning applications in ship trajectory prediction. This improvement effectively enhances the accuracy of trajectory prediction, promising advancements in maritime traffic safety.
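    Two of the displacement metrics named above have standard definitions. ADE averages the Euclidean distance between predicted and true positions over all predicted time steps, and FDE takes that distance at the final step. The minimal sketch below computes both, plus MAE and MSE over all coordinates (one common convention, which may differ from the paper's). It assumes positions already projected to planar x/y coordinates. Raw AIS latitude and longitude would first need a projection or a great-circle distance. The trajectories are hypothetical.

```python
import math


def ade_fde(pred, true):
    """Average and final displacement error for one trajectory of (x, y) points."""
    if len(pred) != len(true) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    return sum(dists) / len(dists), dists[-1]


def mae_mse(pred, true):
    """Mean absolute and mean squared error over all coordinates."""
    errs = [p - t for pp, tt in zip(pred, true) for p, t in zip(pp, tt)]
    return sum(abs(e) for e in errs) / len(errs), sum(e * e for e in errs) / len(errs)


# Hypothetical planar positions (km) at four future time steps.
true_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pred_track = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (6.0, 4.0)]
print(ade_fde(pred_track, true_track), mae_mse(pred_track, true_track))  # (1.5, 5.0) (1.0, 3.25)
```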

  12. Data from: Urbanev: An open benchmark dataset for urban electric vehicle...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Apr 25, 2025
    Cite
    Han Li; Haohao Qu; Xiaojun Tan; Linlin You; Rui Zhu; Wenqi Fan (2025). Urbanev: An open benchmark dataset for urban electric vehicle charging demand prediction [Dataset]. http://doi.org/10.5061/dryad.np5hqc04z
    Available download formats: zip
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Sun Yat-sen University
    Hong Kong Polytechnic University
    Institute of High Performance Computing
    Authors
    Han Li; Haohao Qu; Xiaojun Tan; Linlin You; Rui Zhu; Wenqi Fan
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV, an open dataset showcasing EV charging space availability and electricity consumption in Shenzhen, China, a pioneering city for vehicle electrification. UrbanEV offers a rich repository of charging data (charging occupancy, duration, volume, and price) captured at hourly intervals over six months for more than 20,000 individual charging stations. Beyond these core attributes, the dataset also covers diverse influencing factors such as weather conditions and spatial proximity. These factors are analyzed qualitatively and quantitatively to reveal their correlations with, and causal impacts on, charging behavior. Furthermore, comprehensive experiments using the UrbanEV dataset showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches. This dataset is poised to advance EV charging prediction and management and to serve as a benchmark resource in this growing field.

    Methods: To build a comprehensive and reliable benchmark dataset, we conduct a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment, described in detail below.

    Study area and data acquisition

    Shenzhen, a pioneering city in global vehicle electrification, was selected for this study to offer insights into electric vehicle (EV) development that can serve as a reference for other urban centers. The study covers the whole of Shenzhen, where data on public EV charging stations distributed around the city were gathered. Specifically, EV charging data were automatically collected from a mobile platform that EV drivers use to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. To examine the correlation between EV charging patterns and environmental elements, weather data for Shenzhen were acquired from two meteorological observatories, one in the airport region and one in the central region. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. In addition, point of interest (POI) data were extracted through the Application Programming Interface Platform of AMap.com. These covered three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen. The collected data contain detailed spatiotemporal information that can be analyzed to provide insights into urban EV charging patterns and their correlations with meteorological conditions.
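    The data were recorded every five minutes, while the published series are hourly. The minimal sketch below shows one way to perform that aggregation. It assumes each raw record is a (timestamp, station_id, value) tuple and that the hourly value is the mean of the five-minute readings within each clock hour. The field layout and the averaging rule are assumptions, not the dataset's documented procedure, and the readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_mean(records):
    """Average five-minute readings into hourly values per station.

    records: iterable of (datetime, station_id, value) tuples.
    Returns {(station_id, hour_start): mean_value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, station, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the clock hour
        sums[(station, hour)] += value
        counts[(station, hour)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical busy-pile counts for two stations.
records = [
    (datetime(2022, 9, 1, 8, 0), "S1", 4),
    (datetime(2022, 9, 1, 8, 5), "S1", 6),
    (datetime(2022, 9, 1, 8, 55), "S1", 5),
    (datetime(2022, 9, 1, 9, 0), "S1", 2),
    (datetime(2022, 9, 1, 8, 10), "S2", 1),
]
hourly = hourly_mean(records)
print(hourly[("S1", datetime(2022, 9, 1, 8))])  # 5.0
```

    Other rules, such as taking the last reading or the maximum in each hour, would suit other quantities. Cumulative electricity volume, for example, would be summed or differenced rather than averaged.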

    Processing raw information into well-structured data

    To streamline the use of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. This process comprises two parts: the reorganization of EV charging data and the preparation of other influential factors.

    EV charging data

    The raw charging data, obtained from publicly available EV charging services, pertain to charging stations and consist predominantly of string-type records at 5-minute intervals. To transform these raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:

    Initial Extraction. From the string-type records, we extract key information for each charging pile, such as availability ("busy" or "idle"), rated power, and the charging and service fees applicable during each observed period. A charging pile is categorized as "active charging" if its states at two consecutive timestamps are both "busy". The occupancy of a charging station is then defined as the number of piles in active charging, and the charging duration is the product of that count and the time between the two timestamps (5 minutes in our case). The charging volume of a station can then be estimated by multiplying the duration by the piles' rated power. Finally, the average electricity price and service price are calculated for each station at the same temporal resolution as the three charging variables.
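The extraction rules above can be sketched in Python as follows. This is a minimal illustration, not the UrbanEV processing code: it assumes each pile's status has already been parsed from the string records into a list of "busy"/"idle" values at 5-minute timestamps and that rated power is given in kW. The `Pile` class and the `station_metrics` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

INTERVAL_MIN = 5  # spacing of the raw status records, in minutes


@dataclass
class Pile:
    """Hypothetical container for one charging pile's parsed records."""
    rated_power_kw: float
    states: list[str] = field(default_factory=list)  # "busy" / "idle" per timestamp


def station_metrics(piles: list[Pile]) -> list[dict]:
    """Occupancy, duration (pile-minutes) and volume (kWh) per 5-minute interval.

    Interval t spans timestamps t and t+1; a pile counts as "active charging"
    in that interval only if it is "busy" at both timestamps.
    """
    n_steps = min(len(p.states) for p in piles)
    metrics = []
    for t in range(n_steps - 1):
        active = [p for p in piles if p.states[t] == "busy" and p.states[t + 1] == "busy"]
        occupancy = len(active)                    # number of in-use piles
        duration_min = occupancy * INTERVAL_MIN    # in-use piles x interval length
        # volume = duration x rated power, per pile, converting minutes to hours
        volume_kwh = sum(p.rated_power_kw * INTERVAL_MIN / 60 for p in active)
        metrics.append({"occupancy": occupancy, "duration_min": duration_min,
                        "volume_kwh": volume_kwh})
    return metrics


piles = [Pile(60.0, ["busy", "busy", "idle"]), Pile(120.0, ["busy", "busy", "busy"])]
for row in station_metrics(piles):
    print(row)
```

With two piles rated at 60 kW and 120 kW, the first interval (both piles busy at both ends) yields an occupancy of 2, a duration of 10 pile-minutes, and 5 + 10 = 15 kWh; in the second interval only the 120 kW pile qualifies.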

    Error Detection and Imputation. Ensuring data quality is paramount when using charging data for decision-making, advanced analytics, and machine learning applications. Inaccuracies and inconsistencies, often referred to as dirty data, can significantly compromise the reliability and validity of any subsequent analysis or modeling. To improve the quality of our charging data, we identified several types of errors, in particular negative charging fees and inconsistencies among the counts of occupied, idle, and total charging piles. We removed the records containing these anomalies and treated them as missing data. A two-step imputation process was then applied to the missing values: first, forward filling replaced each missing value with the value from the preceding timestamp; then, backward filling filled the gaps remaining at the start of each time series. In addition, a number of outliers that could significantly affect prediction performance were identified. To address this, the interquartile range (IQR) method was used to detect outliers in charging volume (v), charging duration (d), and the rate of active charging piles at each station (o). To retain more of the original data and minimize the effect of outlier correction on the overall data distribution, we set the IQR coefficient to 4 instead of the default 1.5. Finally, each outlier was replaced by the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured, analyzable dataset.
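A minimal sketch of the imputation and outlier-correction steps for a single metric series, in plain Python. Several details are assumptions not stated in the source: missing records are represented as `None`, quartiles are computed with linear interpolation (`statistics.quantiles(..., method="inclusive")`), and "adjacent valid values" is read as the nearest non-outlier value on each side. The function names are hypothetical.

```python
from __future__ import annotations

import statistics


def fill_missing(series: list[float | None]) -> list[float | None]:
    """Forward fill from preceding timestamps, then backward fill a leading gap."""
    out = list(series)
    last = None
    for i, value in enumerate(out):          # forward pass
        if value is None:
            out[i] = last
        else:
            last = value
    nxt = None
    for i in range(len(out) - 1, -1, -1):    # backward pass: only leading gaps remain
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out


def replace_iqr_outliers(series: list[float], k: float = 4.0) -> list[float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] and replace each one with
    the mean of the nearest non-outlier value on either side."""
    q1, _, q3 = statistics.quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    is_outlier = [not (lower <= v <= upper) for v in series]
    out = list(series)
    for i, flagged in enumerate(is_outlier):
        if not flagged:
            continue
        left = next((series[j] for j in range(i - 1, -1, -1) if not is_outlier[j]), None)
        right = next((series[j] for j in range(i + 1, len(series)) if not is_outlier[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            out[i] = sum(neighbours) / len(neighbours)
    return out


raw = [None, 10.0, 11.0, None, 12.0, 11.0, 10.0, 1000.0, 12.0, 11.0]
clean = replace_iqr_outliers(fill_missing(raw))
print(clean)  # the 1000.0 spike becomes (10.0 + 12.0) / 2 = 11.0
```

The coefficient `k=4.0` mirrors the value stated above; with the conventional 1.5, more moderate values would be flagged and smoothed.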

    Aggregation and Filtration. Building on the extracted and cleansed station-level charging data, we further organize the data into a region-level dataset at an hourly interval, providing a new perspective for analyzing EV charging behavior. This is achieved through two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views:
    a. Temporally, we standardize all time-series data to a common resolution of one hour, as it is the least common denominator among the various resolutions. This establishes a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, and thereby yields a well-structured dataset. The aggregation rules specify that the five-minute charging volume v and duration d are summed within each interval (i.e., one hour), whereas the occupancy o, electricity price pe, and service price ps take the values observed at specific times for each charging pile. This distinction follows from the nature of these variables: volume v and duration d are cumulative, whereas o, pe, and ps are instantaneous. Compared with using the mean or median within each interval, selecting the instantaneous values of o, pe, and ps as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation.
    b. Spatially, stations are aggregated by the traffic zones delineated in the sixth Residential Travel Survey of Shenzhen. After aggregation, the dataset comprises 331 regions (also called traffic zones) with 4344 timestamps.
    Second, variance tests and zero-value filtering were used to filter out traffic zones whose charging data are zero or show no change. Specifically, it means that
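The hourly aggregation rules and the zone filter of the Aggregation and Filtration step can be sketched as follows. The source does not say which 5-minute record supplies the instantaneous value for each hour; this sketch assumes the first record of the hour. The variable keys (`v`, `d`, `o`, `pe`, `ps`) reuse the symbols from the text, and the helper names are hypothetical. The constant `N_HOURS` checks that the collection window (September 1, 2022 to February 28, 2023, i.e. 181 days) yields the 4344 hourly timestamps reported for the aggregated dataset.

```python
from __future__ import annotations

import statistics
from datetime import date

STEPS_PER_HOUR = 12                  # twelve 5-minute records per hour
CUMULATIVE = ("v", "d")              # volume, duration: summed within each hour
INSTANTANEOUS = ("o", "pe", "ps")    # occupancy, prices: one value per hour

# 181 days (2022-09-01 to 2023-02-28 inclusive) x 24 h = 4344 hourly timestamps
N_HOURS = ((date(2023, 2, 28) - date(2022, 9, 1)).days + 1) * 24


def to_hourly(records: dict[str, list[float]]) -> dict[str, list[float]]:
    """Aggregate aligned 5-minute series to hourly resolution.

    Assumption: an instantaneous variable takes the value of the first
    5-minute record in each hour.
    """
    n_hours = len(records["v"]) // STEPS_PER_HOUR
    hourly: dict[str, list[float]] = {}
    for key in CUMULATIVE:
        s = records[key]
        hourly[key] = [sum(s[h * STEPS_PER_HOUR:(h + 1) * STEPS_PER_HOUR]) for h in range(n_hours)]
    for key in INSTANTANEOUS:
        hourly[key] = [records[key][h * STEPS_PER_HOUR] for h in range(n_hours)]
    return hourly


def keep_zone(series: list[float]) -> bool:
    """Keep a traffic zone only if its series is not all zero and actually varies."""
    return any(x != 0 for x in series) and statistics.pvariance(series) > 0


print(N_HOURS)  # 4344
```

Summing versus sampling mirrors the cumulative/instantaneous distinction drawn in the text; spatial aggregation by traffic zone is not shown.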

  13. Thyroid Disease Unsupervised Anomaly Detection

    • kaggle.com
    Updated May 16, 2021
    Cite
    LIFR (2021). Thyroid Disease Unsupervised Anomaly Detection [Dataset]. https://www.kaggle.com/zhonglifr/thyroid-disease-unsupervised-anomaly-detection/code
    Dataset updated
    May 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    LIFR
    Description

    Context

    "This is a dataset originally from the UCI Thyroid Disease Data Set. Then it was modified for unsupervised anomaly detection by Goldstein Markus et al. in 2015."

    Content

    This dataset has 16 categorical attributes (counting age, which is continuous), 5 numerical attributes, and 1 target attribute, for a total of 22 attributes.

    1) Categorical attributes: age: continuous. sex: categorical, M, F. on thyroxine, query on thyroxine, on antithyroid medication, sick, pregnant, thyroid surgery, I131 treatment, query hypothyroid, query hyperthyroid, lithium, goitre, tumor, hypopituitary, psych: each categorical, f, t. For convenience, age is normalised to the range (0, 1), and all categorical variables are mapped as follows: {"M" -> 0, "F" -> 1} and {"f" -> 0, "t" -> 1}.

    2) Numerical attributes (all continuous): TSH, T3, TT4, T4U, FTI.

    3) Target attribute: outlier_label, categorical, o, n, where "o" means outlier and "n" means normal. Note that the last column of the file is empty and should be removed.
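A minimal sketch of loading the file in Python with the encoding described above. It assumes a header row, comma-separated values, and the target in the last non-empty column; the helper names, the sample column names, and the choice of 1 for outliers are hypothetical. The mapping dictionaries mirror the ones given in the description, so the loader also handles raw "M"/"F" and "f"/"t" values.

```python
from __future__ import annotations

import csv
import io

SEX_MAP = {"M": 0, "F": 1}     # mapping given in the dataset description
FLAG_MAP = {"f": 0, "t": 1}    # mapping given in the dataset description
LABEL_MAP = {"n": 0, "o": 1}   # hypothetical encoding: 1 = outlier


def to_number(value: str) -> float:
    """Convert one cell, applying the categorical mappings if needed."""
    value = value.strip()
    if value in SEX_MAP:
        return float(SEX_MAP[value])
    if value in FLAG_MAP:
        return float(FLAG_MAP[value])
    return float(value)


def load_thyroid(text: str) -> tuple[list[list[float]], list[int]]:
    """Parse CSV text, drop the trailing empty column, split features and target."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if rows and rows[0][-1].strip() == "":
        rows = [r[:-1] for r in rows]        # remove the last, empty column
    features, labels = [], []
    for *cells, label in rows[1:]:           # rows[0] is the header
        features.append([to_number(c) for c in cells])
        labels.append(LABEL_MAP[label.strip()])
    return features, labels


sample = "age,sex,on_thyroxine,TSH,Outlier_label,\n0.73,F,f,0.0006,n,\n0.24,M,t,0.03,o,\n"
X, y = load_thyroid(sample)
print(X, y)
```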

    Acknowledgements

    As stated by the original research paper [1]: "The thyroid dataset is another dataset from UCI machine learning repository in the medical domain. The raw patient measurements contain categorical attributes as well as missing values such that it was preprocessed in order to apply neural networks [2], also known as the “annthyroid” dataset. We make also use of this preprocessing, resulting in 21 dimensions. Normal instances (healthy non-hypothyroid patients) were taken from the training and test datasets. From the test set, we sampled 250 outliers from the two disease classes (subnormal function and hyperfunction) resulting in a new dataset containing 6,916 records with 3.61% anomalies."

    Reference

    [1] Goldstein M, Uchida S. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE, 2016, 11(4): e0152173.
    [2] Schiffmann W, Joost M, Werner R. Synthesis and performance analysis of multilayer neural network architectures. 1992.
    [3] Goldstein M. annthyroid-unsupervised-ad.tab. Unsupervised Anomaly Detection Benchmark, Harvard Dataverse, V1, 2015. https://doi.org/10.7910/DVN/OPQMVF/CJURKL, UNF:6:jJUwpBJ4iBlQto8WT6zsUg== [fileUNF]

  14. AI based 1D P & S-wave Velocity Models for the Greater Alpine Region from...

    • radar-service.eu
    • radar.kit.edu
    Updated Mar 1, 2024
    Cite
    Benedikt Braszus; Trond Ryberg; Andreas Rietbrock; Christian Haberland (2024). AI based 1D P & S-wave Velocity Models for the Greater Alpine Region from Local Earthquake Data - (outdated version) [Dataset]. http://doi.org/10.35097/1942
    Available download formats: tar (6,601,216 bytes)
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    Rietbrock, Andreas
    Haberland, Christian
    Braszus, Benedikt
    Karlsruhe Institute of Technology
    Ryberg, Trond
    Authors
    Benedikt Braszus; Trond Ryberg; Andreas Rietbrock; Christian Haberland
    Description

    This file summarizes the content of the data files in this repository, published together with the article:

    AI based 1D P & S-wave Velocity Models for the Greater Alpine Region from Local Earthquake Data

    VELOCITY FILES
    AlpsLocPS_VEL.mod
    - VELEST model file of 'AlpsLocPS_VELEST' (red in Fig. 6 of Braszus et al., 2024)
    AlpsLocPS_McMC.mod
    - McMC model of 'AlpsLocPS_McMC' (orange in Fig. 6 of Braszus et al., 2024)
    GAR1D_PS_VEL.mod
    - VELEST model file of 'GAR1D_PS_VELEST' (lime in Fig. 6 of Braszus et al., 2024)
    GAR1D_PS_McMC.mod
    - McMC model of 'GAR1D_PS_McMC' (purple in Fig. 6 of Braszus et al., 2024)

    STATION FILES
    Station corrections have to be subtracted from the synthetic travel times! Only stations with >= 10 observations per phase are included.
    AlpsLocPS_sta_cors.csv
    - File listing station data and P- and S-phase station correction terms for the "AlpsLocPS_VELEST" and "AlpsLocPS_McMC" models after relocating all events (see Table 2, 'run2', in Braszus et al., 2024)
    GAR1D_sta_cors.csv
    - File listing station data and P- and S-phase station correction terms for the final "GAR1D_PS_VELEST" and "GAR1D_PS_McMC" models

    EVENT FILES
    events_VELEST.csv
    - Catalog of relocated events using VELEST

    PICK CATALOG
    pick_catalog.csv
    - The header entries of "pick_catalog.csv" are, in order: network name, station name, station latitude, station longitude, station elevation in m, event origin time, event latitude, event longitude, event depth in km, pick_type, pick_phase, pick_time, res in s
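A short Python sketch of reading `pick_catalog.csv` and applying the sign convention for station corrections. It assumes comma-separated values, a single header line, and fields in the order listed above; the short field names in `PICK_FIELDS` and both function names are hypothetical and do not come from the repository.

```python
from __future__ import annotations

import csv
import io

# Field order as described for pick_catalog.csv; the short names are hypothetical.
PICK_FIELDS = (
    "network", "station", "sta_lat", "sta_lon", "sta_elev_m",
    "origin_time", "ev_lat", "ev_lon", "ev_depth_km",
    "pick_type", "pick_phase", "pick_time", "res_s",
)


def read_picks(text: str) -> list[dict[str, str]]:
    """Parse pick_catalog.csv text positionally (assumes one header line)."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    return [dict(zip(PICK_FIELDS, (cell.strip() for cell in r))) for r in rows[1:]]


def corrected_travel_time(synthetic_s: float, station_correction_s: float) -> float:
    """Station corrections are subtracted from synthetic travel times."""
    return synthetic_s - station_correction_s


print(corrected_travel_time(10.5, 0.25))  # 10.25
```

Keeping the values as strings leaves the choice of time parsing (for `origin_time` and `pick_time`) to the caller, since the time format is not documented here.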

  15. Oxytocin and behaviors during a parent-child interaction in children with...

    • osf.io
    Updated Jul 2, 2024
    Cite
    Anna-Rosa Mora-Jensen; Christine Thoustrup; Line Clemmensen; Anne Pagsberg; Nicole Lønfeldt (2024). Oxytocin and behaviors during a parent-child interaction in children with and without OCD: A statistical analysis plan [Dataset]. http://doi.org/10.17605/OSF.IO/YXKE5
    Available download formats: url
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    Center For Open Science
    Authors
    Anna-Rosa Mora-Jensen; Christine Thoustrup; Line Clemmensen; Anne Pagsberg; Nicole Lønfeldt
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Aim

    In this study, we aim to explore whether the oxytocin system is connected to parent and child behaviors during parent-child interactions. We will first test specific hypotheses and then perform exploratory analyses.

    1) Hypothesis testing
    Rationale: Oxytocin has been suggested to be connected to parenting behaviors in mammals [1] and has been associated with parental sensitivity [1] and affectionate parenting behaviors in humans [2].
    Hypotheses:
    - Children's and parents' oxytocin concentrations are positively associated with observed parental support sensitivity and dyadic emotional warmth.
    - Child oxytocin concentration is negatively associated with child reassurance-seeking behavior.

    2) Exploratory analyses
    We will explore:
    - Whether parent and child behaviors cluster into higher-level constructs that describe the variation in children with and without OCD.
    - Whether any such higher-level constructs are associated with parent and child oxytocin concentrations.
    - Which parent and child behaviors during a parent-child interaction best predict parent and child oxytocin concentrations.

    Methods

    We will use data from the TECTO trial, which includes 130 children with OCD and their parents, and 90 sex- and age-matched controls without current or previous psychiatric disorders and their parents. Our study is a baseline investigation of oxytocin and behavioral data.

    Measures

    Oxytocin: We use salivary oxytocin concentrations, collected with Salivettes (Sarstedt, Germany) and analyzed with Enzo oxytocin immunoassay kits (NY, USA).

    Parent and child behaviors: Video-recorded parent and child behaviors were coded by investigators with the TEC-M coding scheme [3]. The TEC-M coding scheme was developed to assess children's emotion regulation abilities from observations of parent and child behaviors during a frustration task in which the child and parent interact. The TEC-M consists of eight codes for parent behaviors, eight codes for child behaviors, and one code for dyadic parent-child behavior. Items are scored by a clinician or investigator on frequency (never, rarely, sometimes, often) and intensity (mild, moderate, marked) [3]. The primary outcome, the EmReg score, is a total score assigned by taking all coded behaviors into account. In this study, we will use the individual parent and child behavior codes rather than the assigned EmReg score.

    Diagnostic status: The diagnostic status of the child is available from screening with the Kiddie Schedule for Affective Disorders and Schizophrenia - Present and Lifetime Version (K-SADS-PL) [4] and from the International Classification of Diseases, 10th Revision (ICD-10) criteria [5].

    Pubertal stage: Child pubertal stage was reported by the child, after instruction from a clinician, using Tanner stages [6, 7].

    Statistical analyses

    1) Hypothesis testing: The association between oxytocin concentrations and behavioral codes will be analyzed with Spearman's rank-order correlations, as the behaviors are scored on an ordinal scale. We will use one-sided tests. Child, mother, and father oxytocin concentrations will be included as outcomes in separate models. Oxytocin concentrations will be included as a continuous variable. We will exclude outliers in oxytocin concentrations (defined as values more than 3 SD above or below the mean) and present the results including outliers separately in an appendix. We will include two behavioral codes as possible predictors of parent oxytocin: parental support sensitivity and dyadic emotional warmth. We will include three possible predictors of child oxytocin: parental support sensitivity, dyadic emotional warmth, and child reassurance seeking. The behavioral codes will be included as ordinal variables. In the models with child oxytocin as the outcome, we will include child pubertal stage as a possible covariate, as it was found to be associated with child oxytocin in a previous study [8]. Pubertal stage will be included as a categorical variable. Analyses will be performed with R (R Core Team, Vienna, Austria). We will perform complete case analyses. Information on 101 mother-child interactions and 63 father-child interactions is available. We will use the Benjamini-Hochberg method to adjust for multiple testing with a false discovery rate of 5%.

    2) Exploratory analyses: We will use different methods to investigate the exploratory aims of this study:
    - Decomposition methods such as principal component analysis (PCA).
    - Cluster analyses such as k-means clustering.
    - Predictive machine learning models such as random forests.
    We will correct for multiple comparisons in the exploratory analyses with the Benjamini-Hochberg method at a false discovery rate of 20%.
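The plan specifies R for these analyses; purely as an illustration, the outlier rule and the Benjamini-Hochberg adjustment can be sketched in plain Python. Assumptions: "3 SD" refers to the sample standard deviation, and the function names are hypothetical. The Spearman correlations themselves are not reimplemented here.

```python
from __future__ import annotations

import statistics


def exclude_outliers(values: list[float], n_sd: float = 3.0) -> list[float]:
    """Drop values lying more than n_sd sample SDs above or below the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= n_sd * sd]


def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure; True marks a rejected hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k / m * fdr
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:      # reject every hypothesis ranked at or below k_max
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5], fdr=0.05))  # [True, True, True, False]
```

In the example, the three smallest p-values are all rejected at a 5% FDR even though 0.02 alone exceeds its rank-1 threshold of 0.0125, which illustrates the step-up nature of the procedure. The exploratory analyses would pass `fdr=0.20` instead.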

    Results

    Participants were included between July 2018 and July 2022. We will start data analysis after pre-registration of the statistical analysis plan and plan to submit the results for publication in August 2024.

    TECTO trial Registration: ClinicalTrials.gov NCT03595098; https://clinicaltrials.gov/ct2/show/NCT03595098

    References
    1. Feldman, R., Gordon, I. & Zagoory-Sharon, O. The cross-generation transmission of oxytocin in humans. Hormones and Behavior 58, 669–676 (2010).
    2. Gordon, I., Zagoory-Sharon, O., Leckman, J. F. & Feldman, R. Oxytocin and the Development of Parenting in Humans. Biological Psychiatry 68, 377–382 (2010).
    3. Hagstrøm, J. et al. The Puzzle of Emotion Regulation: Development and Evaluation of the Tangram Emotion Coding Manual for Children. Front. Psychiatry 10, 723 (2019).
    4. Kaufman, J. et al. Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL): Initial Reliability and Validity Data. Journal of the American Academy of Child & Adolescent Psychiatry 36, 980–988 (1997).
    5. World Health Organization. ICD-10 - Psykiske Lidelser og Adfærdsmæssige Forstyrrelser. Klassifikation og Diagnostiske Kriterier. 1st edn. Munksgaard (1994).
    6. Marshall, W. A. & Tanner, J. M. Variations in pattern of pubertal changes in girls. Arch Dis Child 44, 291–303 (1969).
    7. Marshall, W. A. & Tanner, J. M. Variations in the Pattern of Pubertal Changes in Boys. Arch Dis Child 45, 13–23 (1970).
    8. Mora-Jensen et al. 'Salivary oxytocin concentrations in children and adolescents with and without OCD', unpublished results.

  16. Biological data captured in lab dissection.

    • plos.figshare.com
    Updated Jul 7, 2023
    Cite
    Dylan S. Geldenhuys; Shane Josias; Willie Brink; Mulanga Makhubele; Cang Hui; Pietro Landi; Jeremy Bingham; John Hargrove; Marijn C. Hazelbag (2023). Biological data captured in lab dissection. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011194.t001
    Available download formats: xls
    Dataset updated
    Jul 7, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Dylan S. Geldenhuys; Shane Josias; Willie Brink; Mulanga Makhubele; Cang Hui; Pietro Landi; Jeremy Bingham; John Hargrove; Marijn C. Hazelbag
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Morphometric analysis of wings has been suggested for identifying and controlling isolated populations of tsetse (Glossina spp.), vectors of human and animal trypanosomiasis in Africa. Single-wing images were captured from an extensive data set of field-collected tsetse wings of the species Glossina pallidipes and G. m. morsitans. Morphometric analysis required locating 11 anatomical landmarks on each wing. Manual location of landmarks is time-consuming, prone to error, and infeasible for large data sets. We developed a two-tier method using deep learning architectures to classify images and make accurate landmark predictions. The first tier used a classification convolutional neural network to remove most wings that were missing landmarks. The second tier provided landmark coordinates for the remaining wings. For the second tier, we compared direct coordinate regression using a convolutional neural network with segmentation using a fully convolutional network. We evaluated the shape bias of the resulting landmark predictions using Procrustes analysis and paid particular attention to consistent labelling to improve model performance. For an image size of 1024 × 1280, data augmentation reduced the mean pixel distance error of the regression model from 8.3 (95% confidence interval [4.4, 10.3]) to 5.34 (95% confidence interval [3.0, 7.0]). For the segmentation model, data augmentation did not alter the mean pixel distance error of 3.43 (95% confidence interval [1.9, 4.4]). Segmentation had a higher computational complexity and produced some large outliers. Both models showed minimal shape bias. We deployed the regression model on the complete unannotated data, consisting of 14,354 pairs of wing images, because it had a lower computational cost and more stable predictions than the segmentation model. The resulting landmark data set was provided for future morphometric analysis.
The methods we have developed could provide a starting point for studying the wings of other insect species. All the code used in this study was written in Python and open sourced.
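The abstract does not define the mean pixel distance error precisely; a common reading, assumed here, is the Euclidean distance between each predicted and manually annotated landmark, averaged over landmarks and wings. A minimal Python sketch under that assumption (function names hypothetical; Procrustes-based shape-bias analysis is not shown):

```python
from __future__ import annotations

import math


def mean_pixel_distance(pred: list[tuple[float, float]],
                        true: list[tuple[float, float]]) -> float:
    """Mean Euclidean distance (pixels) between predicted and annotated landmarks of one wing."""
    if len(pred) != len(true):
        raise ValueError("landmark lists must have the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(true)


def dataset_error(preds: list[list[tuple[float, float]]],
                  trues: list[list[tuple[float, float]]]) -> float:
    """Average the per-landmark distance over all landmarks of all wings."""
    dists = [math.dist(p, t) for pw, tw in zip(preds, trues) for p, t in zip(pw, tw)]
    return sum(dists) / len(dists)


print(mean_pixel_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))  # 2.5
```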
