This dataset was created by Tawara
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Early-stage (preclinical) detection of Parkinson's disease (PD) remains challenging yet is crucial both to differentiate it from other disorders and to facilitate timely administration of neuroprotective treatment as it becomes available.
Objective: In a cross-validation paradigm, this work focused on two binary predictive probability analyses: classification of early PD vs. controls and classification of early PD vs. SWEDD (scans without evidence of dopamine deficit). It was hypothesized that five distinct model types using combined non-motor and biomarker features would distinguish early PD from controls with > 80% cross-validated (CV) accuracy, but that the diverse nature of the SWEDD category would reduce early PD vs. SWEDD CV classification accuracy and alter model-based feature selection.
Methods: Cross-sectional baseline data were acquired from the Parkinson's Progression Markers Initiative (PPMI). Logistic regression, generalized additive (GAM), decision tree, random forest, and XGBoost models were fitted using non-motor clinical and biomarker features. Randomized train and test data partitions were created. Model classification CV performance was compared using the area under the curve (AUC), sensitivity, specificity, and the Kappa statistic.
Results: All five models achieved a CV AUC > 0.80 in distinguishing early PD from controls. The GAM (CV AUC 0.928, sensitivity 0.898, specificity 0.897) and XGBoost (CV AUC 0.923, sensitivity 0.875, specificity 0.897) models were the top classifiers. Performance across all models was consistently lower in the early PD/SWEDD analyses, where the highest performing models were XGBoost (CV AUC 0.863, sensitivity 0.905, specificity 0.748) and random forest (CV AUC 0.822, sensitivity 0.809, specificity 0.721). XGBoost detection of non-PD SWEDD matched 1–2-year curated diagnoses in 81.25% (13/16) of cases. In both the early PD/control and early PD/SWEDD analyses, and across all models, hyposmia was the single most important feature for classification; rapid eye movement behavior disorder (questionnaire) was the next most commonly high-ranked feature. Alpha-synuclein was an important feature for early PD/control but not early PD/SWEDD classification; conversely, the Epworth Sleepiness Scale was important to the latter but not the former.
Interpretation: Non-motor clinical and biomarker variables enable high CV discrimination of early PD vs. controls but are less effective at discriminating early PD from SWEDD.
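The sensitivity, specificity, and Kappa statistic named in the abstract above can all be derived from a single confusion matrix. The following is a minimal sketch, assuming hypothetical counts (not taken from the PPMI study) and the standard Cohen's kappa definition; the function name `binary_metrics` is illustrative only.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, Cohen's kappa) from confusion counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    observed = (tp + tn) / n       # observed agreement (accuracy)
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for one held-out fold (assumption, for illustration).
sens, spec, kappa = binary_metrics(tp=90, fp=10, tn=90, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.3f}")
```

In a cross-validation setting these values would typically be computed per fold and then averaged.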
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The study used Random Test-Split (RTS) and Cross-Validation (CV) machine learning methods to test different models for classifying cattle behavior (foraging behavior states, foraging activities, posture, and activity by posture) from GPS-coupled accelerometer data, with continuous observation recorded 12 hours per day as supporting ground truth. XGBoost with RTS performed best for general activity state classification, while Random Forest with CV excelled in the more detailed foraging activity and activity-by-posture classifications. Key movement indicators such as speed, Actindex, and sensor values (x, y, and z) were vital in predicting behaviors, suggesting specific sensors for tracking behaviors of interest to ranchers. The results highlight the benefits of continuous monitoring and advanced data analysis for real-time livestock tracking, leading to better grazing management, improved animal welfare, and more sustainable land use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Combinatorial drug therapy can improve the therapeutic effect and reduce the corresponding adverse events. In silico strategies to classify synergistic vs. antagonistic drug pairs are more efficient than experimental strategies. However, most of the developed methods have been applied only to cancer therapies. In this study, we introduce a novel XGBoost-based method that uses five features of drugs and the biomolecular networks of their targets to classify synergistic vs. antagonistic drug combinations from different drug categories. We found that XGBoost outperformed other classifiers in both stratified fivefold cross-validation (CV) and independent validation. For example, XGBoost achieved higher predictive accuracy than other models (0.86, 0.78, 0.78, and 0.83 for XGBoost, logistic regression, naïve Bayes, and random forest, respectively) on an independent validation set. We also found that the five-feature XGBoost model is much more effective at predicting combinatorial therapies that have synergistic effects than those with antagonistic effects. The five-feature XGBoost model was also validated on TCGA data, with an accuracy of 0.79 among the 61 tested drug pairs, which is comparable to that of DeepSynergy. Among the 14 main anatomical/pharmacological groups classified according to WHO Anatomic Therapeutic Class, for drugs belonging to five groups, their prediction accuracy was significantly increased (odds ratio < 1) or reduced (odds ratio > 1) (Fisher’s exact test, p < 0.05). This study concludes that our five-feature XGBoost model has significant benefits for classifying synergistic vs. antagonistic drug combinations.
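Stratified fivefold cross-validation, as mentioned above, keeps the ratio of synergistic to antagonistic pairs roughly equal in every fold. A minimal standard-library sketch follows; the label counts and the helper name `stratified_kfold` are assumptions for illustration and do not reproduce the study's data or code.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical label vector: 30 synergistic and 20 antagonistic drug pairs.
labels = ["synergistic"] * 30 + ["antagonistic"] * 20
folds = stratified_kfold(labels, k=5)

for fold_number, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    n_syn = sum(labels[i] == "synergistic" for i in test_idx)
    print(fold_number, len(train_idx), len(test_idx), n_syn)
```

Each fold serves once as the test set while the remaining four folds are used for training.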
https://spdx.org/licenses/CC0-1.0.html
Aim: This study uses a novel modeling approach to understand global trophic structure transformations under 21st-century climate changes. The goal is to project and understand the impacts of climate change on trophic dynamics, guiding future research and conservation efforts.
Location: 14,520 terrestrial grid cells of 1° x 1° globally.
Taxon: Trophic structures were assessed for 15,265 species, including 9,993 non-marine birds and 5,272 terrestrial mammals, across 9 predefined trophic guilds.
Methods: A spatially explicit community trophic structure model, based on an extreme gradient boosting algorithm (Xgboost), was used. The model was trained with 1961-1990 climatic data and projected changes according to three Shared Socioeconomic Pathways: SSP2-45, SSP3-70, and SSP5-85.
Results: The Xgboost model showed high predictive accuracy (86%, kappa=0.91). Projections indicated many global regions are transitioning in their trophic structures due to climate changes from 1990 to 2018, with decreases in species carrying capacity in 5.5% of cells and increases in 9.8%. Predictions for mid- and late-21st century under climate scenarios suggest significant reorganization, with notable impacts in regions such as the Amazon Basin, Central Africa, and Southeast Asia. Under SSP5-85, 17.1% of cells may face reductions in carrying capacity, while 41.1% could see increases, affecting thousands of species.
Main conclusions: Climate change is profoundly reorganizing global trophic communities, with significant shifts in species carrying capacity across different guilds. Tropical regions and high northern latitudes are most affected, with some species facing collapses and others finding new opportunities. These changes highlight the need to integrate community trophic structure models into biodiversity conservation strategies, offering a comprehensive view of climate change impacts on trophic networks.
Methods
Data Collection
Species Distribution Data
Geographical data were garnered from two primary sources and subsequently plotted on a global terrestrial grid, with each cell measuring 1 × 1°. These sources included the global distribution ranges of terrestrial mammals and non-marine birds. The distributions of species, specifically 9,993 non-marine birds and 5,272 terrestrial mammals, totaling 15,265 species, were informed by the IUCN Global Assessment's data on native ranges (IUCN, 2014). To enable analysis, a presence/absence matrix was created. In this matrix, the species were aligned as columns, each named, against 14,498 terrestrial grid cells, each cell measuring 1 × 1°, as rows. These include all the non-coastal cells of the world, excluding Antarctica and some northern regions, such as most of Greenland, for which some data are lacking. This approach provided a clear, granular view of species distribution across the globe.
Bioclimatic Variables
The bioclimatic variables were divided into two datasets: historical (1961-2018) and future (2021-2100). Historical bioclimatic variables were not obtained directly but derived from three monthly meteorological variables: mean minimum temperature (°C), mean maximum temperature (°C), and total precipitation (mm). These variables were downscaled from CRU-TS-4.03 (Harris et al., 2014) with WorldClim 2.1 (Fick & Hijmans, 2017) for bias correction. The nineteen WorldClim variables were calculated from these three monthly meteorological variables using the "biovars" function of the R dismo package (Hijmans et al., 2011). Unlike the historical data, pre-processed bioclimatic variables for the future could be accessed directly. We used a multimodel ensemble approach, which tends to perform better than any individual model (Pierce et al., 2009; Araújo & New, 2007).
The ensemble integrates mean outputs from 25 global climate models (GCMs) corresponding to an array of twelve different future climate change scenarios (Harris et al., 2014; Fick & Hijmans, 2017). These scenarios emerge from the interplay of four specific timeframes (2021-2040, 2041-2060, 2061-2080, and 2081-2100) and three Shared Socio-economic Pathways (ssp2-45, ssp3-70, and ssp5-85) (Gidden et al., 2019).
Feeding Habits Data
The feeding habits of bird and mammal species were obtained from the global species-level compilation of key trophic attributes, known as Elton traits 1.0 (Wilman et al., 2014). This dataset provided essential information on the trophic roles of species, which is crucial for understanding their ecological interactions and energy flow within ecosystems.
Trophic profile of the cells and structure identification
Trophic profile of the cells
We assigned each of the 15,265 terrestrial mammal and non-marine bird species to one of 9 trophic guilds and then counted the number of species in each guild within each cell, following a previous analysis (Mendoza & Araújo, 2022). The result is a matrix with the 9 trophic guilds as columns, 14,498 cells as rows, and values representing numbers of species. The trophic profile of every community is thus a point in a 9-dimensional 'trophic space' defined by the number of species from each trophic guild (a vector of dimension 9).
Selection of training samples
From the initial set of 14,498 terrestrial grid cells, each measuring 1°×1°, a specific subset of 6,610 continental cells was selected. This subset was defined by their overlap, either partial or complete, with designated protected areas. This subset was crucial for two analytical steps: first, to decipher the community trophic structures; and second, to model the interaction between the prevailing climate and the trophic structure.
Given the nature of these cells, designated as "continental protected area cells", we assume they experience reduced human activity compared to the surrounding matrix; an assumption that may not align with reality globally, considering evidence of reduced effectiveness of protected areas in ensuring tangible protection in various parts of the tropics (Geldmann et al., 2019). Nevertheless, a working assumption is made that the trophic structures displayed within these areas likely present a closer reflection of what might be expected from an undisturbed, stable energy network (Mendoza & Araújo, 2022).
Identification of the six basic trophic structures through AMD analysis
We utilized AMD analysis to explore the previously described 9-dimensional 'community trophic space', defined by the number of species within each trophic guild. This analysis is rooted in computing the Average Membership Degree (AMD) of cluster elements based on their Euclidean distance to the geometric center. The primary aim of AMD analysis is to discern the presence of distinct groups within multidimensional spaces, while concurrently assessing their degree of definition and compactness. The emergence of well-defined community groups within this trophic space allows for the consideration of the identified basic trophic structures as qualitatively distinct entities (Mendoza & Araújo, 2022). We applied AMD analysis to the 6,610 continental protected area cells to confirm that the same six basic trophic structures (TS1 to TS6) identified by Mendoza & Araújo (2022) are present within this curated subset.
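The trophic profile described in the methods (species counts per guild per cell) and the Euclidean distance to a geometric center can be sketched with toy data. This is only an illustration of those two ingredients, not the AMD procedure itself: the species, guild, and cell names are hypothetical, and three guilds are used instead of nine for brevity.

```python
import math
from collections import Counter

# Hypothetical toy inputs (assumptions, not the study's data).
species_guild = {"sp_a": "guild_1", "sp_b": "guild_1",
                 "sp_c": "guild_2", "sp_d": "guild_3"}
cell_species = {"cell_1": ["sp_a", "sp_b", "sp_c"],
                "cell_2": ["sp_c", "sp_d"]}
guilds = ["guild_1", "guild_2", "guild_3"]


def trophic_profile(species, species_guild, guilds):
    """Count the species of each guild present in one grid cell."""
    counts = Counter(species_guild[name] for name in species)
    return [counts.get(guild, 0) for guild in guilds]


def centroid(vectors):
    """Geometric center (component-wise mean) of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# One row per cell, one column per guild.
profiles = {cell: trophic_profile(names, species_guild, guilds)
            for cell, names in cell_species.items()}
center = centroid(list(profiles.values()))
# Euclidean distance of each cell's profile to the center.
distances = {cell: math.dist(vector, center)
             for cell, vector in profiles.items()}
print(profiles, center, distances)
```

In the study's setting, each profile would have nine components and the center would be computed per candidate cluster rather than over all cells.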
For a more comprehensive understanding of the AMD method and its application to our dataset, readers are directed to the supplementary information of Mendoza & Araújo (2022), accessible via the following link: https://nsojournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2Fecog.06289&file=ecog12872-sup-0001-AppendixS1.pdf
Climate modelling of community trophic structures
Data preparation
We modelled the relationship between climate and trophic structures, utilizing 19 predictors derived from historical bioclimatic data encompassing the years 1961-1990. Denoted as pre-1990 period, this phase marks a time before the significant uptick in temperatures attributable to human-induced greenhouse gas emissions. The trophic profile data, systematically assembled from faunal lists gathered over numerous decades, also hail from an era prior to this pronounced temperature increase. Therefore, these records present a fitting basis for examining the interplay between the trophic structure and the climatic conditions prevalent during the pre-1990 period. The bioclimatic variables represent conditions over specific time periods, and the corresponding trophic structure type (TS1 to TS6) is inferred as the one expected at the end of these periods.
Model Implementation Using Xgboost
We employed the Extreme Gradient Boosting algorithm (Xgboost) (Chen & Guestrin, 2016), using the xgboost package (Chen et al., 2023), a state-of-the-art machine learning technique known for its superior performance over traditional models such as random forests (e.g., Shao et al., 2024). The target variable in our analysis was the basic type of trophic structure (TS1 to TS6), identified in the previous step (with the AMD analysis) in the 6,610 continental protected area cells.
Hyperparameter optimization
Before training the model, we optimized the hyperparameters of the Xgboost algorithm to enhance its performance.
Specifically, we focused on six parameters: learning rate, maximum tree depth, gamma, lambda, alpha, and the number of trees. Due to the enormous number of possible parameter combinations, we employed a Bayesian optimization approach, which provided a more efficient search over the hyperparameter space compared to traditional grid search. As an optimization criterion, we used the xgb.cv cross-validation function within the Xgboost package, based on k-fold cross-validation.
Spatial cross-validation by blocks
In order to thoroughly assess the predictive accuracy of our model and address the spatial autocorrelation inherent in ecological data, we employed a rigorous Spatial Cross-Validation by Blocks method. This approach entailed partitioning the 6,610 continental protected area cells into 3,848 validation blocks,
https://www.marketresearchintellect.com/zh/privacy-policy
Learn more about Market Research Intellect's Python Package Software Market Report, valued at USD 700 million in 2024, and set to grow to USD 1.5 billion by 2033 with a CAGR of 9.5% (2026-2033).
https://www.marketresearchintellect.com/fr/privacy-policy
Market size and share are categorized by Data Analysis (NumPy, Pandas, SciPy, Dask, Vaex), Web Development (Flask, Django, FastAPI, Pyramid, Bottle), Machine Learning (TensorFlow, Scikit-learn, Keras, PyTorch, XGBoost), Visualization (Matplotlib, Seaborn, Plotly, Bokeh, Altair), Automation and Scripting (Requests, Beautiful Soup, Selenium, PyAutoGUI, Fabric), and geographic regions (North America, Europe, Asia-Pacific, South America, Middle East and Africa).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Semen quality has decreased gradually in recent years, and lifestyle changes are among the primary causes for this issue. Thus far, the specific lifestyle factors affecting semen quality remain to be elucidated.
Materials and methods: In this study, data on the following factors were collected from 5,109 men examined at our reproductive medicine center: 10 lifestyle factors that potentially affect semen quality (smoking status, alcohol consumption, staying up late, sleeplessness, consumption of pungent food, intensity of sports activity, sedentary lifestyle, working in hot conditions, sauna use in the last 3 months, and exposure to radioactivity); general factors including age, abstinence period, and season of semen examination; and comprehensive semen parameters [semen volume, sperm concentration, progressive and total sperm motility, sperm morphology, and DNA fragmentation index (DFI)]. Then, machine learning with the XGBoost algorithm was applied to establish a primary prediction model by using the collected data. Furthermore, the accuracy of the model was verified via multiple logistic regression following k-fold cross-validation analyses.
Results: The results indicated that for semen volume, sperm concentration, progressive and total sperm motility, and DFI, the area under the curve (AUC) values ranged from 0.648 to 0.697, while the AUC for sperm morphology was only 0.506. Among the 13 factors, smoking status was the major factor affecting semen volume, sperm concentration, and progressive and total sperm motility. Age was the most important factor affecting DFI. Logistic regression combined with cross-validation analysis revealed similar results. Furthermore, it showed that heavy smoking (>20 cigarettes/day) had an overall negative effect on semen volume, sperm concentration, and progressive and total sperm motility (OR = 4.69, 6.97, 11.16, and 10.35, respectively), while age of >35 years was associated with increased DFI (OR = 5.47).
Conclusion: The preliminary lifestyle-based model developed for semen quality prediction by using the XGBoost algorithm showed potential for clinical application and further optimization with larger training datasets.
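For readers unfamiliar with odds ratios (OR) such as those reported above, the following sketch computes an unadjusted OR with a Woolf (log-based) 95% confidence interval from a 2x2 table. The study's ORs come from multiple logistic regression, so this is a simplified stand-in; the counts are hypothetical and the function name `odds_ratio_with_ci` is illustrative.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and approximate 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) by Woolf's method.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts (assumption): heavy smokers vs. others, abnormal vs. normal.
estimate, lower, upper = odds_ratio_with_ci(a=40, b=60, c=20, d=80)
print(f"OR={estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An OR above 1 with a confidence interval that excludes 1 indicates that the outcome is more likely in the exposed group.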
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fit statistics for scored XGBoost models with 50,000 rows per dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Colon cancer recurrence is a common adverse outcome for patients after complete mesocolic excision (CME) and greatly affects the near-term and long-term prognosis of patients. This study aimed to develop a machine learning model that can identify high-risk factors before, during, and after surgery, and predict the occurrence of postoperative colon cancer recurrence.
Methods: The study included 1187 patients with colon cancer, including 110 patients who had recurrent colon cancer. The researchers collected 44 characteristic variables, including patient demographic characteristics, basic medical history, preoperative examination information, type of surgery, and intraoperative information. Four machine learning algorithms, namely extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and k-nearest neighbor algorithm (KNN), were used to construct the model. The researchers evaluated the model using the k-fold cross-validation method, ROC curve, calibration curve, decision curve analysis (DCA), and external validation.
Results: Among the four prediction models, the XGBoost algorithm performed the best. The ROC curve results showed that the AUC value of XGBoost was 0.962 in the training set and 0.952 in the validation set, indicating high prediction accuracy. The XGBoost model was stable during internal validation using the k-fold cross-validation method. The calibration curve demonstrated high predictive ability of the XGBoost model. The DCA curve showed that patients who received interventional treatment had a higher benefit rate under the XGBoost model. The external validation set’s AUC value was 0.91, indicating good extrapolation of the XGBoost prediction model.
Conclusion: The XGBoost machine learning algorithm-based prediction model for colon cancer recurrence has high prediction accuracy and clinical utility.
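The AUC values quoted above summarize how well predicted scores rank recurrent cases above non-recurrent ones. A minimal sketch of that interpretation follows, assuming hypothetical labels and scores; it uses the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = recurrence) and model scores (assumption).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is fine for illustration; rank-based implementations are preferable for large datasets.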
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: Neurological deterioration after mild traumatic brain injury (TBI) has been recognized as a poor prognostic factor. Early detection of neurological deterioration would allow appropriate monitoring and timely therapeutic interventions to improve patient outcomes. In this study, we developed a machine learning model to predict the occurrence of neurological deterioration after mild TBI using information obtained on admission.
Methods: This was a retrospective cohort study of data from the Think FAST registry, a multicenter prospective observational study of elderly TBI patients in Japan. Patients with an admission Glasgow Coma Scale (GCS) score of 12 or below or who underwent surgical treatment immediately upon admission were excluded. Neurological deterioration was defined as a decrease of 2 or more points from a GCS score of 13 or more within 24 h of hospital admission. The model predictive accuracy was judged with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC), and the Youden index was used to determine the cutoff value.
Results: A total of 421 of 721 patients registered in the Think FAST registry between December 2019 and May 2021 were included in our study, among whom 25 demonstrated neurological deterioration. Among several machine learning algorithms, eXtreme Gradient Boosting (XGBoost) demonstrated the highest predictive accuracy in cross-validation, with an AUROC of 0.81 (±0.07) and an AUPRC of 0.33 (±0.08). Through SHapley Additive exPlanations (SHAP) analysis, five important features (D-dimer, fibrinogen, acute subdural hematoma thickness, cerebral contusion size, and systolic blood pressure) were identified and used to construct a better performing model (cross-validation AUROC of 0.84 and AUPRC of 0.34; testing data AUROC of 0.77 and AUPRC of 0.19). At the cutoff value from the Youden index, the model showed a sensitivity, specificity, and positive predictive value of 60, 96, and 38%, respectively. When neurosurgeons attempted to predict neurological deterioration using the same testing data, their values were 20, 94, and 19%, respectively.
Conclusion: In this study, our predictive model showed an acceptable performance in detecting neurological deterioration after mild TBI. Further validation through prospective studies is necessary to confirm these results.
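The Youden index used above to pick the cutoff is J = sensitivity + specificity - 1, maximized over candidate thresholds. The sketch below illustrates that selection on hypothetical labels and scores; the function name `youden_cutoff` and the toy data are assumptions, not the registry data.

```python
def youden_cutoff(y_true, scores):
    """Return (best J, threshold) where a score >= threshold predicts positive."""
    best_j, best_threshold = None, None
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        # Keep the first threshold that achieves the highest J.
        if best_j is None or j > best_j:
            best_j, best_threshold = j, threshold
    return best_j, best_threshold


# Hypothetical outcomes (1 = deterioration) and predicted risks (assumption).
outcomes = [1, 1, 0, 0, 1, 0]
risks = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
best_j, cutoff = youden_cutoff(outcomes, risks)
print(f"Youden J = {best_j:.3f} at cutoff {cutoff}")
```

Because J weights sensitivity and specificity equally, a rare outcome can still yield a low positive predictive value at the chosen cutoff, as the reported 38% illustrates.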
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The five cross-validation stages involved in the present study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The uniaxial compressive strength (UCS) and elasticity modulus (E) of intact rock are two fundamental requirements in engineering applications. These parameters can be measured either directly from the uniaxial compressive strength test or indirectly by using soft computing predictive models. In the present research, the UCS and E of intact carbonate rocks have been predicted by introducing two stacking ensemble learning models based on non-destructive simple laboratory test results. For this purpose, dry unit weight, porosity, P-wave velocity, Brinell surface hardness, UCS, and static E were measured for 70 carbonate rock samples. Then, two stacking ensemble learning models were developed for estimating the UCS and E of the rocks. The applied stacking ensemble learning method integrates the advantages of two base models in the first level, where the base models are multi-layer perceptron (MLP) and random forest (RF) for predicting UCS, and support vector regressor (SVR) and extreme gradient boosting (XGBoost) for predicting E. Grid search integrated with k-fold cross-validation is applied to tune the parameters of both the base models and the meta-learner. The results demonstrate the generalization ability of the stacking ensemble method compared with the base models in terms of common performance measures. The values of the coefficient of determination (R2) obtained from the stacking ensemble are 0.909 and 0.831 for predicting UCS and E, respectively. Similarly, the stacking ensemble yielded Root Mean Squared Error (RMSE) values of 1.967 and 0.621 for the prediction of UCS and E, respectively. Accordingly, the proposed models are superior to SVR and MLP as single models and to RF and XGBoost as two representative ensemble models. Furthermore, sensitivity analysis is carried out to investigate the impact of input parameters.
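The two performance measures reported above, R2 and RMSE, can be computed directly from measured and predicted values. The following is a minimal sketch with hypothetical numbers (not the carbonate rock measurements); the function name `r2_and_rmse` is illustrative.

```python
import math


def r2_and_rmse(measured, predicted):
    """Coefficient of determination and root mean squared error."""
    n = len(measured)
    mean_measured = sum(measured) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical measured vs. predicted strength values (assumption).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
r2, rmse = r2_and_rmse(measured, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

RMSE is expressed in the units of the target, so values for UCS and E are not directly comparable with each other.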
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Myasthenia gravis (MG) is a neuromuscular junction disease with a complex pathophysiology and clinical variation for which no clear biomarker has been discovered. We hypothesized that, because changes in gut microbiome composition often occur in autoimmune diseases, the gut microbiome structures of patients with MG would differ from those of individuals without MG, and that a supervised machine learning (ML) analysis strategy could be trained on gut microbiota data for diagnostic screening of MG. Genomic DNA was collected from the stool samples of individuals with and without MG, and a sequencing library was established by constructing amplicon sequence variants (ASVs) and completing taxonomic classification of each representative DNA sequence. Four ML methods, namely least absolute shrinkage and selection operator, extreme gradient boosting (XGBoost), random forest, and classification and regression trees, were trained with nested leave-one-out cross-validation using ASV taxon–based data and full ASV–based data to identify key ASVs in each data set. The results revealed XGBoost to have the best predictive performance. Overlapping key features extracted when XGBoost was trained using the full ASV–based and ASV taxon–based data were identified, and 31 high-importance ASVs (HIASVs) were obtained, assigned importance scores, and ranked. The most significant difference observed was in the abundance of bacteria in the Lachnospiraceae and Ruminococcaceae families. The 31 HIASVs were used to train the XGBoost algorithm to differentiate individuals with and without MG. The model had high diagnostic classification power and could accurately predict and identify patients with MG. In addition, the abundance of Lachnospiraceae was associated with limb weakness severity. In this study, we discovered that the composition of gut microbiomes differed between MG and non-MG subjects. 
In addition, the proposed XGBoost model trained using 31 HIASVs had the most favorable performance with respect to analyzing gut microbiomes. These HIASVs selected by the ML model may serve as biomarkers for clinical use and mechanistic study in the future. Our proposed ML model can identify several taxonomic markers and effectively discriminate patients with MG from those without with high accuracy, and the ML strategy can be applied as a benchmark for noninvasive screening of MG.
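The index bookkeeping behind nested leave-one-out cross-validation can be sketched as follows. This is an illustrative outline only: the abstract does not say what the inner loop was used for, so the assumption here is the common arrangement in which the inner splits (for example, for tuning or feature selection) are built solely from the outer training samples. The function names are hypothetical.

```python
def leave_one_out(indices):
    """Yield (train, held_out) pairs, holding out one index at a time."""
    for i, held_out in enumerate(indices):
        yield indices[:i] + indices[i + 1:], held_out

def nested_leave_one_out(n_samples):
    """Yield (outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs built
    only from the outer training samples, so the outer test sample never
    takes part in the inner loop (assumed here to handle model selection).
    """
    all_idx = list(range(n_samples))
    for outer_train, outer_test in leave_one_out(all_idx):
        yield outer_test, list(leave_one_out(outer_train))

# With 4 samples there are 4 outer folds, each with 3 inner splits.
folds = list(nested_leave_one_out(4))
for outer_test, inner_splits in folds:
    print(outer_test, inner_splits)
```

The cost grows quadratically with the number of samples (n outer folds, each with n - 1 inner fits), which is usually acceptable only for small cohorts.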
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy and p-value obtained from 5-fold cross validation for three machine learning methods (xgboost, random forest and neural network) for the prediction of postmortem interval, event location and manner of death using the microbiota from all anatomic locations (ears, eyes, nose, mouth, and rectum).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Gastroparesis following complete mesocolic excision (CME) can precipitate a cascade of severe complications, which may significantly hinder postoperative recovery and diminish the patient’s quality of life. In the present study, four machine learning algorithms, Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), and k-nearest neighbor (KNN), were employed to develop predictive models. The clinical data of critically ill patients transferred to the intensive care unit (ICU) post-CME were analyzed to identify key risk factors associated with the development of gastroparesis. Methods: We gathered 34 feature variables from a cohort of 1,097 colon cancer patients, including 87 individuals who developed gastroparesis post-surgery, across multiple hospitals, and applied a range of machine learning algorithms to construct the predictive model. To assess the model’s generalization performance, we employed 10-fold cross-validation, while the receiver operating characteristic (ROC) curve was used to evaluate its discriminative capacity. Additionally, calibration curves, decision curve analysis (DCA), and external validation were used to evaluate the model’s clinical applicability and utility. Results: Among the four predictive models, the XGBoost algorithm demonstrated superior performance. As indicated by the ROC curve, XGBoost achieved an area under the curve (AUC) of 0.939 in the training set and 0.876 in the validation set. Notably, in the 10-fold cross-validation, the XGBoost model performed consistently across all folds, underscoring its stability. The calibration curve further revealed a favorable concordance between the predicted probabilities and the actual outcomes of the XGBoost model. 
Additionally, the DCA indicated that patients receiving intervention under the XGBoost model experienced significantly greater clinical benefit. Conclusion: Postoperative gastroparesis in colon cancer patients remains difficult to prevent entirely. However, the prediction model developed in this study can assist clinicians in identifying key high-risk factors for gastroparesis, which may help improve the quality of life and survival outcomes for these patients.
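The AUC reported above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal pure-Python sketch of that pairwise definition is shown below; the function name, labels, and scores are hypothetical and are not the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = gastroparesis) and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise form is quadratic in the number of samples; it is used here for clarity rather than efficiency.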
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MCC as the object of difference analysis: 10-fold cross-validation classification metrics of the top three genes.
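For reference, the Matthews correlation coefficient (MCC) named in this caption is computed from the four cells of a binary confusion matrix. The sketch below uses the standard definition; the function name and the counts are hypothetical and do not come from this dataset.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for a single cross-validation fold.
print(round(mcc(tp=6, fp=1, fn=2, tn=9), 3))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it stays informative when the two classes are imbalanced.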
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Machine learning models have been developed for numerous medical prognostic purposes. These models are commonly developed using data from single centers or regional registries. Including data from multiple centers improves the robustness and accuracy of prognostic models. However, data sharing between multiple centers is complex, mainly because of regulations and patient privacy issues. Objective: We aim to overcome data sharing impediments by using distributed ML and local learning followed by model integration. We applied these techniques to develop 1-year TAVI mortality estimation models with data from two centers without sharing any data. Methods: A distributed ML technique and local learning followed by model integration were used to develop models to predict 1-year mortality after TAVI. We included two populations with 1,160 (Center A) and 631 (Center B) patients. Five traditional ML algorithms were implemented. The results were compared to models created individually at each center. Results: The combined learning techniques outperformed the mono-center models. For Center A, the combined local XGBoost achieved an AUC of 0.67 (compared to a mono-center AUC of 0.65) and, for Center B, a distributed neural network achieved an AUC of 0.68 (compared to a mono-center AUC of 0.64). Conclusion: This study shows that distributed ML and combined local model techniques can overcome data sharing limitations and result in more accurate models for TAVI mortality estimation. We have shown improved prognostic accuracy for both centers, and these techniques can also be used as an alternative to overcome the problem of limited amounts of data when creating prognostic models.
This dataset was created by Tawara
Released under CC0: Public Domain