13 datasets found
  1. f

    Apriori algorithm-based association rules.

    • plos.figshare.com
    bin
    Updated Aug 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Apriori algorithm-based association rules. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t001
    Explore at:
    binAvailable download formats
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, the prevalence of T2DM has been increasing annually, in particular, the personal and socioeconomic burden caused by multiple complications has become increasingly serious. This study aimed to screen out the high-risk complication combination of T2DM through various data mining methods, establish and evaluate a risk prediction model of the complication combination in patients with T2DM. Questionnaire surveys, physical examinations, and biochemical tests were conducted on 4,937 patients with T2DM, and 810 cases of sample data with complications were retained. The high-risk complication combination was screened by association rules based on the Apriori algorithm. Risk factors were screened using the LASSO regression model, random forest model, and support vector machine. A risk prediction model was established using logistic regression analysis, and a dynamic nomogram was constructed. Receiver operating characteristic (ROC) curves, harrell’s concordance index (C-Index), calibration curves, decision curve analysis (DCA), and internal validation were used to evaluate the differentiation, calibration, and clinical applicability of the models. This study found that patients with T2DM had a high-risk combination of lower extremity vasculopathy, diabetic foot, and diabetic retinopathy. Based on this, body mass index, diastolic blood pressure, total cholesterol, triglyceride, 2-hour postprandial blood glucose and blood urea nitrogen levels were screened and used for the modeling analysis. The area under the ROC curves of the internal and external validations were 0.768 (95% CI, 0.744−0.792) and 0.745 (95% CI, 0.669−0.820), respectively, and the C-index and AUC value were consistent. The calibration plots showed good calibration, and the risk threshold for DCA was 30–54%. In this study, we developed and evaluated a predictive model for the development of a high-risk complication combination while uncovering the pattern of complications in patients with T2DM. This model has a practical guiding effect on the health management of patients with T2DM in community settings.

  2. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: . xlsx
    • Number of Row: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    imagehttps://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

    Libraries in R

    First, we need to load required libraries. Shortly I describe all libraries.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    imagehttps://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

    Data Pre-processing

    Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

    imagehttps://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> imagehttps://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

    After we will clear our data frame, will remove missing values.

    imagehttps://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

    To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...

  3. f

    The hyperparameters of the apriori algorithm.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saeyeon Cheon; Thanin Methiyothin; Insung Ahn (2023). The hyperparameters of the apriori algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0282119.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Saeyeon Cheon; Thanin Methiyothin; Insung Ahn
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The hyperparameters of the apriori algorithm.

  4. A

    ‘Groceries dataset ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 15, 2015
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2015). ‘Groceries dataset ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-groceries-dataset-b6be/136ba9af/?iid=001-023&v=presentation
    Explore at:
    Dataset updated
    Aug 15, 2015
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Groceries dataset ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/heeraldedhia/groceries-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Association Rule Mining

    Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.

    Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.

    Details of the dataset

    The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.

    Apriori Algorithm

    Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.

    An example of Association Rules

    Assume there are 100 customers 10 of them bought milk, 8 bought butter and 6 bought both of them. bought milk => bought butter support = P(Milk & Butter) = 6/100 = 0.06 confidence = support/P(Butter) = 0.06/0.08 = 0.75 lift = confidence/P(Milk) = 0.75/0.10 = 7.5

    Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Some important terms:

    • Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.

    • Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.

    • Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.

    --- Original source retains full ownership of the source dataset ---

  5. f

    Stunting final dataset.

    • plos.figshare.com
    bin
    Updated Jan 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Stunting final dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.s001
    Explore at:
    binAvailable download formats
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BackgroundStunting is a vital indicator of chronic undernutrition that reveals a failure to reach linear growth. Investigating growth and nutrition status during adolescence, in addition to infancy and childhood is very crucial. However, the available studies in Ethiopia have been usually focused in early childhood and they used the traditional stastical methods. Therefore, this study aimed to employ multiple machine learning algorithms to identify the most effective model for the prediction of stunting among adolescent girls in Ethiopia.MethodsA total of 3156 weighted samples of adolescent girls aged 15–19 years were used from the 2016 Ethiopian Demographic and Health Survey dataset. The data was pre-processed, and 80% and 20% of the observations were used for training, and testing the model, respectively. Eight machine learning algorithms were included for consideration of model building and comparison. The performance of the predictive model was evaluated using evaluation metrics value through Python software. The synthetic minority oversampling technique was used for data balancing and Boruta algorithm was used to identify best features. Association rule mining using an Apriori algorithm was employed to generate the best rule for the association between the independent feature and the targeted feature using R software.ResultsThe random forest classifier (sensitivity = 81%, accuracy = 77%, precision = 75%, f1-score = 78%, AUC = 85%) outperformed in predicting stunting compared to other ML algorithms considered in this study. Region, poor wealth index, no formal education, unimproved toilet facility, rural residence, not used contraceptive method, religion, age, no media exposure, occupation, and having one or more children were the top attributes to predict stunting. Association rule mining was identified the top seven best rules that most frequently associated with stunting among adolescent girls in Ethiopia.ConclusionThe random forest classifier outperformed in predicting and identifying the relevant predictors of stunting. Results have shown that machine learning algorithms can accurately predict stunting, making them potentially valuable as decision-support tools for the relevant stakeholders and giving emphasis for the identified predictors could be an important intervention to halt stunting among adolescent girls.

  6. t

    Generated datasets for frequent itemset mining algorithms - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Generated datasets for frequent itemset mining algorithms - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/generated-datasets-for-frequent-itemset-mining-algorithms
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    Generated datasets for frequent itemset mining algorithms Apriori, Eclat, and FP-Growth.

  7. f

    Socio-demographic characteristics among adolescent girls in Ethiopia, 2016...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jan 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie (2025). Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS. [Dataset]. http://doi.org/10.1371/journal.pone.0316452.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Alemu Birara Zemariam; Biruk Beletew Abate; Addis Wondmagegn Alamaw; Eyob shitie Lake; Gizachew Yilak; Mulat Ayele; Befkad Derese Tilahun; Habtamu Setegn Ngusie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ethiopia
    Description

    Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS.

  8. f

    The data of Apriori algorithm.

    • plos.figshare.com
    txt
    Updated May 23, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fangyuan Li; Xia Wang; Zenglei Feng; Jian Wang; Mengdi Li; Kun JIANG; Changli ZHAO (2024). The data of Apriori algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0302216.s002
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Fangyuan Li; Xia Wang; Zenglei Feng; Jian Wang; Mengdi Li; Kun JIANG; Changli ZHAO
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The real-time monitoring on the risk status of the vehicle and its driver can provide the assistance for the early detection and blocking control of single-vehicle accidents. However, complex risk coupling relationship is one of the main features of single-vehicle accidents with high mortality rate. On the basis of investigating the coupling effect among multi-risk factors and establishing a safety management database throughout the life cycle of vehicles, single-vehicle driving risk network (SVDRN) with a three-level threshold was developed, and its topology features were analyzed to assessment the importance of nodes. To avoid the one-sidedness of single indicator, the multi-attribute comprehensive evaluation model was applied to measure the comprehensive effect of characteristic indicators for nodes importance. A algorithm for real-time monitoring of vehicle driving risk status was proposed to identify key risk chains. The result revealed that improper operation, speeding, loss of vehicle control and inefficient driver management were the sequence of top four risk factors in the comprehensive evaluation result of nodes importance (mean value = 0.185, SD = 0.119). There were minor differences of 0.017 in the node importance among environmental factors, among which non-standard road alignment had the larger value. The improper operation and non-standard road alignment were the highest combination correlation of factors affecting road safety, with the support of 51.81% and the confidence of 69.35%. This identification algorithm of key risk chains that combines node importance and its risk state threshold can effectively determine the high-frequency risk transmission paths and risk factors through multi-vehicle test, providing a basis for centralization management of transport enterprises.

  9. Differences in demographic and clinical characteristics between the no case...

    • plos.figshare.com
    bin
    Updated Aug 8, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Differences in demographic and clinical characteristics between the no case and case groups. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t002
    Explore at:
    binAvailable download formats
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Differences in demographic and clinical characteristics between the no case and case groups.

  10. f

    Student’s t test of DGAARM and Apriori.

    • plos.figshare.com
    xls
    Updated Mar 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Xiaoxuan Wu; Qiang Wen; Jun Zhu (2024). Student’s t test of DGAARM and Apriori. [Dataset]. http://doi.org/10.1371/journal.pone.0299865.t010
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Mar 4, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Xiaoxuan Wu; Qiang Wen; Jun Zhu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Understanding air quality requires a comprehensive understanding of its various factors. Most of the association rule techniques focuses on high frequency terms, ignoring the potential importance of low- frequency terms and causing unnecessary storage space waste. Therefore, a dynamic genetic association rule mining algorithm is proposed in this paper, which combines the improved dynamic genetic algorithm with the association rule mining algorithm to realize the importance mining of low- frequency terms. Firstly, in the chromosome coding phase of genetic algorithm, an innovative multi-information coding strategy is proposed, which selectively stores similar values of different levels in one storage unit. It avoids storing all the values at once and facilitates efficient mining of valid rules later. Secondly, by weighting the evaluation indicators such as support, confidence and promotion in association rule mining, a new evaluation index is formed, avoiding the need to set a minimum threshold for high-interest rules. Finally, in order to improve the mining performance of the rules, the dynamic crossover rate and mutation rate are set to improve the search efficiency of the algorithm. In the experimental stage, this paper adopts the 2016 annual air quality data set of Beijing to verify the effectiveness of the unit point multi-information coding strategy in reducing the rule storage air, the effectiveness of mining the rules formed by the low frequency item set, and the effectiveness of combining the rule mining algorithm with the swarm intelligence optimization algorithm in terms of search time and convergence. In the experimental stage, this paper adopts the 2016 annual air quality data set of Beijing to verify the effectiveness of the above three aspects. The unit point multi-information coding strategy reduced the rule space storage consumption by 50%, the new evaluation index can mine more interesting rules whose interest level can be up to 90%, while mining the rules formed by the lower frequency terms, and in terms of search time, we reduced it about 20% compared with some meta-heuristic algorithms, while improving convergence.

  11. f

    Number of association rules generated from the prebiotics dataset with...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande (2023). Number of association rules generated from the prebiotics dataset with various run-time thresholds. [Dataset]. http://doi.org/10.1371/journal.pone.0154493.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Disha Tandon; Mohammed Monzoorul Haque; Sharmila S. Mande
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of association rules generated using the Apriori rule mining approach on the prebiotics dataset at various values of support count and confidence thresholds. Table also depicts variations in number of rules due to adoption of various strategies that define the minimum abundance threshold for individual taxa to be considered for rule mining.

  12. f

    Table_1_Urban–Rural Differences in Patterns and Associated Factors of...

    • figshare.com
    bin
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chichen Zhang; Shujuan Xiao; Lei Shi; Yaqing Xue; Xiao Zheng; Fang Dong; Jiachi Zhang; Benli Xue; Huang Lin; Ping Ouyang (2023). Table_1_Urban–Rural Differences in Patterns and Associated Factors of Multimorbidity Among Older Adults in China: A Cross-Sectional Study Based on Apriori Algorithm and Multinomial Logistic Regression.XLS [Dataset]. http://doi.org/10.3389/fpubh.2021.707062.s001
    Explore at:
    binAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Chichen Zhang; Shujuan Xiao; Lei Shi; Yaqing Xue; Xiao Zheng; Fang Dong; Jiachi Zhang; Benli Xue; Huang Lin; Ping Ouyang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Introduction: Multimorbidity has become one of the key issues in the public health sector. This study aimed to explore the urban–rural differences in patterns and associated factors of multimorbidity in China and to provide scientific reference for the development of health management strategies to reduce health inequality between urban and rural areas.Methods: A cross-sectional study, which used a multi-stage random sampling method, was conducted effectively among 3,250 participants in the Shanxi province of China. The chi-square test was used to compare the prevalence of chronic diseases among older adults with different demographic characteristics. The Apriori algorithm and multinomial logistic regression were used to explore the patterns and associated factors of multimorbidity among older adults, respectively.Results: The findings showed that 30.3% of older adults reported multimorbidity, with significantly higher proportions in rural areas. Among urban older adults, 10 binary chronic disease combinations with strong association strength were obtained. In addition, 11 binary chronic disease combinations and three ternary chronic disease combinations with strong association strength were obtained among rural older adults. In rural and urban areas, there is a large gap in patterns and factors associated with multimorbidity.Conclusions: Multimorbidity was prevalent among older adults, which patterns mainly consisted of two or three chronic diseases. The patterns and associated factors of multimorbidity varied from urban to rural regions. Expanding the study of urban–rural differences in multimorbidity will help the country formulate more reasonable public health policies to maximize the benefits of medical services for all.

  13. f

    Mining co-occurrence and sequence patterns from cancer diagnoses in New York...

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yu Wang; Wei Hou; Fusheng Wang (2023). Mining co-occurrence and sequence patterns from cancer diagnoses in New York State [Dataset]. http://doi.org/10.1371/journal.pone.0194407
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yu Wang; Wei Hou; Fusheng Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    The goal of this study is to discover disease co-occurrence and sequence patterns from large scale cancer diagnosis histories in New York State. In particular, we want to identify disparities among different patient groups. Our study will provide essential knowledge for clinical researchers to further investigate comorbidities and disease progression for improving the management of multiple diseases. We used inpatient discharge and outpatient visit records from the New York State Statewide Planning and Research Cooperative System (SPARCS) from 2011-2015. We grouped each patient’s visit history to generate diagnosis sequences for seven most popular cancer types. We performed frequent disease co-occurrence mining using the Apriori algorithm, and frequent disease sequence patterns discovery using the cSPADE algorithm. Different types of cancer demonstrated distinct patterns. Disparities of both disease co-occurrence and sequence patterns were observed from patients within different age groups. There were also considerable disparities in disease co-occurrence patterns with respect to different claim types (i.e., inpatient, outpatient, emergency department and ambulatory surgery). Disparities regarding genders were mostly found where the cancer types were gender specific. Supports of most patterns were usually higher for males than for females. Compared with secondary diagnosis codes, primary diagnosis codes can convey more stable results. Two disease sequences consisting of the same diagnoses but in different orders were usually with different supports. Our results suggest that the methods adopted can generate potentially interesting and clinically meaningful disease co-occurrence and sequence patterns, and identify disparities among various patient groups. These patterns could imply comorbidities and disease progressions.

  14. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang (2023). Apriori algorithm-based association rules. [Dataset]. http://doi.org/10.1371/journal.pone.0289749.t001

Apriori algorithm-based association rules.

Related Article
Explore at:
34 scholarly articles cite this dataset (View in Google Scholar)
binAvailable download formats
Dataset updated
Aug 8, 2023
Dataset provided by
PLOS ONE
Authors
Xin Luo; Jijia Sun; Hong Pan; Dian Zhou; Ping Huang; Jingjing Tang; Rong Shi; Hong Ye; Ying Zhao; An Zhang
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

In recent years, the prevalence of T2DM has been increasing annually, in particular, the personal and socioeconomic burden caused by multiple complications has become increasingly serious. This study aimed to screen out the high-risk complication combination of T2DM through various data mining methods, establish and evaluate a risk prediction model of the complication combination in patients with T2DM. Questionnaire surveys, physical examinations, and biochemical tests were conducted on 4,937 patients with T2DM, and 810 cases of sample data with complications were retained. The high-risk complication combination was screened by association rules based on the Apriori algorithm. Risk factors were screened using the LASSO regression model, random forest model, and support vector machine. A risk prediction model was established using logistic regression analysis, and a dynamic nomogram was constructed. Receiver operating characteristic (ROC) curves, harrell’s concordance index (C-Index), calibration curves, decision curve analysis (DCA), and internal validation were used to evaluate the differentiation, calibration, and clinical applicability of the models. This study found that patients with T2DM had a high-risk combination of lower extremity vasculopathy, diabetic foot, and diabetic retinopathy. Based on this, body mass index, diastolic blood pressure, total cholesterol, triglyceride, 2-hour postprandial blood glucose and blood urea nitrogen levels were screened and used for the modeling analysis. The area under the ROC curves of the internal and external validations were 0.768 (95% CI, 0.744−0.792) and 0.745 (95% CI, 0.669−0.820), respectively, and the C-index and AUC value were consistent. The calibration plots showed good calibration, and the risk threshold for DCA was 30–54%. In this study, we developed and evaluated a predictive model for the development of a high-risk complication combination while uncovering the pattern of complications in patients with T2DM. This model has a practical guiding effect on the health management of patients with T2DM in community settings.

Search
Clear search
Close search
Google apps
Main menu