Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data: all the transactions that took place over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem with association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most useful when you want to find associations between different objects in a set, that is, frequent patterns in a transaction database. They tell you which items customers frequently buy together, which allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule {computer mouse} => {mouse mat}:
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
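The arithmetic in this example can be reproduced in a few lines of Python. This is a minimal sketch: `rule_metrics` is a hypothetical helper that works from raw counts rather than from a transaction list.

```python
def rule_metrics(n_total, n_lhs, n_rhs, n_both):
    """Return (support, confidence, lift) for the rule LHS => RHS from raw counts."""
    support = n_both / n_total                # P(LHS and RHS)
    confidence = n_both / n_lhs               # P(RHS | LHS) = support / P(LHS)
    lift = confidence / (n_rhs / n_total)     # P(RHS | LHS) / P(RHS)
    return support, confidence, lift


# {computer mouse} => {mouse mat}: 100 customers, 10 mice, 9 mats, 8 both
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

Because lift divides by the probability of the consequent, it is symmetric: the reverse rule {mouse mat} => {computer mouse} has the same lift but a different confidence (8/9 ≈ 0.89).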
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we load the required libraries, each of which I describe briefly.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
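The walkthrough performs this conversion in R. As a language-neutral illustration, the sketch below does the same grouping in Python. It assumes the cleaned data is available as (invoice id, item name) pairs; the function name and the sample rows are hypothetical and are not taken from Assignment-1_Data.xlsx.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group (invoice_id, item) pairs into one set of items per invoice.

    Pairs with a missing invoice id or item (None or empty string) are skipped,
    mirroring the missing-value cleanup step.
    """
    baskets = defaultdict(set)
    for invoice_id, item in rows:
        if not invoice_id or not item:
            continue
        baskets[invoice_id].add(item)
    return dict(baskets)


# Hypothetical sample rows: one row per item line on an invoice.
rows = [
    ("1001", "mouse"),
    ("1001", "mouse mat"),
    ("1002", "mouse"),
    ("1002", None),
    ("1003", "keyboard"),
    ("1003", "mouse"),
]
for invoice_id, items in to_transactions(rows).items():
    print(invoice_id, sorted(items))
```

Using a set per invoice means an item bought several times on one invoice counts once, which is what support counting needs; quantities are deliberately dropped.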
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
The basket dataset contains a list of items available for purchase. Items can also appear in sets, for example milk and sugar.
The analysis aims to determine, for retailers, which items or sets of items are purchased together. Sometimes a customer's purchase of one item leads them to purchase another item as well. This is a kind of association between items, and finding such associations is called "Association Rule Mining".
It shows which items appear together in a transaction or relation. It is mainly used by retailers, grocery stores, and online marketplaces that have large transactional databases.
We wouldn’t want to calculate all associations between every possible combination of products. Instead, we would want to select only potentially “relevant” rules from the set of all possible rules. Therefore, we use the measures support, confidence and lift to reduce the number of relationships we need to analyze.
Support says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears.
Confidence says how likely item Y is purchased when item X is purchased. It is measured by the proportion of transactions containing item X in which item Y also appears (Support / Support(LHS)).
Lift says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is. (Confidence/Consequent (RHS))
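The three measures can be written directly from these definitions. This sketch assumes each transaction is a Python `set` of item names; the baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Support(LHS and RHS together) / Support(LHS)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence(LHS => RHS) / Support(RHS)."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical baskets
baskets = [
    {"milk", "sugar"},
    {"milk", "sugar", "bread"},
    {"milk", "bread"},
    {"bread"},
]
print(support(baskets, {"milk", "sugar"}))       # 0.5
print(confidence(baskets, {"sugar"}, {"milk"}))  # 1.0
print(lift(baskets, {"sugar"}, {"milk"}))        # about 1.33
```

A lift above 1 (here about 1.33) means sugar buyers pick up milk more often than milk's overall popularity would predict. `confidence` raises `ZeroDivisionError` if the LHS never occurs, so real code would filter such rules first.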
License: GPL-2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
Market Basket Analysis is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify relationships between the items that people buy.
Association Rules are widely used to analyze retail basket or transaction data and are intended to identify strong rules discovered in transaction data using measures of interestingness, based on the concept of strong rules.
The dataset has 38765 rows of the purchase orders of people from the grocery stores. These orders can be analysed and association rules can be generated using Market Basket Analysis by algorithms like Apriori Algorithm.
Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to determine association rules which highlight general trends in the database: this has applications in domains such as market basket analysis.
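Below is a compact, unoptimised Python sketch of the level-wise search just described. It follows the textbook join-and-prune formulation of Apriori rather than any particular library's implementation. It rescans all transactions at every level, so it is meant only for illustration on small data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # One pass over the data: count each candidate, keep those meeting min_support.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent individual items.
    current = keep_frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(current)
    k = 1
    while current:
        prev = set(current)
        # Join: merge two frequent k-itemsets whose union has exactly k + 1 items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune (Apriori property): every k-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k))
        }
        current = keep_frequent(candidates)
        result.update(current)
        k += 1
    return result


baskets = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

With a minimum support of 0.5, {bread, butter} (support 0.25) is dropped. The three-item candidate is then pruned without being counted, because one of its two-item subsets is infrequent. This pruning is what lets Apriori avoid enumerating every combination.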
Assume there are 100 customers: 10 of them bought milk, 8 bought butter, and 6 bought both. For the rule {milk} => {butter}:
- support = P(milk & butter) = 6/100 = 0.06
- confidence = support / P(milk) = 0.06/0.10 = 0.60
- lift = confidence / P(butter) = 0.60/0.08 = 7.5
Note: this example is extremely small. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Support: This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears.
Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. This is measured by the proportion of transactions with item X, in which item Y also appears.
Lift: This says how likely item Y is purchased when item X is purchased while controlling for how popular item Y is.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Omics-wide association analysis is a very important tool for medicine and human health studies. However, modern omics data sets often exhibit high dimensionality, responses and features with unknown distributions, and unknown, complex association relationships between the response and its explanatory features. Reliable association analysis results depend on accurate modeling of such data sets. Most existing association analysis methods rely on specific model assumptions and lack effective false discovery rate (FDR) control. To address these limitations, the paper first applies a single index model to omics data. The model is robust in that it allows the relationship between the response variable and a linear combination of covariates to be connected by any unknown monotonic link function, and both the random error and the covariates can follow any unknown distribution. Based on this model, the paper then combines a rank-based approach and a symmetrized data aggregation approach to develop a novel, robust feature selection method that achieves fine-mapping of risk features while controlling the false positive rate of selection. The theoretical results support the proposed method, and the analysis of simulated data shows that the new method performs effectively and robustly in all scenarios. The new method is also used to analyze two real datasets and identifies some risk features not reported in existing findings.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Market_Basket_Optimisation dataset is a classic transactional dataset often used in association rule mining and market basket analysis.
It consists of multiple transactions where each transaction represents the collection of items purchased together by a customer in a single shopping trip.
Market_Basket_Optimisation.csv Example transaction rows (simplified):
| Item 1 | Item 2 | Item 3 | Item 4 | ... |
|---|---|---|---|---|
| Bread | Butter | Jam | | |
| Mineral water | Chocolate | Eggs | Milk | |
| Spaghetti | Tomato sauce | Parmesan | | |
Here, empty cells mean no item was purchased in that slot.
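Here is a small Python sketch for turning rows like these into transactions. It assumes the file has no header row, one transaction per row, and blank cells for unused slots, as in the simplified example. Check the actual file before relying on this.

```python
import csv
import io


def read_baskets(f):
    """Read CSV rows into a list of item sets, dropping empty cells and empty rows."""
    baskets = []
    for row in csv.reader(f):
        items = {cell.strip() for cell in row if cell.strip()}
        if items:
            baskets.append(items)
    return baskets


# In-memory stand-in for Market_Basket_Optimisation.csv, using the rows shown above.
sample = io.StringIO(
    "Bread,Butter,Jam,,\n"
    "Mineral water,Chocolate,Eggs,Milk,\n"
    "Spaghetti,Tomato sauce,Parmesan,,\n"
)
for basket in read_baskets(sample):
    print(sorted(basket))
```

With the real file, open it with `open("Market_Basket_Optimisation.csv", newline="")` and pass the file object to `read_baskets`.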
This dataset is frequently used in data mining, analytics, and recommendation systems. Common applications include:
Association Rule Mining (Apriori, FP-Growth):
e.g. {Bread, Butter} ⇒ {Jam} with high support and confidence.
Product Affinity Analysis:
Recommendation Engines:
Marketing Campaigns:
Inventory Management:
No Customer Identifiers:
No Timestamps:
No Quantities or Prices:
Sparse & Noisy:
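The step from frequent itemsets to rules such as {Bread, Butter} ⇒ {Jam} can be sketched in Python as follows. The sketch assumes the frequent itemsets are already available as a dictionary mapping `frozenset` itemsets to their support, with every subset of a frequent itemset also present (which Apriori guarantees). The support values below are hypothetical and were not computed from Market_Basket_Optimisation.

```python
from itertools import combinations


def generate_rules(freq, min_confidence):
    """Derive (lhs, rhs, support, confidence, lift) rules from {frozenset: support}."""
    rules = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the LHS.
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = sup / freq[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(rhs), sup, conf, conf / freq[rhs]))
    return rules


# Hypothetical supports, for illustration only.
freq = {
    frozenset({"Bread"}): 0.5,
    frozenset({"Butter"}): 0.4,
    frozenset({"Jam"}): 0.3,
    frozenset({"Bread", "Butter"}): 0.3,
    frozenset({"Bread", "Jam"}): 0.25,
    frozenset({"Butter", "Jam"}): 0.25,
    frozenset({"Bread", "Butter", "Jam"}): 0.2,
}
for lhs, rhs, sup, conf, lift in generate_rules(freq, 0.6):
    print(sorted(lhs), "=>", sorted(rhs), f"sup={sup:.2f} conf={conf:.2f} lift={lift:.2f}")
```

With these numbers, {Bread, Butter} ⇒ {Jam} passes the 0.6 confidence threshold (0.2 / 0.3 ≈ 0.67), while {Bread} ⇒ {Butter, Jam} does not (0.2 / 0.5 = 0.4), even though both rules come from the same itemset.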
The datasets in this zip file are in support of Intelligent Transportation Systems Joint Program Office (ITS JPO) report FHWA-JPO-16-385, "Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs — Evaluation Report for ATDM Program," https://rosap.ntl.bts.gov/view/dot/32520 and FHWA-JPO-16-373, "Analysis, modeling, and simulation (AMS) testbed development and evaluation to support dynamic mobility applications (DMA) and active transportation and demand management (ATDM) programs : Dallas testbed analysis plan," https://rosap.ntl.bts.gov/view/dot/32106. The files in this zip file are specifically related to the Dallas Testbed. The compressed zip files total 2.2 GB in size. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were converted to .pdf document files, which are an open, archival format. These PDFs were then added to the zip file alongside the original .docx files. These files can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files which can be read using any PDF reader; .csv text files which can be read using any text editor; .txt text files which can be read using any text editor; .docx document files which can be read in Microsoft Word and some other word processing programs; .xlsx spreadsheet files which can be read in Microsoft Excel and some other spreadsheet programs; .dat data files which may be text or multimedia; as well as GIS or mapping files in the following formats: .mxd, .dbf, .prj, .sbn, .shp, .shp.xml, which may be opened in ArcGIS or other GIS software. [software requirements] These files were last accessed in 2017.
Introduction: Hispanic/Latino populations are underrepresented in Alzheimer Disease (AD) genetic studies. Puerto Ricans (PR), a three-way admixed (European, African, and Amerindian) population, are the second-largest Hispanic group in the continental US. We aimed to conduct a genome-wide association study (GWAS) and comprehensive analyses to identify novel AD susceptibility loci and characterize known AD genetic risk loci in the PR population.
Materials and methods: Our study included Whole Genome Sequencing (WGS) and phenotype data from 648 PR individuals (345 AD, 303 cognitively unimpaired). We used a generalized linear mixed model adjusting for sex, age, population substructure, and genetic relationship matrix. To infer local ancestry, we merged the dataset with the HGDP/1000G reference panel. Subsequently, we conducted univariate admixture mapping (AM) analysis.
Results: We identified suggestive signals within the SLC38A1 and SCN8A genes on chromosome 12q13. This region overlaps with an area of AD linkage (12q13) reported in previous studies of independent data sets, further supporting the finding. Univariate African AM analysis identified one suggestive ancestral block (p = 7.2×10⁻⁶) located in the same region. The ancestry-aware approach showed that this region has both European and African ancestral backgrounds and that both contribute to the risk in this region. We also replicated 11 different known AD loci (including APOE) identified in mostly European studies, which is likely due to the high European background of the PR population.
Conclusion: PR GWAS and AM analysis identified a suggestive AD risk locus on chromosome 12, which includes the SLC38A1 and SCN8A genes. Our findings demonstrate the importance of designing GWAS and ancestry-aware approaches, and of including underrepresented populations in genetic studies of AD.
Introduction: Machine-assisted topic analysis (MATA) uses artificial intelligence methods to help qualitative researchers analyze large datasets. This is useful for researchers who need to rapidly update healthcare interventions during changing healthcare contexts, such as a pandemic. We examined the potential to support healthcare interventions by comparing MATA with “human-only” thematic analysis techniques on the same dataset (1,472 user responses from a COVID-19 behavioral intervention).
Methods: In MATA, an unsupervised topic-modeling approach identified latent topics in the text, from which researchers identified broad themes. In human-only codebook analysis, researchers developed an initial codebook based on previous research that was applied to the dataset by the team, who met regularly to discuss and refine the codes. Formal triangulation using a “convergence coding matrix” compared findings between methods, categorizing them as “agreement”, “complementary”, “dissonant”, or “silent”.
Results: Human analysis took much longer than MATA (147.5 vs. 40 h). Both methods identified key themes about what users found helpful and unhelpful. Formal triangulation showed that both sets of findings were highly similar. All MATA codes were classified as in agreement or complementary to the human themes. When findings differed slightly, this was due to human researcher interpretations or nuance from human-only analysis.
Discussion: Results produced by MATA were similar to human-only thematic analysis, with substantial time savings. For simple analyses that do not require an in-depth or subtle understanding of the data, MATA is a useful tool that can support qualitative researchers to interpret and analyze large datasets quickly. This approach can support intervention development and implementation, such as enabling rapid optimization during public health emergencies.
The datasets in this zip file are in support of FHWA-JPO-16-379, Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs - calibration Report for Phoenix Testbed : Final Report. The compressed zip file totals 1.1 GB in size. The zip file has been uploaded as-is; no further documentation was supplied by NTL, except as noted: All located .docx files were converted to .pdf document files, which are an archival format. These PDFs were then added to the zip file alongside the original .docx files. The initial zip file presented to NTL contained uncompressed datasets and duplicative zip files of the files. To make the overall size of this zip file more manageable, duplicative files were deleted. The zip file can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files which can be read using any PDF reader; .csv text files which can be read using any text editor; .docx document files which can be read in Microsoft Word and some other word processing programs; .txt text files which can be opened with any text editor; .xlsx spreadsheet files which can be read in Microsoft Excel and some other spreadsheet programs; .cfg computer configuration files; .db database files, which can be opened with many database programs; .rif raster image files, which may have been created by the Corel Painter image editing application, a proprietary software program, although other image programs may open the files [software requirements]. These files were last accessed in 2017.
This dataset provides the annual results, by school year, from the student surveys. The survey questions assess satisfaction with the overall service among individuals who receive assistance from CARE 7 Youth Support Specialists. Students who receive services from Youth Support Specialists are given the opportunity to complete a survey regarding their satisfaction with the services provided. A student can complete a survey every time they meet with a Youth Support Specialist. The survey is voluntary.
Data Dictionary
Additional Information
Source: Department generated survey
Contact: Maria Gonzalez
Contact Email: Maria_Gonzalez@tempe.gov
Data Source Type: Excel spreadsheet
Preparation Method: Responses of "Very Satisfied" and "Satisfied" from two school districts are combined and summarized.
Publish Frequency: Annual
Publish Method: Manual
The UCDP (Uppsala Conflict Data Program) contains a large amount of data on organised violence, armed violence, and peacemaking. Information is available from 1946 up to today, and the datasets are updated continuously. The data can be downloaded for free and is available in several different versions.
The UCDP External Support Data contains information on external support in intrastate conflicts, 1975–2010. It provides information on the type of support, the external actor, and the specific year. The data is divided into two separate datasets which are analogous, i.e. they contain identical data structured in different ways to simplify various types of research, such as different types of statistical analyses:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The preprocessed HNSCC dataset, which contains 2,000 gene expression values, the logarithm of survival time, and a censoring indicator, is also available.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides comprehensive information on road intersection crashes recognised as "high-low" outliers within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-low" outlier road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is meticulously organised according to support metric values, ranging from 0,05 to 0,0278, with entries presented in descending order.
Data Specifics
Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 675 KB
Number of Files: The dataset contains a total of 10212 association rules
Date Created: 23rd May 2024
Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-low" outlier fishnet grid cells from a cluster and outlier analysis, all the road intersection crashes that occurred within the "high-low" outlier fishnet grid cells were extracted to be processed by association analysis. The association analysis of these crashes was processed using Python software and involved the use of a 0,05 support metric value. Consequently, commonly occurring crash attributes among at least 5% of the "high-low" outlier road intersection crashes were extracted for inclusion in this dataset.
Geospatial Information
Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection
Temporal Information
Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2021 (2020 data omitted)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variantscape dataset
LLM-based extraction of genetic variants and biomedical entities from titles and abstracts of biomedical publications. These datasets support the analysis of literature-derived co-associations between genetic variants, cancer types, and treatments, enabling downstream network analysis, hypothesis generation, and discovery in precision oncology.
1. Dataset: Cleaned literature dataset for biomedical entity extraction (2014–2024)
"cleaned_OpenAlex.csv "
A pre-processed, cleaned, and structured dataset of cancer-related biomedical publications (2014–2024) retrieved from OpenAlex, containing titles, abstracts, and metadata curated for downstream NLP and LLM-based biomedical entity extraction.
2. Dataset: Binary entity matrix for co-association and network analysis
"dataset_for_analysis.csv"
Final binary matrix dataset derived from NLP- and LLM-based entity extraction on cancer-related literature. Entities include genetic variants, cancer types, and treatments, enabling co-occurrence and network analysis, and the investigation of literature-derived co-associations.
3. Dataset: LLM-based classification of variant-treatment co-associations
"variant_treatment_relationship_consensus.csv"
Dataset capturing LLM-based classification and consensus on co-associations between genetic variants and treatments.
4. Dataset: Metadata mapping for entity extraction and analysis
"metadata_mapping_transposed.csv "
Transposed, row-indexed metadata mapping file used for identification of each column as a variant, cancer type, treatment, study design element, or publication-derived metadata.
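Co-occurrence analysis on a binary entity matrix such as dataset_for_analysis.csv amounts to counting, for each pair of entity columns, the rows (publications) in which both are 1. Below is a minimal Python sketch. The column names and rows are hypothetical placeholders, not the file's actual columns.

```python
from itertools import combinations


def co_occurrence(matrix, columns):
    """Return {(col_a, col_b): count} for column pairs that are both 1 in at least one row."""
    counts = {}
    for a, b in combinations(range(len(columns)), 2):
        n = sum(1 for row in matrix if row[a] and row[b])
        if n:
            counts[(columns[a], columns[b])] = n
    return counts


columns = ["variant_A", "cancer_B", "treatment_C"]  # hypothetical entity columns
matrix = [  # hypothetical rows: one per publication
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
print(co_occurrence(matrix, columns))
# prints: {('variant_A', 'cancer_B'): 2, ('variant_A', 'treatment_C'): 1, ('cancer_B', 'treatment_C'): 2}
```

The resulting counts can serve as weighted edges of a co-association network between variants, cancer types, and treatments.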
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
In recent years there has been an increased interest in Artificial Intelligence for IT Operations (AIOps). This field utilizes monitoring data from IT systems, big data platforms, and machine learning to automate various operations and maintenance (O&M) tasks for distributed systems.
So far, the major contributions have taken the form of novel algorithms.
Typically, researchers have explored one specific type of observability data source, such as application logs, metrics, or distributed traces, to create new algorithms.
Nonetheless, due to the low signal-to-noise ratio of monitoring data, there is a consensus that only the analysis of multi-source monitoring data will enable the development of useful algorithms that have better performance.
Unfortunately, existing datasets usually contain only a single source of data, often logs or metrics. This limits the possibilities for greater advances in AIOps research.
Thus, we generated high-quality multi-source data composed of distributed traces, application logs, and metrics from a complex distributed system. This paper provides detailed descriptions of the experiment, statistics of the data, and identifies how such data can be analyzed to support O&M tasks such as anomaly detection, root cause analysis, and remediation.
General Information:
This repository contains simple scripts for data statistics and a link to the multi-source distributed system dataset.
You may find details of this dataset from the original paper:
Sasho Nedelkoski, Jasmin Bogatinovski, Ajay Kumar Mandapati, Soeren Becker, Jorge Cardoso, Odej Kao, "Multi-Source Distributed System Data for AI-powered Analytics".
If you use the data, implementation, or any details of the paper, please cite!
BIBTEX:
@inproceedings{nedelkoski2020multi,
title={Multi-source Distributed System Data for AI-Powered Analytics},
author={Nedelkoski, Sasho and Bogatinovski, Jasmin and Mandapati, Ajay Kumar and Becker, Soeren and Cardoso, Jorge and Kao, Odej},
booktitle={European Conference on Service-Oriented and Cloud Computing},
pages={161--176},
year={2020},
organization={Springer}
}
The multi-source/multimodal dataset is composed of distributed traces, application logs, and metrics produced by running a complex distributed system (OpenStack). In addition, we also provide the workload and fault scripts together with the Rally report, which can serve as ground truth. We provide two datasets, which differ in how the workload is executed. The sequential_data is generated by executing a workload of sequential user requests. The concurrent_data is generated by executing a workload of concurrent user requests.
The raw logs in both datasets contain the same files. Users who want the logs filtered by time for each of the two datasets should refer to the timestamps in the metrics, which provide the time window. In addition, we suggest using the provided aggregated, time-ranged logs for both datasets in CSV format.
Important: The logs and the metrics are synchronized with respect to time, and both are recorded in CEST (Central European Summer Time, UTC+2). The traces are in UTC (Coordinated Universal Time), two hours behind the logs and metrics. They should be synchronized if the user develops multimodal methods. Please read the IMPORTANT_experiment_start_end.txt file before working with the data.
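Before traces are combined with logs or metrics, all timestamps need a common time zone. Below is a minimal Python sketch that converts a CEST log or metric timestamp to UTC, assuming a fixed UTC+2 offset as stated in the note. The timestamp format and the helper name are hypothetical, so adapt the `strptime` pattern to the actual files.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+2 offset, matching the note that logs and metrics are in CEST.
CEST = timezone(timedelta(hours=2), "CEST")


def cest_to_utc(ts: str) -> datetime:
    """Parse a naive CEST timestamp (hypothetical format) and return it as UTC."""
    naive = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=CEST).astimezone(timezone.utc)


print(cest_to_utc("2019-07-01 16:12:13"))  # hypothetical log timestamp
# prints: 2019-07-01 14:12:13+00:00
```

A fixed offset is used because the note specifies a constant two-hour difference. If the recordings crossed a daylight-saving change, a named zone from `zoneinfo` would be the safer choice.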
Our GitHub repository with the code for the workloads and scripts for basic analysis can be found at: https://github.com/SashoNedelkoski/multi-source-observability-dataset/
Background: Levels of physical activity (PA) decrease when transitioning from adolescence into young adulthood. Evidence suggests that social support and intrapersonal factors (self-efficacy, outcome expectations, PA enjoyment) are associated with PA. The aim of the present study was to explore whether cross-sectional and longitudinal associations of social support from family and friends with leisure-time PA (LTPA) among young women living in disadvantaged areas were mediated by intrapersonal factors (PA enjoyment, outcome expectations, self-efficacy).
Methods: Survey data were collected from 18–30 year-old women living in disadvantaged suburbs of Victoria, Australia as part of the READI study in 2007–2008 (T0, N = 1197), with follow-up data collected in 2010–2011 (T1, N = 357) and 2012–2013 (T2, N = 271). A series of single-mediator models were tested using baseline (T0) and longitudinal data from all three time points with residual change scores for changes between measurements.
Results: Cross-sectional analyses showed that social support was associated with LTPA both directly and indirectly, mediated by intrapersonal factors. Each intrapersonal factor explained between 5.9–37.5% of the associations. None of the intrapersonal factors were significant mediators in the longitudinal analyses.
Conclusions: Results from the cross-sectional analyses suggest that the associations of social support from family and from friends with LTPA are mediated by intrapersonal factors (PA enjoyment, outcome expectations and self-efficacy). However, longitudinal analyses did not confirm these findings.
Genetic factors contribute to the variation of bone mineral density (BMD), which is a major risk factor for osteoporosis. The aim of this study was to identify more “novel” genes for BMD. Based on publicly available SNP-based P values, we performed an initial gene-based analysis in a total of 32,961 individuals. Furthermore, we performed differential expression, pathway and protein-protein interaction analyses to find supplementary evidence supporting the significance of the identified genes. About 21,695 genes for femoral neck (FN)-BMD and 21,683 genes for lumbar spine (LS)-BMD were analyzed using gene-based association analysis. A total of 35 FN-BMD associated genes and 53 LS-BMD associated genes were identified (P < 2.3×10⁻⁶) after Bonferroni correction. Among them, 64 genes have not been reported in previous SNP-based genome-wide association studies. Differential expression analysis further supported the significant associations of 14 genes with FN-BMD and 19 genes with LS-BMD. In particular, WNT3 and WNT9B in the Wnt signaling pathway for FN-BMD were further supported by pathway analysis and protein-protein interaction analysis. The present study took advantage of the gene-based association method to perform a supplementary analysis of the GWAS dataset and found some BMD-associated genes. Taken together, the evidence supported the importance of Wnt signaling pathway genes in determining osteoporosis. Our findings provide more insight into the genetic basis of osteoporosis.
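The abstract does not say how the significance threshold was derived. One consistent reading, assuming a family-wise α of 0.05 split evenly across the genes tested for FN-BMD, is a Bonferroni division:

$$
\alpha_{\text{gene}} = \frac{0.05}{21{,}695} \approx 2.30 \times 10^{-6}
$$

Using the 21,683 LS-BMD genes instead gives about 2.31 × 10⁻⁶, which also rounds to the quoted 2.3 × 10⁻⁶.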
Although genome-wide association studies (GWAS) have produced a number of discoveries for obesity, linking GWAS results to biology has not been successful. We sought to discover causal genes for obesity by conducting functional studies on genes detected by genetic association analysis. Gene-based association analysis of 917 individual exome sequences showed that HOGA1 attains exome-wide significance (p-value < 2.7 × 10⁻⁶) for body mass index (BMI). The mRNA expression of HOGA1 is significantly increased in human adipose tissues from obese individuals in the Genotype-Tissue Expression (GTEx) dataset, which supports the genetic association of HOGA1 with BMI. Functional analyses employing cell- and animal model-based approaches were performed to gain insights into the functional relevance of Hoga1 in obesity. Adipogenesis was retarded when Hoga1 was knocked down by siRNA treatment in a mouse 3T3-L1 cell line, and a similar inhibitory effect was confirmed in mice with down-regulated Hoga1. Hoga1 antisense oligonucleotide (ASO) treatment reduced body weight, blood lipid level, blood glucose, and adipocyte size in high-fat diet-fed mice. In addition, several lipogenic genes, including Srebf1, Scd1, Lp1, and Acaca, were down-regulated, while the lipolytic genes Cpt1l, Ppara, and Ucp1 were up-regulated. Taken together, HOGA1 is a potential causal gene for obesity, as it plays a role in excess body fat development.
Recent studies have suggested that high levels of social support can encourage better health behaviours and result in improved cardiovascular health. In this study we evaluated the association between social support and ideal cardiovascular health among urban Jamaicans. We conducted a cross-sectional study among urban residents in Jamaica’s south-east health region. Socio-demographic data and information on cigarette smoking, physical activity, dietary practices, blood pressure, body size, cholesterol, and glucose were collected by trained personnel. The outcome variable, ideal cardiovascular health, was defined as having optimal levels of ≥5 of these characteristics (ICH-5) according to the American Heart Association definitions. Social support exposure variables included number of friends (network size), number of friends willing to provide loans (instrumental support), and number of friends providing advice (informational support). Principal component analysis was used to create a social support score from these three variables. Survey-weighted logistic regression models were used to evaluate the association between ICH-5 and social support score. Analyses included 841 participants (279 males, 562 females) with a mean age of 47.6 ± 18.42 years. ICH-5 prevalence was 26.6% (95%CI 22.3, 31.0) with no significant sex difference (male 27.5%, female 25.7%). In sex-specific, multivariable logistic regression models, social support score was inversely associated with ICH-5 among males (OR 0.67 [95%CI 0.51, 0.89], p = 0.006) but directly associated among females (OR 1.26 [95%CI 1.04, 1.53], p = 0.020) after adjusting for age and community SES. Living in poorer communities was also significantly associated with higher odds of ICH-5 among males, while living in communities with high property values was associated with higher odds of ICH among females.
In this study, higher level of social support was associated with better cardiovascular health among women, but poorer cardiovascular health among men in urban Jamaica. Further research should explore these associations and identify appropriate interventions to promote cardiovascular health.
The Michigan Public Policy Survey (MPPS) is a program of state-wide surveys of local government leaders in Michigan. The MPPS is designed to fill an important information gap in the policymaking process. While there are ongoing surveys of the business community and of the citizens of Michigan, before the MPPS there were no ongoing surveys of local government officials that were representative of all general purpose local governments in the state. Therefore, while we knew the policy priorities and views of the state's businesses and citizens, we knew very little about the views of the local officials who are so important to the economies and community life throughout Michigan. The MPPS was launched in 2009 by the Center for Local, State, and Urban Policy (CLOSUP) at the University of Michigan and is conducted in partnership with the Michigan Association of Counties, Michigan Municipal League, and Michigan Townships Association. The associations provide CLOSUP with contact information for the survey's respondents, and consult on survey topics. CLOSUP makes all decisions on survey design, data analysis, and reporting, and receives no funding support from the associations. The surveys investigate local officials' opinions and perspectives on a variety of important public policy issues and solicit factual information about their localities relevant to policymaking. Over time, the program has covered issues such as fiscal, budgetary and operational policy, fiscal health, public sector compensation, workforce development, local-state governmental relations, intergovernmental collaboration, economic development strategies and initiatives such as placemaking and economic gardening, the role of local government in environmental sustainability, energy topics such as hydraulic fracturing ("fracking") and wind power, trust in government, views on state policymaker performance, opinions on the impacts of the Federal Stimulus Program (ARRA), and more.
The program will investigate many other issues relevant to local and state policy in the future. A searchable database of every question the MPPS has asked is available on CLOSUP's website. Results of MPPS surveys are currently available as reports, and via online data tables. The MPPS datasets are being released in two forms: public-use datasets and restricted-use datasets. Unlike the public-use datasets, the restricted-use datasets represent full MPPS survey waves, and include all of the survey questions from a wave. Restricted-use datasets also allow for multiple waves to be linked together for longitudinal analysis. The MPPS staff do still modify these restricted-use datasets to remove jurisdiction and respondent identifiers and to recode other variables in order to protect confidentiality. However, it is theoretically possible that a researcher might be able, in some rare cases, to use enough variables from a full dataset to identify a unique jurisdiction, so access to these datasets is restricted and approved on a case-by-case basis. CLOSUP encourages researchers interested in the MPPS to review the codebooks included in this data collection to see the full list of variables including those not found in the public-use datasets, and to explore the MPPS data using the public-use datasets. On 2016-08-20, the openICPSR web site was moved to new software. In the migration process, some projects were not published in the new system because the decisions made in the old site did not map easily to the new setup. This project is temporarily available as restricted data while ICPSR verifies that all files were migrated correctly.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that took place over a period of time. The retailer will use the results to grow its business and to offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most useful when you want to find associations between different objects in a set. They work well for finding frequent patterns in a transaction database: they can tell you which items customers frequently buy together, which allows the retailer to identify relationships between items.
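To make "frequent patterns" concrete, here is a minimal from-scratch sketch of the Apriori idea: find frequent single items, join them into larger candidate itemsets, prune any candidate that has an infrequent subset, and keep those that meet a minimum support. The walkthrough itself works in R; this Python version, the function name `frequent_itemsets`, the `min_support` threshold and the basket contents are all hypothetical and only illustrate the technique.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search: return {frozenset: support} for itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Keep only candidates that meet the support threshold.
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent


# Hypothetical baskets, one set of items per invoice.
baskets = [
    {"mouse", "mat"},
    {"mouse", "mat", "keyboard"},
    {"mouse"},
    {"keyboard"},
]
result = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
# ['keyboard'] 0.5
# ['mat'] 0.5
# ['mouse'] 0.75
# ['mat', 'mouse'] 0.5
```

Each `support()` call rescans every basket, which keeps the sketch short but would be slow on a dataset with thousands of invoices; dedicated association-rule libraries use more efficient counting.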
Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both. For the rule bought Computer Mouse => bought Mat for Mouse:
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
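The arithmetic of the mouse/mat example can be checked with a few lines of Python. This is a sketch; the function `rule_metrics` is a hypothetical helper, and it uses the standard definitions for a rule A => B: support = P(A and B), confidence = P(A and B)/P(A), lift = confidence/P(B). Lift is symmetric, so it comes out to about 8.89 whichever direction the rule is read.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent, from raw counts."""
    support = n_both / n_total                         # P(A and B)
    confidence = support / (n_antecedent / n_total)    # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_consequent / n_total)       # P(B | A) / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse (A), 9 bought a mat (B), 8 bought both.
s, c, l = rule_metrics(n_total=100, n_antecedent=10, n_consequent=9, n_both=8)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# support=0.08 confidence=0.80 lift=8.89
```

A lift well above 1 means the two items are bought together far more often than they would be if purchases were independent.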
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries; I briefly describe each of them.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we import Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
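A minimal Python sketch of this cleaning step, using plain dictionaries so it runs without extra packages. The column names (`BillNo`, `Itemname`, `CustomerID`) and the rows are hypothetical and are not taken from Assignment-1_Data.xlsx. In pandas the equivalent is `df.dropna()`, and in R `na.omit()` drops incomplete rows from a data frame.

```python
import math


def is_missing(value):
    """Treat None, blank strings and float NaN (how spreadsheet blanks often load) as missing."""
    if value is None:
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def drop_missing(rows):
    """Return only the rows in which no field is missing."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


# Hypothetical rows; column names are assumptions for illustration only.
rows = [
    {"BillNo": "1001", "Itemname": "mouse", "CustomerID": 1},
    {"BillNo": "1001", "Itemname": "mat", "CustomerID": 1},
    {"BillNo": "1002", "Itemname": None, "CustomerID": 2},
    {"BillNo": "1003", "Itemname": "keyboard", "CustomerID": float("nan")},
]
cleaned = drop_missing(rows)
print(len(cleaned))  # 2
```

The explicit NaN check matters because `float("nan")` is not equal to anything, so a simple `value in (None, "")` test would let it through.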
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
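One way to picture this conversion is to group line items by invoice so that each invoice becomes a single basket. The Python sketch below does that with `collections.defaultdict`; the function name `to_transactions`, the column names `BillNo` and `Itemname`, and the rows are hypothetical, and this is not the R code shown in the screenshots.

```python
from collections import defaultdict


def to_transactions(rows, invoice_col="BillNo", item_col="Itemname"):
    """Group line items by invoice so each invoice becomes one basket (a set of items)."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[invoice_col]].add(row[item_col])
    return dict(baskets)


# Hypothetical line items: one row per item per invoice.
rows = [
    {"BillNo": "1001", "Itemname": "mouse"},
    {"BillNo": "1001", "Itemname": "mat"},
    {"BillNo": "1002", "Itemname": "mouse"},
    {"BillNo": "1002", "Itemname": "mouse"},   # repeated line collapses into one item
    {"BillNo": "1003", "Itemname": "keyboard"},
]
transactions = to_transactions(rows)
for bill, items in sorted(transactions.items()):
    print(bill, sorted(items))
# 1001 ['mat', 'mouse']
# 1002 ['mouse']
# 1003 ['keyboard']
```

Using a set per invoice means an item that appears on several lines of the same invoice is counted once, which matches how support is computed: a basket either contains an item or it does not.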