Market basket analysis with Apriori algorithm
A retailer wants to target customers with suggestions for itemsets they are most likely to purchase. I was given a retailer's dataset in which the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow the business: by suggesting itemsets to customers, we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem with Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most often used when you want to discover associations between different objects in a set, i.e. to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
- Rule: bought computer mouse => bought mouse mat
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(computer mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(mouse mat) = 0.8/0.09 ≈ 8.9

This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
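The arithmetic can be double-checked in a few lines of Python (a sketch of the definitions above; the walkthrough itself is done in R):

```python
# Plain-Python check of the worked example above.
n_customers = 100
n_mouse = 10   # customers who bought a computer mouse
n_mat = 9      # customers who bought a mouse mat
n_both = 8     # customers who bought both

support = n_both / n_customers                  # P(mouse & mat) = 0.08
confidence = support / (n_mouse / n_customers)  # 0.08 / 0.10 = 0.8
lift = confidence / (n_mat / n_customers)       # 0.8 / 0.09 ≈ 8.9

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.1f}")
```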
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries; I briefly describe each one below.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Accuracy and AUC values of ML algorithms using three hyperparameter tuning techniques.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
The dataset is specially curated for Association Rule Learning using **Apriori and Eclat** in Python to predict shopping behavior.
Apriori is one of the most powerful algorithms for understanding associations among products. Take the example of a supermarket where most people who buy eggs also buy milk and baking soda. Probably the reason is that they want to bake a cake for New Year's Eve.
So we can see there is an association between eggs, milk, and baking soda. Knowing this association, we can simply put all three items together on the shelf, which will very likely increase our sales.
Let’s perform Apriori with the help of an example.
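A minimal sketch of how this could look in Python with the mlxtend library (the toy baskets below are illustrative, not drawn from the dataset):

```python
# Apriori on a handful of toy baskets using mlxtend.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["eggs", "milk", "baking soda"],
    ["eggs", "milk"],
    ["milk", "bread"],
    ["eggs", "baking soda"],
    ["eggs", "milk", "baking soda", "bread"],
]

# One-hot encode the baskets into a boolean item matrix.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

# Itemsets appearing in at least 40% of baskets, then rules above 60% confidence.
frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```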
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset was created by Ranchantan
Released under CC0: Public Domain
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Socio-demographic characteristics among adolescent girls in Ethiopia, 2016 EDHS.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This project contains the dataset used to generate the results of the study "Neural embedding of beliefs reveals the role of relative dissonance in human decision-making" (arXiv:2408.07237).

Authors: Byunghwee Lee, Rachith Aiyappa, Yong-Yeol Ahn, Haewoon Kwak, Jisun An (Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, USA, 47408).

DDO_dataset.zip (original Debate.org dataset): This archive contains the original raw Debate.org dataset, which was obtained from the publicly accessible website (https://esdurmus.github.io/ddo.html) maintained by Esin Durmus [1,2]. All credit for this dataset belongs entirely to the original authors, Esin Durmus and Claire Cardie. We do not claim any authorship of or modifications to this dataset. It is provided here solely for reproducibility and reference in our study. The dataset includes the following three files:

debates.json: a JSON file containing a Python dictionary that maps a debate name (a unique name for each debate) to debate information
users.json: a JSON file containing a Python dictionary of user information
readme.md: the readme file from the authors (Esin Durmus and Claire Cardie)

When using this dataset, please reference Debate.org and cite the following works:

[1] Esin Durmus and Claire Cardie. 2019. A Corpus for Modeling User and Language Effects in Argumentation on Online Debating. In Proceedings of the 57th Conference of the Association for Computational Linguistics. Florence, Italy. Association for Computational Linguistics.
[2] Esin Durmus and Claire Cardie. 2018. Exploring the Role of Prior Beliefs for Argument Persuasion. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).

df_ddo_including_only_truebeliefs_nodup(N192307).p: This file contains a pre-processed dataset used in our project (arXiv:2408.07237). The dataset includes records of user participation in debates (both as debaters and voters) as well as voting records across various debates. The belief triplet dataset used for fine-tuning a Sentence-BERT model was generated from this pre-processed dataset. Detailed explanations of the pre-processing procedure are provided in the Methods section of the paper. When using this pre-processed dataset, please cite the following reference (in addition to the two papers above):

[3] Lee, B., Aiyappa, R., Ahn, Y. Y., Kwak, H., & An, J. (2024). Neural embedding of beliefs reveals the role of relative dissonance in human decision-making. arXiv preprint arXiv:2408.07237.

model_full_data.zip: This zip file contains five fine-tuned S-BERT models trained on a 5-fold belief triplet dataset. After unzipping the files, users can load the models with the 'sentence_transformers' Python library (https://sbert.net/).
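For reference, a minimal sketch of loading one of the unzipped models with sentence_transformers; the directory name below is a placeholder, and the belief statements are hypothetical:

```python
# Load a fine-tuned S-BERT model from a local directory and embed beliefs.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("model_fold_0")  # placeholder path to an unzipped model

embeddings = model.encode(["School uniforms should be mandatory.",
                           "Students should choose what to wear."])
print(embeddings.shape)  # (2, embedding_dimension)
```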
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This dataset provides comprehensive information on road intersection crashes recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 5% of the total "high-high" cluster road intersection crashes for the years 2017, 2018, 2019, and 2021. The dataset is organised by support metric value, ranging from 0.05 to 0.0235, with entries presented in descending order.

Data Specifics
Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 499 KB
Number of Files: The dataset contains a total of 7186 association rules
Date Created: 23rd May 2024

Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all road intersection crashes that occurred within the "high-high" cluster fishnet grid cells were extracted for association analysis. The association analysis was run in Python with a 0.05 support metric value. Consequently, crash attributes common to at least 5% of the "high-high" cluster road intersection crashes were extracted for inclusion in this dataset.

Geospatial Information
Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information
Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2021 (2020 data omitted)
License: Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning
This repository makes available the source code and public dataset for the work, "DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning", published with open access by Scientific Reports: https://www.nature.com/articles/s41598-018-38343-3. The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora. In our work, the dataset was classified to an average accuracy of 95.7% with the ResNet50 deep convolutional neural network.
The source code, images and annotations are licensed under CC BY 4.0 license. The contents of this repository are released under an Apache 2 license.
Download the dataset images and our trained models
images.zip (468 MB)
models.zip (477 MB)
Due to the size of the images and models, they are hosted outside of the GitHub repository. The images and models must be downloaded into directories named "images" and "models", respectively, at the root of the repository. If you execute the Python script (deepweeds.py) as instructed below, this step will be performed for you automatically.
TensorFlow Datasets
Alternatively, you can access the DeepWeeds dataset with TensorFlow Datasets, TensorFlow's official collection of ready-to-use datasets. DeepWeeds was officially added to the TensorFlow Datasets catalog in August 2019.
Weeds and locations
The selected weed species are local to pastoral grasslands across the state of Queensland. They include: "Chinee apple", "Snake weed", "Lantana", "Prickly acacia", "Siam weed", "Parthenium", "Rubber vine" and "Parkinsonia". The images were collected from weed infestations at the following sites across Queensland: "Black River", "Charters Towers", "Cluden", "Douglas", "Hervey Range", "Kelso", "McKinlay" and "Paluma". The table and figure below break down the dataset by weed, location and geographical distribution.
Data organization
Images are assigned unique filenames that include the date/time the image was photographed and an ID number for the instrument which produced the image. The format is like so: YYYYMMDD-HHMMSS-ID, where the ID is simply an integer from 0 to 3. The unique filenames are strings of 17 characters, such as 20170320-093423-1.
labels
The labels.csv file assigns species labels to each image. It is a comma separated text file in the format:
Filename,Label,Species
...
20170207-154924-0.jpg,7,Snake weed
20170610-123859-1.jpg,1,Lantana
20180119-105722-1.jpg,8,Negative
...
Note: The specific label subsets of training (60%), validation (20%) and testing (20%) for the five-fold cross validation used in the paper are also provided here as CSV files in the same format as "labels.csv".
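A minimal sketch (not part of the repository) of reading labels.csv with pandas and splitting each filename into its capture time and instrument ID, following the naming convention described above:

```python
# Parse labels.csv and decode the YYYYMMDD-HHMMSS-ID filename convention.
import pandas as pd

labels = pd.read_csv("labels.csv")  # columns: Filename, Label, Species

stems = labels["Filename"].str.removesuffix(".jpg")  # e.g. "20170320-093423-1"
labels["Timestamp"] = pd.to_datetime(stems.str[:15], format="%Y%m%d-%H%M%S")
labels["InstrumentID"] = stems.str[16:].astype(int)   # trailing 0-3 camera ID

print(labels["Species"].value_counts())  # image count per species label
```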
models
We provide the most successful ResNet50 and InceptionV3 models saved in Keras' hdf5 model format. The ResNet50 model, which provided the best results, has also been converted to UFF format in order to construct a TensorRT inference engine.
resnet.hdf5
inception.hdf5
resnet.uff
deepweeds.py
This Python script trains and evaluates Keras' base implementations of ResNet50 and InceptionV3 on the DeepWeeds dataset, pre-trained with ImageNet weights. The performance of the networks is cross-validated over 5 folds. The final classification accuracy is taken to be the average across the five folds. Similarly, the final confusion matrix from the associated paper aggregates across the five independent folds. The script also provides the ability to measure inference speeds within the TensorFlow environment.
The script can be executed to carry out these computations using the following commands.
To train and evaluate the ResNet50 model with five-fold cross validation, use `python3 deepweeds.py cross_validate --model resnet`.
To train and evaluate the InceptionV3 model with five-fold cross validation, use `python3 deepweeds.py cross_validate --model inception`.
To measure inference times for the ResNet50 model, use `python3 deepweeds.py inference --model models/resnet.hdf5`.
To measure inference times for the InceptionV3 model, use `python3 deepweeds.py inference --model models/inception.hdf5`.
Dependencies
The required Python packages to execute deepweeds.py are listed in requirements.txt.
tensorrt
This folder includes C++ source code for creating and executing a ResNet50 TensorRT inference engine on an NVIDIA Jetson TX2 platform. To build and run on your Jetson TX2, execute the following commands:
cd tensorrt/src
make -j4
cd ../bin
./resnet_inference
Citations
If you use the DeepWeeds dataset in your work, please cite it as:
IEEE style citation: “A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney, B. Calvert, M. Rahimi Azghadi, and R. D. White, “DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning,” Scientific Reports, vol. 9, no. 2058, 2 2019. [Online]. Available: https://doi.org/10.1038/s41598-018-38343-3 ”
BibTeX
@article{DeepWeeds2019,
  author  = {Alex Olsen and Dmitry A. Konovalov and Bronson Philippa and Peter Ridd and Jake C. Wood and Jamie Johns and Wesley Banks and Benjamin Girgenti and Owen Kenny and James Whinney and Brendan Calvert and Mostafa {Rahimi Azghadi} and Ronald D. White},
  title   = {{DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning}},
  journal = {Scientific Reports},
  year    = 2019,
  number  = 2058,
  month   = 2,
  volume  = 9,
  issue   = 1,
  day     = 14,
  url     = "https://doi.org/10.1038/s41598-018-38343-3",
  doi     = "10.1038/s41598-018-38343-3"
}
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
This dataset provides comprehensive information on road intersection crashes involving motorcycles (motor tricycle, motorcycle under 125cc, motorcycle above 125cc, quadru-cycle) that resulted in injuries and were recognised as "high-high" clusters within the City of Cape Town. It includes detailed records of all intersection crashes and their corresponding crash attribute combinations, which were prevalent in at least 33% of the total "high-high" cluster motorcycle road intersection crashes resulting in injuries for the years 2017, 2018 and 2019. The dataset is organised by confidence metric value, with entries presented in descending order.

Data Specifics
Data Type: Geospatial-temporal categorical data
File Format: Excel document (.xlsx)
Size: 29.8 KB
Number of Files: The dataset contains a total of 576 association rules
Date Created: 23rd May 2024

Methodology
Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
Software: ArcGIS Pro, Python
Processing Steps: Following the spatio-temporal analyses and the derivation of "high-high" cluster fishnet grid cells from a cluster and outlier analysis, all road intersection crashes involving a motorcycle and resulting in injuries that occurred within the "high-high" cluster fishnet grid cells were extracted for association analysis. The association analysis was run in Python with a 0.30 support metric value. Consequently, crash attributes common to at least 33% of the "high-high" cluster motorcycle road intersection crashes resulting in injuries were extracted for inclusion in this dataset.

Geospatial Information
Spatial Coverage:
West Bounding Coordinate: 18°20'E
East Bounding Coordinate: 19°05'E
North Bounding Coordinate: 33°25'S
South Bounding Coordinate: 34°25'S
Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

Temporal Information
Temporal Coverage:
Start Date: 01/01/2017
End Date: 31/12/2019
This dataset includes data that is provided in the Udemy course "Data Analysis with Pandas and Python" by Boris Paskhaver.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains 100 fictional café records generated using the Python Faker library (a generation sketch follows the column list below). Each record includes the café's name, city, average customer rating, price range, and specialty.
Purpose: Designed for learning data analysis, visualization, and basic machine learning.
Source: Synthetic (no real-world data used).
Update Frequency: Static (one-time release).
License: CC0: Public Domain.
Columns:
Cafe_Name: Fictional café name
City: Random city name
Rating: Customer rating between 3.0–5.0
Price_for_Two: Average price for two customers
Specialty: Type of coffee specialty
Opening_Hours: Typical operating hours
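A minimal sketch of how such records might be generated with Faker; the value ranges and choices below are assumptions, not the dataset's actual generation settings:

```python
# Generate 100 fictional café records with Faker plus the random module.
import random
from faker import Faker

fake = Faker()
records = [
    {
        "Cafe_Name": f"{fake.last_name()} Cafe",           # fictional café name
        "City": fake.city(),                               # random city name
        "Rating": round(random.uniform(3.0, 5.0), 1),      # rating between 3.0 and 5.0
        "Price_for_Two": random.randrange(200, 1501, 50),  # assumed price range
        "Specialty": random.choice(["Espresso", "Latte", "Cold Brew", "Mocha"]),
        "Opening_Hours": "08:00-22:00",                    # assumed typical hours
    }
    for _ in range(100)
]
print(records[0])
```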
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This is a sample dataset used for the tutorial notebook linked below.
Follow the Notebook here: https://www.kaggle.com/code/aryashah2k/mistakes-to-avoid-in-data-science-python
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains vehicle data for a Vehicular Ad Hoc Network (VANET) environment, focusing on real-time communication and lightweight cryptographic applications. It includes 500 vehicles with associated parameters such as speed (up to 250 km/h), GPS location, fixed message size (30 KB), frequency of communication, and threat level as the target column.
This was a code-along project done with Datalab's course resources, with instructors Data Evangelist Richie Cotton and Senior Data Science Content Developer Maham Khan.
Code was done in Datacamp's Datalab Workbook
The purpose of this code along project was to:
Learn how to apply a web scraper to create a corpus of freelancer reviews
Learn how to label the reviews as masculine or feminine based on pronouns
Identify language (words and phrases) used more often for freelancers with male pronouns versus female pronouns, and study how this language varies by field
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is designed for research and development in optimizing resource allocation and joint transmission for intelligent classrooms using 6G network slicing technology.
Features:
Classroom_Type: The type of classroom (Virtual, Hybrid, In-Person, Large Lecture)
Number_of_Devices: The number of connected devices in the classroom
Bandwidth (Mbps): Network bandwidth available for transmission
Latency (ms): Delay in data transmission
Throughput (Mbps): The effective rate of successful data delivery
Signal_Strength (dBm): Signal quality in decibels
CPU_Usage (%): Processor usage during operation
Memory_Usage (%): Memory consumption during operation
Performance: Target column indicating high (1) or low (0) performance based on network and classroom parameters
License: MIT (https://opensource.org/licenses/MIT)
This dataset was created by Subho117
Released under MIT
As part of the High School Project within Manchester Metropolitan University, students are encouraged to explore the field of data visualization using Python. This project aims to introduce students to the fundamental concepts of data visualization and provide them with practical experience in using the Python programming language to visualize datasets.
Once you have analysed it, try answering these questions:
Q1: How many customers are in the dataset?
Q2: How many females are in the dataset?
Q3: How many males are in the dataset?
Q4: What is the minimum rating?
Q5: What is the maximum rating?
Q6: What is the average rating?
Q7: Which product line has the highest rating?
Q8: Repeat Q7 for the maximum and minimum rating as well.
Q9: Which product line has the highest sales?
Q10: Repeat Q9 for the maximum and minimum sales as well.
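A minimal pandas sketch for a few of these questions; the file name and the column names (Gender, Rating, Product line, Total) are assumptions based on typical supermarket-sales datasets, not confirmed by the course material:

```python
# Answer a few of the exercise questions with pandas.
import pandas as pd

df = pd.read_csv("supermarket_sales.csv")  # hypothetical file name

print("Total customers:", len(df))                                   # Q1
print(df["Gender"].value_counts())                                   # Q2, Q3
print("Min / Max / Avg rating:",
      df["Rating"].min(), df["Rating"].max(), df["Rating"].mean())   # Q4-Q6
print(df.groupby("Product line")["Rating"].mean().idxmax())          # Q7
print(df.groupby("Product line")["Total"].sum().idxmax())            # Q9
```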
License: Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle, used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
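Based on the layout just described, the containing folder for a given KernelVersions id can be derived with a couple of integer divisions; this is a sketch, and the exact folder naming (e.g. zero padding) is an assumption:

```python
# Map a KernelVersions id to its folder in the two-level directory layout.
def kernel_version_dir(version_id: int) -> str:
    top = version_id // 1_000_000        # folder 123 holds ids 123,000,000-123,999,999
    sub = (version_id // 1_000) % 1_000  # subfolder 456 holds ids 123,456,000-123,456,999
    return f"{top}/{sub}"

print(kernel_version_dir(123_456_789))  # -> "123/456"
```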
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Fuzzy based Smart Manufacturing Dataset contains 1,000 samples of sensor readings and process parameters collected for optimizing industrial process control using a Fuzzy-PID Controller. This dataset is designed for research and development in smart manufacturing, adaptive control, and Industry 4.0 applications.
Features:
Temperature_C (°C): Temperature readings from industrial sensors
Pressure_Bar (Bar): Pressure levels in the manufacturing process
Speed_RPM (RPM): Rotational speed of motors or actuators
Error: Deviation between setpoint and measured value
Delta_Error: Rate of change of the error signal
Load_Variation: Load factor indicating dynamic variations (0.8-1.2)
Ambient_Temp_C (°C): External environmental temperature
Energy_Consumption_W (W): Power usage in watts

Use Cases:
Tuning and optimizing Fuzzy-PID controllers
Energy-efficient control system design
Fault detection and predictive maintenance
Industrial automation and adaptive process control

This dataset is particularly useful for machine learning, optimization algorithms, and MATLAB/Python simulations in smart manufacturing environments. 🚀
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
The Indian Premier League, popularly known as the IPL, is one of the most popular leagues in the world. Every year it is watched by billions of cricket fans from around the world. The league features Indian and foreign players, and newcomers get their first taste of what it is like to play against international players. Due to its popularity, it attracts many big-name companies and businessmen to invest in the teams. Team names are associated with regions, and the league is played across India in various stadiums.
The following dataset consists of IPL data from the years 2008-2019; a total of 764 matches have been played. Note that a few teams have either dropped out or changed their name over the years, so it is important to do some fact-checking.

| Columns | Description |
| --- | --- |
| Team1 | Team #1 playing the match |
| Team2 | Team #2 playing the match |
| Date | The day on which the match was played |
| Year | The year the match was played |
| Time | Matches are played in two slots, afternoon and evening; this gives the time the match started |
| Place | The city and the stadium the match was played in |
| Toss | The name of the team that won the toss |
| TossDecision | The decision of the team that won the toss |
| Result | The result of the match |
| Tied | Information on whether the match was tied |
| won_runs | Information about the winning team that batted first |
| won_wickets | Information about the winning team that bowled first |
The dataset was web-scraped from cricbuzz.com using Python. The website contains detailed information on all cricket matches.
Things to explore:
1. Common traits in the data
2. Which team won the most matches?
3. Which team played the most games?
4. How does winning the toss affect the match result?
5. Use the data to predict future IPL matches
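A minimal pandas sketch for items 2-4 above; the file name is an assumption, and the Result column is assumed to hold the winning team's name:

```python
# Quick exploration of the IPL match data with pandas.
import pandas as pd

ipl = pd.read_csv("ipl_matches.csv")  # hypothetical file name

print(ipl["Result"].value_counts().head())    # teams with the most match wins
print(pd.concat([ipl["Team1"], ipl["Team2"]])
        .value_counts().head())               # teams with the most games played
print((ipl["Toss"] == ipl["Result"]).mean())  # share of matches won by the toss winner
```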