This chapter presents theoretical and practical aspects associated with the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current estimate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of estimating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: a prediction step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows the probability of failure at future time instants (the RUL PDF) to be estimated in real time, providing information about time-to-failure (TTF) expectations, statistical confidence intervals, and long-term predictions, using for this purpose empirical knowledge about critical conditions for the system (also referred to as hazard zones). This information is of paramount significance for improving system reliability and the cost-effective operation of critical assets, as shown in a case study where feedback correction strategies (based on uncertainty measures) were implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feedback loop is implemented using simple linear relationships, it provides quick insight into how the system reacts to changes in its input signals in terms of its predicted RUL. The method is able to handle non-Gaussian PDFs, since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault-seeded test showed that the proposed framework was able to anticipate modifications to the system input to lengthen its RUL. Results of this test indicate that the method successfully suggested the correction that the system required. Future work will focus on the development and testing of similar strategies using different input-output uncertainty metrics.
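To make the prediction/update cycle concrete, the following is a minimal bootstrap particle filter sketch in Python. The fault-growth model, noise levels, and hazard threshold are hypothetical placeholders; this is a generic illustration of the technique, not the chapter's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000                                   # number of particles
particles = rng.normal(1.0, 0.05, N)       # initial fault-indicator estimate (hypothetical units)
weights = np.full(N, 1.0 / N)

def state_model(x):
    """Hypothetical fault-growth model: 2% drift per step plus process noise."""
    return x + 0.02 * x + rng.normal(0.0, 0.01, x.shape)

def likelihood(z, x, sigma=0.05):
    """Gaussian measurement likelihood p(z | x)."""
    return np.exp(-0.5 * ((z - x) / sigma) ** 2)

def systematic_resample(particles, weights):
    n = len(weights)
    cum = np.cumsum(weights)
    cum[-1] = 1.0                          # guard against floating-point round-off
    idx = np.searchsorted(cum, (rng.random() + np.arange(n)) / n)
    return particles[idx], np.full(n, 1.0 / n)

def step(particles, weights, z):
    particles = state_model(particles)             # prediction step: propagate the process model
    weights = weights * likelihood(z, particles)   # update step: weight by the new measurement
    weights /= weights.sum()
    return systematic_resample(particles, weights)

def rul_distribution(particles, weights, hazard=2.0, horizon=200):
    """Propagate particles forward without measurements and record the first time each
    one enters the hazard zone; the weighted result approximates the RUL PDF."""
    rul = np.full(len(particles), float(horizon))
    x = particles.copy()
    for k in range(1, horizon + 1):
        x = state_model(x)
        rul = np.where((x >= hazard) & (rul == horizon), float(k), rul)
    return rul, weights

# One measurement cycle followed by a long-term RUL prediction.
particles, weights = step(particles, weights, z=1.08)
rul, w = rul_distribution(particles, weights)
print("expected TTF (steps):", round(float(np.sum(rul * w)), 1))
```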
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Preprocessed envelope EEG features based on a spatial-filter approach. The features were computed across multiple within-trial SVIPT events over a large hyperparameter space on data from an exemplary subject.
The file "components.bsv" contains the preprocessed envelope features of all investigated configurations and provides the underlying parameters as well as a relative path under the key "record_dir" to additional component information. Specifically, for each configuration the spatial filter, the spatial activity pattern, and the time-resolved within-trial envelope signal are provided under "records/".
A database of de-identified supermarket customer transactions. This large simulated dataset was created based on a real data sample.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Abstract: The aim of this paper is the acquisition of geographic data from the Foursquare application, using data mining to perform exploratory and spatial analyses of the distribution of tourist attractions and their density in the city of Rio de Janeiro. In accordance with the Extraction, Transformation, and Load (ETL) methodology, three research algorithms were developed using a hierarchical tree structure to collect information from the Foursquare database for the categories Museums, Monuments and Landmarks, Historic Sites, Scenic Lookouts, and Trails. A quantitative analysis of check-ins per neighborhood of Rio de Janeiro was performed, and kernel density (hot spot) maps were generated. The results presented in this paper show the need for the data filtering process: less than 50% of the mined data were used, and a large part of the density of the Museums, Historic Sites, and Monuments and Landmarks categories lies in the center of the city, while the Scenic Lookouts and Trails categories predominate in the south zone. This kind of analysis was shown to be a tool to support the city's tourism management with respect to the spatial localization of these categories, the tourists' evaluations of the places, and the frequency of the target public.
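As a hedged illustration of the kernel density (hot spot) step described above, the sketch below estimates a 2-D density from a handful of made-up check-in coordinates; the points, grid, and default bandwidth are placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical check-in coordinates (longitude, latitude) for one category.
coords = np.array([
    [-43.1729, -22.9068],   # city-centre cluster (illustrative values)
    [-43.1750, -22.9050],
    [-43.1800, -22.9100],
    [-43.2096, -22.9519],   # south-zone cluster (illustrative values)
    [-43.2105, -22.9530],
]).T

# Kernel density estimate over the check-in locations.
kde = gaussian_kde(coords)

# Evaluate the density on a regular grid to build a hot-spot map.
lon = np.linspace(coords[0].min(), coords[0].max(), 100)
lat = np.linspace(coords[1].min(), coords[1].max(), 100)
grid = np.vstack([g.ravel() for g in np.meshgrid(lon, lat)])
density = kde(grid).reshape(100, 100)

print("peak density cell:", np.unravel_index(density.argmax(), density.shape))
```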
This thesis lays the groundwork for enabling scalable data mining in massively parallel dataflow systems, using large datasets. Such datasets have become ubiquitous. We illustrate common fallacies with respect to scalable data mining: it is in no way sufficient to naively implement textbook algorithms on parallel systems; bottlenecks on all layers of the stack prevent the scalability of such naive implementations. We argue that scalability in data mining is a multi-leveled problem and must therefore be approached through the interplay of algorithms, systems, and applications. We therefore discuss a selection of scalability problems on these different levels. We investigate algorithm-specific scalability aspects of collaborative filtering algorithms for computing recommendations, a popular data mining use case with many industry deployments. We show how to efficiently execute the two most common approaches, namely neighborhood methods and latent factor models, on MapReduce, and describe a specialized architecture for scaling collaborative filtering to extremely large datasets which we implemented at Twitter. We then turn to system-specific scalability aspects, where we improve system performance during the distributed execution of a special class of iterative algorithms by drastically reducing the overhead required for guaranteeing fault tolerance. To this end, we propose a novel optimistic approach to fault tolerance which exploits the robust convergence properties of a large class of fixpoint algorithms and does not incur measurable overhead in failure-free cases. Finally, we present work on an application-specific scalability aspect of scalable data mining. A common problem when deploying machine learning applications in real-world scenarios is that the prediction quality of ML models heavily depends on hyperparameters that have to be chosen in advance. We propose an algorithmic framework for an important subproblem occurring during hyperparameter search at scale: efficiently generating samples from block-partitioned matrices in a shared-nothing environment. For every selected problem, we show how to execute the resulting computation automatically in a parallel and scalable manner, and evaluate our proposed solution on large datasets with billions of datapoints.
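As a generic, hedged illustration of the latent factor approach to collaborative filtering mentioned above (not the thesis's MapReduce implementation), the sketch below runs a few alternating-least-squares iterations on a tiny made-up rating matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative user-item rating matrix (0 = unobserved).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)
observed = R > 0

k, lam = 2, 0.1                            # latent dimensionality and regularization
U = rng.normal(scale=0.1, size=(R.shape[0], k))
V = rng.normal(scale=0.1, size=(R.shape[1], k))

def solve(fixed, ratings, mask):
    """Regularized least-squares update of one factor matrix with the other held fixed."""
    out = np.zeros((ratings.shape[0], k))
    for i in range(ratings.shape[0]):
        idx = mask[i]                      # entries observed for this user/item
        A = fixed[idx].T @ fixed[idx] + lam * np.eye(k)
        b = fixed[idx].T @ ratings[i, idx]
        out[i] = np.linalg.solve(A, b)
    return out

for _ in range(15):                        # alternating least squares iterations
    U = solve(V, R, observed)
    V = solve(U, R.T, observed.T)

print(np.round(U @ V.T, 1))                # reconstructed ratings, including predictions for the zeros
```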
This file contains the life cycle inventories (LCIs) developed for an associated journal article. Potential users of the data are referred to the journal article for a full description of the modeling methodology. LCIs were developed for cumene and sodium hydroxide manufacturing using data mining with metadata-based data preprocessing. The inventory data were collected from US EPA's 2012 Chemical Data Reporting database, 2011 National Emissions Inventory, 2011 Toxics Release Inventory, 2011 Electronic Greenhouse Gas Reporting Tool, 2011 Discharge Monitoring Report, and the 2011 Biennial Report generated from the RCRAinfo hazardous waste tracking system. The U.S. average cumene gate-to-gate inventories are provided without (baseline) and with process allocation applied using metadata-based filtering. In 2011, there were 8 facilities reporting public production volumes of cumene in the U.S., totaling 2,609,309,687 kilograms of cumene produced that year. The U.S. average sodium hydroxide gate-to-gate inventories are also provided without (baseline) and with process allocation applied using metadata-based filtering. In 2011, there were 24 facilities reporting public production volumes of sodium hydroxide in the U.S., totaling 3,878,021,614 kilograms of sodium hydroxide produced that year. Process allocation was only conducted for the top 12 facilities producing sodium hydroxide, which represent 97% of the public production of sodium hydroxide. The data have not been compiled in the formal Federal Commons LCI Template, to avoid users interpreting the template to mean the data have been fully reviewed according to LCA standards and can be directly applied to all types of assessments and decision needs without additional review by industry and potential stakeholders. This dataset is associated with the following publication: Meyer, D.E., S. Cashman, and A. Gaglione. Improving the reliability of chemical manufacturing life cycle inventory constructed using secondary data. JOURNAL OF INDUSTRIAL ECOLOGY. Berkeley Electronic Press, Berkeley, CA, USA, 25(1): 20-35, (2021).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is the data set associated with the publication: "A collaborative filtering based approach to biomedical knowledge discovery" published in Bioinformatics.
The data are sets of cooccurrences of biomedical terms extracted from published abstracts and full text articles. The cooccurrences are then represented in sparse matrix form. There are three different splits of this data denoted by the prefix number on the files.
All - All cooccurrences combined in a single file
Training/Validation - All cooccurrences in publications before 2010 go into training; all novel cooccurrences in publications in 2010 go into validation
Training+Validation/Test - All cooccurrences in publications up to and including 2010 go into training+validation. All novel cooccurrences after 2010 are provided in year-by-year increments and also all combined together
Furthermore, there are subset files which are used in some experiments to deal with the computational cost of evaluating the full set. The associated cuids.txt file links each row/column of the matrix to a UMLS Metathesaurus CUID; hence the first row of cuids.txt corresponds to the 0th row/column of the matrix. Note that the matrix is square and symmetric. This work was done with UMLS Metathesaurus 2016AB.
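A hedged sketch of how such a cooccurrence matrix and its cuids.txt mapping might be used. The on-disk matrix format is not specified here, so a SciPy .npz file and the filenames below are assumptions for illustration only.

```python
import numpy as np
import scipy.sparse as sp

# Placeholder filenames; the actual matrix format in the dataset may differ.
matrix = sp.load_npz("all_cooccurrences.npz").tocsr()
cuids = [line.strip() for line in open("cuids.txt")]

# Square, symmetric matrix with one CUID per row/column.
assert matrix.shape[0] == matrix.shape[1] == len(cuids)

def cooccurring_terms(cuid, top_n=10):
    """Return the CUIDs with the highest cooccurrence counts for a given CUID."""
    row = matrix[cuids.index(cuid)].toarray().ravel()
    best = np.argsort(row)[::-1][:top_n]
    return [(cuids[i], int(row[i])) for i in best if row[i] > 0]

print(cooccurring_terms(cuids[0]))
```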
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
¹ Time required for selecting 1000 features.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
This dataset consists of tweet identifiers for tweets harvested from November 28, 2016, following the election of Donald Trump, through the end of the first 100 days of his administration. Data collection ended May 1, 2017.
Tweets were harvested using multiple methods described below. The total dataset consists of 218,273,152 tweets. Because of the different methods used to harvest tweets, there may be some duplication.
Methods
Data were harvested from the Twitter API using the following endpoints:
search
timeline
filter
Three tweet sets were harvested using the search endpoint, which returns tweets that include a specific search term, user mention, hashtag, etc. The table below provides the search term, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented.
| Search term | Dates collected | Count tweets | Count unique users |
|:--|:--|:--|:--|
| @realDonaldTrump user mention | 2016-11-28 - 2017-05-01 | 4,597,326 | 1,501,806 |
| "Trump" in tweet text | 2017-01-18 - 2017-05-01 | 11,055,772 | 2,648,849 |
| #MAGA hashtag | 2017-01-23 - 2017-05-01 | 1,169,897 | 236,033 |
Two tweet sets were harvested using the timeline endpoint, which returns tweets published by specific users. The table below provides the user whose timeline was harvested, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented. Note that in these cases, tweets were necessarily limited to the one unique user whose tweets were harvested.
| User | Dates collected | Count tweets | Count unique users |
|:--|:--|:--|:--|
| realDonaldTrump | 2016-12-21 - 2017-05-01 | 902 | 1 |
| trumpRegrets | 2017-01-15 - 2017-05-01 | 1,751 | 1 |
The largest tweet set was harvested using the filter endpoint, which allows for streaming data access in near real time. Requests made to this API can be filtered to include tweets that meet specific criteria. The table below provides the filters used, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented.
Filtering via the API uses a default "OR," so the tweets included in this set satisfied any of the filter terms.
The script used to harvest streaming data from the filter API was built using the Python tweepy library.
| Filter terms | Dates collected | Count tweets | Count unique users |
|:--|:--|:--|:--|
| tweets by realDonaldTrump; tweet mentions @realDonaldTrump; 'maga' in text; 'trump' in text; 'potus' in text | 2017-01-26 - 2017-05-01 | 201,447,504 | 12,489,255 |
Harvested tweets, including all corresponding metadata, were stored in individual JSON files (one file per tweet).
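The harvesting script itself is not part of this dataset. As a rough sketch only, a streaming client in the style of the tweepy 3.x API (the library version current at the time) could look like the code below; the credentials, output directory, and followed user id are placeholders, and this is not the original collection script.

```python
import json
import os
import tweepy  # tweepy 3.x style API (StreamListener), as was current during collection

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")          # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

os.makedirs("tweets", exist_ok=True)

class SaveListener(tweepy.StreamListener):
    """Write each incoming tweet to its own JSON file, mirroring the storage described above."""
    def on_data(self, raw_data):
        tweet = json.loads(raw_data)
        if "id_str" in tweet:                                          # skip delete/limit notices
            with open(os.path.join("tweets", tweet["id_str"] + ".json"), "w") as f:
                f.write(raw_data)
        return True

    def on_error(self, status_code):
        return status_code != 420                                      # stop on rate-limit disconnects

stream = tweepy.Stream(auth=auth, listener=SaveListener())
# The filter endpoint ORs its conditions: tweets matching any track term, or posted by /
# mentioning the followed account, are returned. "USER_ID" is a placeholder numeric account id.
stream.filter(track=["maga", "trump", "potus"], follow=["USER_ID"])
```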
Data Processing: Conversion to CSV format
Per the terms of Twitter's developer API, tweet datasets may be shared for academic research use. Sharing tweet data is limited to sharing the identifiers of tweets, which must be re-harvested to account for deletions and/or modifications of individual tweets. It is not permitted to share the originally harvested tweets in JSON format.
Tweet identifiers have been extracted from the JSON data and saved as plain text CSV files. The CSV files all have a single column:
id_str (string): A tweet identifier
The data include one tweet identifier per row.
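To work with the published identifiers, reading them as strings avoids the precision loss that occurs when 64-bit tweet ids are parsed as floating-point numbers. A minimal sketch, with a placeholder filename:

```python
import pandas as pd

# Read the identifiers as strings; parsing them as numbers can silently corrupt 64-bit ids.
ids = pd.read_csv("tweet_ids.csv", dtype={"id_str": str})
print(len(ids), "tweet ids loaded")

# The ids can then be "rehydrated" into full tweets, e.g. with a tool such as twarc
# or the Twitter API's statuses/lookup endpoint.
ids["id_str"].to_csv("ids_for_rehydration.txt", index=False, header=False)
```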
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
These data correspond to the set of problems used for the evaluation of the proposal "What Are You Gazing At? An Approach to Use Eye-tracking for Robotic Process Automation".
Each problem consists of a set of 10 screenshots with the same look and feel but different data values for those fields that can be entered or modified by the user. Each problem has its associated gaze fixation data. In each of the problems there is a key UI element that primarily attracts the attention of the user.
The evaluation is based on a set of images which resemble realistic screenshots of activities in the administrative domain. More precisely, 5 different sets of screenshots (S) were generated, each of them with a different level of complexity. Complexity is measured in terms of the number of UI elements per screenshot. The sets are:
S1 Mockup-based email view. Represents the activity of viewing an email to check if it contains an attachment. In this case, the key UI element that receives the attention is the attachment inside the email.
S2 Mockup-based CRM user details. Represents a user's detail viewing activity within a Client Relationship Management (CRM) platform. The key UI element is the checkbox that indicates if the user has all his invoices paid.
S3 Real screenshot email view. Analogous to S1 but with real screenshots. It represents the activity of viewing an e-mail to check if it contains an attachment. In this case, the key UI element to which attention is paid is the attachment contained in the e-mail.
S4 Real screenshot CRM user details. Analogous to S2 but with real screenshots. It represents a user's detail viewing activity within a CRM platform. The key UI element is the checkbox indicating whether the user has all their invoices paid.
S5 Real screenshot split-screen view. Represents the split-screen display of two applications: on the left side a PDF viewer showing a COVID vaccination certificate, and on the right side a human resources management system (a basic recreation of a real system, for privacy reasons) displaying the detail view of the employee to whom the certificate on the left corresponds. Because these screenshots show two applications, they have two key UI elements: in the PDF viewer it is the name of the certificate holder, and in the human resources management system it is the name of the employee whose detail view is displayed. The activity being carried out is verifying that the COVID certificate received corresponds to that of an employee.
Two types of filters based on the gaze fixation data are applied to these sets of screenshots: pre-filtering and post-filtering, corresponding to applying the filtering before and after detecting UI components in the screenshots, respectively. The structure of the data packages is divided into two folders, input and output. The input folder is organized as follows:
input/
screenshots/: contains the screenshots. The sets of screenshots are easily identifiable; they are named following the pattern SX_screenshot_DDDD.jpeg, where X indicates which of the sets of screenshots described in the previous list it belongs to, and DDDD represents a unique identifier for each screenshot. Each group consists of 10 screenshots, making 50 in total.
fixation.json: It is a JSON file that contains a key associated with each of the screenshots. For each screenshot, it contains a "fixation_points" key where information about the fixations that have occurred on the screenshot is stored. Here's an example:
"S5_screenshot_0050.jpeg": {
"fixation_points": {
"334.25#497.166666666667": {
"#events": 6,
"start_index": 33224,
"ms_start": 553962.1467,
"ms_end": 554061.9899,
"duration": 99.8432000001194,
"imotions_dispersion": 0.300325967868111,
"last_index": 33229,
"dispersion": 14.044275227531914
},
"1258.80769230769#507.576923076923": {
"#events": 13,
"start_index": 33234,
"ms_start": 554128.5427,
"ms_end": 554345.3595,
...
The output folder is organized in three subfolders, the first one containing the information of the non-filtered screenshots (i.e. without having applied to them any filtering or processing), and the next two with the information resulting from pre-filtering and post-filtering.
output/
non-filter/
borders/: screenshots with highlighted borders of all UI components detected in it.
components_json/: a collection of JSON files with the same name as the screenshot, containing the "img_shape" key with a list of the screen resolution and the number of layers the image has: [1080, 1920, 3], and the "compos" key with a list of all UI components representing the Screen Object Model.
pre-filter/ and post-filter/
borders/: screenshots with the borders of the relevant UI components. In the case of prefiltering, the detection of components is only performed on the parts of the screenshot that have received attention. In postfiltering, the complete screenshot is shown, with only the borders of the relevant UI components highlighted.
components_json/: a collection of JSON files with the same name as the screenshot is included, containing the following keys:
"img_shape": A list representing the screen resolution and the number of layers in the image, e.g., [1080, 1920, 3].
"compos": A list of all UI components representing the Screen Object Model (SOM). During post-filtering, each UI component is augmented with an additional property called "relevant." If this property is set to true, it indicates that the respective UI component has received attention.
(pre)/(post)filter_attention_maps/: represent the attention maps. In the case of prefiltering, any surface of the screen that has not received attention will be shown in black. In the case of postfiltering, the areas of attention will be shown as red circles, and the UI components whose area intersects with the areas of attention by more than 25% will be shown in yellow.
In conclusion, the described data package consists of sets of screenshots, accompanied by prefiltering and postfiltering filters using gaze fixation data, enabling the identification of relevant UI components. The organized data packages include input and output folders, where the output folder offers processed screenshots, UI component information, and attention maps. This resource provides valuable insights into user attention and interaction with UI elements on different types of scenarios.
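As a hedged illustration of how the components_json files described above might be consumed, the sketch below loads each Screen Object Model and counts the UI components flagged as relevant after post-filtering; the root path and exact file layout are assumed from the description, not taken from the package itself.

```python
import json
from pathlib import Path

# Placeholder root path, following the folder structure described above.
root = Path("output/post-filter/components_json")

for json_file in sorted(root.glob("*.json")):
    som = json.loads(json_file.read_text())
    height, width, channels = som["img_shape"]          # e.g. [1080, 1920, 3]
    relevant = [c for c in som["compos"] if c.get("relevant")]
    print(f"{json_file.name}: {len(relevant)} relevant of {len(som['compos'])} components "
          f"({width}x{height} screenshot)")
```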
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions on the itemsets that a customer is most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that have happened over a period of time. The retailer will use the results to grow its business and provide customers with itemset suggestions, so that we are able to increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.
Association rule mining is most useful when you want to find associations between different objects in a set, such as frequent patterns in a transaction database. It can tell you which items customers frequently buy together and allows the retailer to identify relationships between those items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought Computer Mouse => bought Mouse Mat":
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support / P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(Mouse Mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
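The same arithmetic can be written out explicitly as a quick check. This is a plain Python sketch of the toy numbers above, separate from the R workflow used in the rest of this walkthrough:

```python
# Toy numbers from the example above.
n_customers = 100
n_mouse, n_mat, n_both = 10, 9, 8

support = n_both / n_customers                    # P(mouse and mat) = 0.08
confidence = support / (n_mouse / n_customers)    # rule "mouse => mat": 0.08 / 0.10 = 0.80
lift = confidence / (n_mat / n_customers)         # 0.80 / 0.09 ~ 8.9

print(support, round(confidence, 2), round(lift, 1))
```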
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. I briefly describe each library below.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
After that, we will clean our data frame and remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is the Excel spreadsheet dataset containing our analysis of papers performing mining software repositories research from the conferences ICSE, ESEC/FSE, and MSR from the years 2018-2020. The data is broken into columns and can be explained at a high level as follows:
| Column | Content |
|:--|:--|
| 1 | The paper being analyzed |
| 2 | Does the paper state the data they analyzed is available |
| 3 | Does the paper perform some sort of data analysis or sampling using data others have compiled in the past |
| 4 | Does the paper state a timestamp for when they begin their work |
| 5 | Does the paper state the use of systems pre-built to help with MSR work |
| 6 - 18 | Forms of sampling researchers may have employed to select their data |
| 19 | What datasets (if any) were used in the analysis |
| 20 | What tools (if any) were used in the analysis |
| 21 | How they performed their data sampling workflow |
| 22 | How they performed their data filtering workflow |
| 23 | How they performed their data retrieval workflow |
| 24 | Did they create any scripts in each of these workflows |
| 25 - 33 | Did they publish a replication package and what is contained within |
| 34 | Is the paper describing a tool for research or not |
| 35 | Short description of the paper read |
| 36 | A high-level category of the work performed in each paper |
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Comparison between using only the weight function and using both the negative-term filtering scheme and the weight function, for three scenarios.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset accompanies the study of MS2NMF, a structure-sensitive workflow for deep mining of LC–MS/MS data.
Contents:
- raw/: Original LC–MS/MS raw data and converted open formats (.raw, .mzML, .mgf, .xls, .graphml)
- processed/: Intermediate results and matrices generated during MS2NMF processing
- figure_source_data/: Source data files for reproducing Figures 2–4
- GLOBAL_METADATA/: Experimental procedures, plant material, extraction, LC–MS/MS acquisition, computational workflow, and validation metadata
- README.txt, LICENSE.txt, CITATION.txt: Documentation, license, and citation information
The dataset includes raw LC–MS/MS data (Orbitrap), processed feature tables, optimized fragment matrices, and figure-specific source data. Together, these resources allow full reproduction of the MS2NMF workflow, including precursor-level filtering, matrix optimization, NMF decomposition, and integration with database annotations and spectral similarity.
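As a generic, hedged illustration of the NMF decomposition step named above (not the MS2NMF code itself), the sketch below factorizes a small non-negative placeholder matrix standing in for a fragment-intensity table:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Placeholder fragment-intensity matrix: rows = MS/MS spectra, columns = fragment bins.
X = rng.random((20, 50))

# Decompose into a small number of non-negative components (co-occurring fragment patterns).
model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(X)     # spectrum-to-component loadings
H = model.components_          # component-to-fragment profiles

print(W.shape, H.shape, round(model.reconstruction_err_, 3))
```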
For usage, please refer to the included README.txt.
License: CC-BY 4.0.
An optimal alarm system is simply an optimal level-crossing predictor that can be designed to elicit the fewest false alarms for a fixed detection probability. It currently uses Kalman filtering for dynamic systems to provide a layer of predictive capability for the forecasting of adverse events. Predicted Kalman filter future process values and a fixed critical threshold can be used to construct a candidate level-crossing event over a predetermined prediction window. Because the alarm regions for an optimal level-crossing predictor cannot be expressed in closed form, one of our aims has been to investigate approximations for the design of an optimal alarm system. Approximations to this sort of alarm region are required for the most computationally efficient generation of a ROC curve or other similar alarm system design metrics. Algorithms based upon the optimal alarm system concept also require models that appeal to a variety of data mining and machine learning techniques. As such, we have investigated a serial architecture which was used to preprocess a full feature space by using SVR (Support Vector Regression), implicitly reducing it to a univariate signal while retaining salient dynamic characteristics (see AIAA attachment below). This step was required due to current technical constraints, and is performed by using the residual generated by SVR (or potentially any regression algorithm), which has properties that are favorable for use as training data to learn the parameters of a linear dynamical system. Future development will lift these restrictions so as to allow for exposure to a broader class of models, such as a switched multi-input/output linear dynamical system in isolation based upon heterogeneous (both discrete and continuous) data, obviating the need for the use of a preprocessing regression algorithm in serial. However, the use of a preprocessing multi-input/output nonlinear regression algorithm in serial with a multi-input/output linear dynamical system will allow for the characterization of underlying static nonlinearities to be investigated as well. We will also investigate the use of non-parametric methods such as Gaussian process regression and particle filtering in isolation to lift the linear and Gaussian assumptions, which may be invalid for many applications. Future work will also involve improvement of the approximations inherent in the use of the optimal alarm system or optimal level-crossing predictor. We will also perform more rigorous testing and validation of the alarm systems discussed by using standard machine learning techniques and consider more complex, yet practically meaningful, critical level-crossing events. Finally, a more detailed investigation of model fidelity with respect to available data and metrics has been conducted (see attachment below). As such, future work on modeling will involve the investigation of necessary improvements in initialization techniques and data transformations for a more feasible fit to the assumed model structure. Additionally, we will explore the integration of physics-based and data-driven methods in a Bayesian context, by using a more informative prior.
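To make the level-crossing idea concrete, the sketch below uses a scalar linear-Gaussian model: a standard Kalman predict/update, followed by the Gaussian probability that the d-step-ahead prediction exceeds a fixed critical threshold. The model, threshold, and alarm level are hypothetical, and the window-level decision here is approximated by per-step marginal probabilities rather than the joint level-crossing event an optimal predictor would use.

```python
import numpy as np
from scipy.stats import norm

# Scalar linear-Gaussian model: x_{k+1} = a*x_k + w,  z_k = x_k + v  (illustrative parameters).
a, q, r = 0.99, 0.05, 0.2
x_hat, p = 0.0, 1.0               # current filtered mean and variance
threshold = 3.0                   # fixed critical level

def kalman_update(x_hat, p, z):
    """Standard Kalman predict/update for the scalar model above."""
    x_pred, p_pred = a * x_hat, a * a * p + q      # time update
    k_gain = p_pred / (p_pred + r)                 # measurement update
    return x_pred + k_gain * (z - x_pred), (1 - k_gain) * p_pred

def crossing_probability(x_hat, p, d):
    """P(x_{k+d} > threshold) under the Gaussian predictive distribution d steps ahead."""
    mean, var = x_hat, p
    for _ in range(d):
        mean, var = a * mean, a * a * var + q
    return norm.sf(threshold, loc=mean, scale=np.sqrt(var))

# Feed a few measurements, then evaluate the alarm condition over a 10-step window.
for z in [0.2, 0.5, 1.1, 1.8, 2.2]:
    x_hat, p = kalman_update(x_hat, p, z)

probs = [crossing_probability(x_hat, p, d) for d in range(1, 11)]
alarm = max(probs) > 0.2          # raise an alarm if any per-step probability exceeds a design level
print(np.round(probs, 3), alarm)
```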
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. With over 43,000 patient inquiries from real-life situations categorized into 31 distinct types of questions, the dataset offers an invaluable opportunity to research correlations between treatments, chronic diseases, medical protocols and more. Answers provided in this database come not only from doctors but also other healthcare professionals such as nurses and pharmacists, providing a more complete array of responses to help researchers unlock deeper insights within the realm of healthcare. This incredible trove of knowledge is just waiting to be mined - so grab your data mining equipment and get exploring!
In order to make the most out of this dataset, start by having a look at the column names and understanding what information they offer: qtype (the type of medical question), Question (the question in itself), and Answer (the expert response). The qtype column will help you categorize the dataset according to your desired question topics. Once you have filtered down your criteria as much as possible using qtype, it is time to analyze the data. Start by asking yourself questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering those questions.
Once you have obtained new insights about healthcare based on the answers provided in this dynamic dataset, it's time for action! Use all that newfound understanding about patient needs to develop educational materials and implement any suggested changes necessary. If more criteria are needed for querying this dataset, see if MedQuad offers additional columns; extra columns may be added periodically that could further enhance analysis capabilities, so look out for notifications if these happen.
Finally once making an impact with the use case(s) - don't forget proper citation etiquette; give credit where credit is due!
- Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.
- Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.
- Leveraging the dataset to build chatbots and virtual assistants that are able to answer a broad range of questions about healthcare with expert-level accuracy
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:--------------|:------------------------------------------------------|
| qtype | The type of medical question. (String) |
| Question | The medical question posed by the patient. (String) |
| Answer | The expert response to the medical question. (String) |
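A minimal pandas sketch of the kind of filtering suggested above; the file name matches the train.csv described in this entry, while the search term is illustrative:

```python
import pandas as pd

df = pd.read_csv("train.csv")

# Treatment-related questions that mention "pain", analogous to the query example above.
treatment_pain = df[(df["qtype"] == "Treatment")
                    & (df["Question"].str.contains("pain", case=False, na=False))]
print(treatment_pain["Answer"].head())

# Distribution of question types, e.g. to see what patients ask about most.
print(df["qtype"].value_counts().head(10))
```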
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
Privacy notice: https://www.technavio.com/content/privacy-notice
Recommendation Engine Market Size 2024-2028
The recommendation engine market size is forecast to increase by USD 1.66 billion, at a CAGR of 39.91% between 2023 and 2028.
The market is experiencing significant growth, driven by the increasing digitalization of various industries and the rising demand for personalized recommendations. As businesses strive to enhance customer experience and engagement, recommendation engines have become essential tools for delivering tailored product or content suggestions. However, this market is not without challenges. One of the most pressing issues is ensuring accuracy in data prediction. With the vast amounts of data being generated daily, the ability to analyze and make accurate predictions is crucial for the success of recommendation engines. This requires advanced algorithms and machine learning capabilities to effectively understand user behavior and preferences. Companies seeking to capitalize on this market's opportunities must invest in developing sophisticated recommendation engines that can navigate the complexities of data analysis and prediction, while also addressing the challenges related to data accuracy. By doing so, they will be well-positioned to meet the growing demand for personalized recommendations and stay competitive in the digital landscape.
What will be the Size of the Recommendation Engine Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
The market continues to evolve, driven by advancements in big data, machine learning, and artificial intelligence. These technologies enable the development of more sophisticated recommendation systems, which are finding applications across various sectors. Model evaluation and cloud computing play a crucial role in ensuring the accuracy and efficiency of these systems. Feature engineering and data visualization help in extracting insights from complex data sets, while collaborative filtering and search engines facilitate personalized recommendations. Ethical considerations, privacy concerns, and data security are becoming increasingly important in the development of recommendation engines. User behavior analysis and user interface design are essential for optimizing user experience.
Offline recommendations and social media platforms are expanding the reach of recommendation systems, while predictive analytics and performance optimization enhance their effectiveness. Data preprocessing, data mining, and customer segmentation are integral to the data analysis phase of recommendation engine development. Real-time recommendations, natural language processing, and recommendation diversity are key features that differentiate modern recommendation systems from their predecessors. Hybrid recommendations, data enrichment, and deep learning are emerging trends in the market. Recommendation systems are transforming e-commerce platforms by improving product discovery and conversion rate optimization. Model training and algorithm optimization are ongoing processes to ensure recommendation accuracy and relevance.
The market dynamics of recommendation engines are constantly unfolding, reflecting the continuous innovation and evolution in this field.
How is this Recommendation Engine Industry segmented?
The recommendation engine industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022, for the following segments.
End-user: Media and entertainment, Retail, Travel and tourism, Others
Type: Cloud, On-premises
Geography: North America (US), Europe (Germany), APAC (China, India, Japan), Rest of World (ROW)
By End-user Insights
The media and entertainment segment is estimated to witness significant growth during the forecast period. In the digital age, recommendation engines have become an essential component for various industries, particularly in the media and entertainment segment. These engines utilize big data from content management systems and user behavior analysis to deliver accurate and relevant recommendations for articles, news, games, music, movies, and more. Advanced technologies like machine learning, artificial intelligence, and deep learning are integrated to enhance their capabilities. Recommendation engines segregate data based on categories, languages, and ratings, ensuring a personalized user experience. The surge in online platforms for content consumption has fueled the demand for recommendation engines. Social media platforms and e-commerce sites also leverage these engines for product discovery and conversion rate optimization. Privacy concerns and ethical considerations are addressed through data security measures and user profiling. Predictive analytics and performance optimization ensure recommendation relevance.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Abstract: In a scenario of expanding competition between tourist destinations, DMOs face the challenge of positioning those destinations attractively. To this end, these organizations can make use of various marketing communication strategies, including social media, platforms whose effectiveness is measured through engagement. From these channels emerge digital influencers, who in recent years have gained greater academic and marketing prominence. Given this theoretical foundation, this research aimed to measure the degree of engagement in publications with digital influencers on the Instagram accounts of Brazilian DMOs, with a time frame between December 2017 and December 2018. To achieve the results necessary to solve the proposed problem, the data mining technique was used on a sample of 11 Instagram profiles from Brazilian state DMOs, selected after a filtering process. The collected data were treated with a quantitative descriptive approach, based on three main indicators: (1) total publications, (2) likes, and (3) comments. All of these indexes were defined after consulting the literature on engagement. In addition, a paired-samples t-test was performed to verify whether there was a significant difference between the means. In general, the results indicated that posts with digital influencers have better results, given the proposed time frame, especially when compared with the indexes of general posts. However, the inferential statistics indicated that the differences between means were not relevant. Thus, the strategy of endorsement by influencers does not seem to produce relevant effects on user interaction in the profiles of Brazilian DMOs. The innovative character of this research stems from the use of the data mining technique to deliver accurate results as to the effectiveness of a rising social media strategy, providing managers with a solid framework for analysis and fostering the field of discussion.
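The paired comparison described above can be sketched with a paired-samples t-test; the engagement figures below are made-up placeholders, not the study's data:

```python
from scipy.stats import ttest_rel

# Placeholder mean engagement per DMO profile: posts with influencers vs. general posts.
with_influencer = [120, 85, 240, 60, 150, 90, 300, 75, 110, 95, 130]
general_posts   = [100, 80, 210, 65, 140, 85, 280, 70, 105, 90, 120]

t_stat, p_value = ttest_rel(with_influencer, general_posts)
print(round(t_stat, 2), round(p_value, 3))   # p >= 0.05 would indicate no significant difference in means
```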
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Phenotypes play a key role in inferring the complex relationships between genes and human heritable diseases. PhenoMiner is a research project aimed at the capture and encoding of phenotypes in the scientific literature. This should provide insights into the complex processes involved in human diseases as well as enabling semantic interoperability with existing biomedical ontologies such as those that describe human anatomy, genetics and behaviours. The PhenoMiner database contains the results of an FP7 Marie Curie fellowship project on text/data-mining technology: natural language processing, machine learning and conceptual analysis. It builds on insights gained from semantic parsing to extract structured information about phenotypes from whole sentences, in contrast to existing techniques which often apply string matching. The system exploits the wealth of scientific data locked within the scientific literature in databases such as PubMed Central and Europe PMC to extract the semantic vocabulary of phenotypes that scientists use. The system will provide scientists, clinicians and informaticians with the data and tools they need to gain new insights into Mendelian diseases. The database currently contains over 4800 phenotype terms automatically mined from full scientific articles and then associated with Online Mendelian Inheritance in Man (OMIM) disorders. All data is provided without manual filtering. Please contact the author for further information and comments/suggestions. - Nigel Collier (collier@ebi.ac.uk)