Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hotel customer dataset with 31 variables describing a total of 83,590 instances (customers). It covers three full years of customer behavioral data. In addition to personal and behavioral information, the dataset also contains demographic and geographical information. This dataset helps address the lack of real-world business data that can be used for educational and research purposes. The dataset can be used for data mining, machine learning, and other analytical problems in the scope of data science. Given its unit of analysis (the customer), it is especially suitable for building customer segmentation models, including clustering and RFM (Recency, Frequency, and Monetary value) models, but it can also be used for classification and regression problems.
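As an illustration of RFM segmentation on customer-level records, here is a minimal sketch that assigns rank-based R, F and M scores. The field names (days_since_last_stay, bookings, revenue) are hypothetical stand-ins, not the dataset's actual variable names, and rank-based binning is only one common choice (five bins, i.e. quintiles, are typical).

```python
def rank_scores(values, higher_is_better, bins=5):
    """Assign scores 1..bins by rank, where `bins` is the best score.

    Ties are broken by input order, so equal values may land in
    different bins; a production version might handle ties explicitly.
    """
    n = len(values)
    # Sort indices from worst to best customer on this dimension.
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * bins // n
    return scores


def rfm_scores(customers, bins=5):
    """Return {customer_id: (R, F, M)} scores.

    customers: list of dicts with the HYPOTHETICAL keys 'id',
    'days_since_last_stay', 'bookings' and 'revenue'.
    Recent stays (few days) score high on R; many bookings and
    high revenue score high on F and M.
    """
    r = rank_scores([c["days_since_last_stay"] for c in customers], False, bins)
    f = rank_scores([c["bookings"] for c in customers], True, bins)
    m = rank_scores([c["revenue"] for c in customers], True, bins)
    return {c["id"]: (r[k], f[k], m[k]) for k, c in enumerate(customers)}


# Three made-up customers, scored into 3 bins for readability.
customers = [
    {"id": "A", "days_since_last_stay": 11, "bookings": 2, "revenue": 450.0},
    {"id": "B", "days_since_last_stay": 656, "bookings": 1, "revenue": 80.0},
    {"id": "C", "days_since_last_stay": 59, "bookings": 3, "revenue": 820.0},
]
print(rfm_scores(customers, bins=3))
# → {'A': (3, 2, 2), 'B': (1, 1, 1), 'C': (2, 3, 3)}
```

The resulting score triples can then be used directly as segments or fed into a clustering step.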
In Copyright http://rightsstatements.org/vocab/InC/1.0/
We propose an LDA-based behavior-topic model (B-LDA) which jointly models user topic interests and behavioral patterns. We focus our study of the model on online social network settings, such as microblogs like Twitter, where the textual content is relatively short but user interactions on it are rich. Related Publication: Qiu, M., Zhu, F., & Jiang, J. (2013). It is not just what we say, but how we say them: LDA-based behavior-topic model. In 2013 SIAM International Conference on Data Mining (SDM’13): 2-4 May, Austin, Texas (pp. 794-802). Philadelphia: SIAM. http://doi.org/10.1137/1.9781611972832.88
Data and algorithms
Data and algorithms for analysis associated with the manuscript. See 'readme.txt' for further detail.
alldata.zip
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each study reviewed is catalogued here as follows:
· Level of difficulty: Classification Task, Number and List of Classes.
· Approach: Method and Main Features.
· Performance: Score, Metric, Validation Method.
· Realism of dataset: Ground Truth, Person-day, Respondents, Observations, Collection Time, Area, Smartphone App.
· Sensors involved: AGPS, Inertial Navigation Systems (INS), Geographic Information Systems (GIS), Data Fusion.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes replication data for the paper: Sann, R. and Lai, P.-C. (2021), "Do expectations towards Thai hospitality differ? The views of English vs Chinese speaking travelers", International Journal of Culture, Tourism and Hospitality Research, Vol. 15 No. 1, pp. 43-58. https://doi.org/10.1108/IJCTHR-01-2020-0010
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Coordinated movement in social animal groups via social learning facilitates foraging activity. Few studies have examined the behavioral cause-and-effect relationships between group members that mediate this social learning.
Methodology/Principal Findings
We first established a behavioral paradigm for visual food learning using medaka fish and demonstrated that a single fish can learn to associate a visual cue with a food reward. Groups of six medaka fish learn to respond to the visual cue more rapidly than a single fish, indicating that medaka fish undergo social learning. We then established a data-mining method based on Kullback-Leibler divergence (KLD) to search for candidate behaviors that induce alignment, and found that high-speed movement of a focal fish tended to induce alignment of the other members locally and transiently under free-swimming conditions without presentation of a visual cue. The high-speed movement of the informed and trained fish during visual cue presentation appeared to facilitate the alignment of naïve fish in response to some visual cues, thereby mediating social learning. Compared with naïve fish, the informed fish had a higher tendency to induce alignment of other naïve fish under free-swimming conditions without visual cue presentation, suggesting the involvement of individual recognition in social learning.
Conclusions/Significance
Behavioral cause-and-effect studies of the high-speed movement between fish group members will contribute to our understanding of the dynamics of social behaviors. The data-mining method used in the present study is a powerful way to search for candidate factors associated with inter-individual interactions in a dataset of time-series coordinate data of individuals.
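The idea of using KLD to screen candidate behaviors can be illustrated with a minimal sketch. It assumes, hypothetically, that for each candidate behavior one has a histogram of neighbors' heading differences observed shortly after that behavior, and compares it against a baseline histogram; a larger divergence marks a stronger candidate for inducing alignment. The binning, the candidate names and the counts are illustrative assumptions, not the authors' procedure or data.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Discrete Kullback-Leibler divergence D_KL(P || Q) in nats.

    p, q: non-negative counts or weights over the same bins; both are
    normalized to sum to 1. eps guards against log(0) when q has empty bins.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    sp, sq = float(sum(p)), float(sum(q))
    total = 0.0
    for pc, qc in zip(p, q):
        pi, qi = pc / sp, qc / sq
        if pi > 0:  # bins with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def rank_candidates(conditional_hists, baseline_hist):
    """Rank hypothetical candidate behaviors by how far the neighbors'
    alignment histogram after the behavior departs from the baseline."""
    scores = {name: kl_divergence(h, baseline_hist)
              for name, h in conditional_hists.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up counts of neighbor heading differences in 4 bins
# (0-45, 45-90, 90-135, 135-180 degrees); not data from the study.
baseline = [25, 25, 25, 25]
candidates = {
    "high_speed": [70, 20, 5, 5],
    "slow_turn": [28, 24, 24, 24],
}
print([name for name, _ in rank_candidates(candidates, baseline)])
# → ['high_speed', 'slow_turn']
```

Note that KLD is asymmetric; comparing the conditional distribution against the baseline (rather than the reverse) is a design choice of this sketch.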
https://dataverse.ird.fr/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.23708/LV8GEW
These data and scripts accompany the manuscript "Physiological and behavioural resistance of malaria vectors in rural West-Africa: a data mining study to address their fine-scale spatiotemporal heterogeneity, drivers, and predictability" by Paul Taconet, Dieudonne Diloma Soma, Barnabas Zogo, Karine Mouline, Frederic Simard, Alphonsine Amanan Koffi, Roch Kounbobr Dabiré, Cedric Pennetier, and Nicolas Moiroux. The manuscript has been posted as a preprint on bioRxiv (https://doi.org/10.1101/2022.08.20.504631). In this data-mining work, we modeled a set of indicators of physiological resistance to insecticides (prevalence of three target-site mutations) and of biting behaviours (early and late biting, exophagy) of Anopheles mosquitoes in two rural areas of West Africa, located in Burkina Faso and Cote d'Ivoire. To this end, we used mosquito field collections along with heterogeneous, multi-source and multi-scale environmental data. The objectives were i) to assess the small-scale spatial and temporal heterogeneity of the indicators, ii) to better understand their drivers, and iii) to assess their spatio-temporal predictability, at scales consistent with operational action. The explanatory variables covered a wide range of potential environmental determinants of vector resistance to insecticides or feeding behaviour: vector control, human availability and nocturnal behaviour, macro- and micro-climatic conditions, landscape, etc.
Contents
Input datasets and the R script used for the data analyses are provided. Because the models may take a very long time to fit (due to the size of the raw data), they were pre-fit, saved as .rds files ('R Data Serialization' format), and made available in the "models" folder. The R script used to answer one of the reviewers' questions (reviewer n°1, question n°1) is also included.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This corpus provides the simulation data mining community with a collection of 14641 bridge models and simulated behavior.
1. Folder "1-designs"
The text files in this directory should contain all information for the independent variables of any machine learning experiment. For reference, all 14641 IFC models are supplied in subfolders 001 to 147.
2. Folder "2-simulation"
This folder contains samples of the simulation output that may be viewed in Paraview (http://www.paraview.org). The original model contains the "Org" filename fragment, and the maximum and minimum behaviors are indicated with the "Max" and "Min" filename fragments. Displacement, strain, and stress behaviors are all given. Only three of the 14641 models are included, as each file is around 1.4 to 2.2 megabytes. The complete data (approximately 81 gigabytes) can be regenerated and provided on request if necessary (email webis@medien.uni-weimar.de).
3. Folder "3-aggregation"
Maximum displacement, strain, and stress measurements are given in the text files individually, and together in the files with the "vtk" filename fragment. This data should be sufficient for the dependent variables of any machine learning experiment.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset from a retailer; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in the industry and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to find associations among different objects in a set. It works by finding frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
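The frequent-pattern search behind Apriori can be sketched in a few lines of pure Python: count single items, then repeatedly join frequent itemsets into larger candidates, prune any candidate that has an infrequent subset, and keep those meeting a minimum support. This is an illustrative sketch with made-up baskets; the function name and parameters are not taken from the tutorial, which works in R.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    transactions: list of sets of items (one set per invoice).
    min_support: minimum fraction of transactions containing the itemset.
    """
    n = len(transactions)
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(transactions, min_support=0.5)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what makes Apriori efficient: {milk, butter} is infrequent here, so the three-item candidate is discarded without scanning the transactions again.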
Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mouse Mat, and 8 bought both. For the rule bought Computer Mouse => bought Mouse Mat:
- support = P(Mouse & Mat) = 8/100 = 0.08
- confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
- lift = confidence/P(Mouse Mat) = 0.80/0.09 ≈ 8.89
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
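The three measures can be checked in code. This sketch rebuilds the 100 customers from the example's counts (10 bought a mouse, 9 a mat, 8 both) as baskets and computes support, confidence and lift for the rule Computer Mouse => Mouse Mat; filling the remaining 89 baskets with an unrelated "keyboard" item is an assumption made only so every customer has a basket.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent.

    transactions: list of sets of items (one set per customer or invoice).
    antecedent, consequent: sets of items.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n              # P(A and C)
    confidence = n_ac / n_a         # P(C | A) = P(A and C) / P(A)
    lift = confidence / (n_c / n)   # P(C | A) / P(C)
    return support, confidence, lift


# 100 customers: 8 bought both, 2 only a mouse, 1 only a mat;
# the remaining 89 hold an unrelated item (an assumption).
baskets = ([{"mouse", "mat"}] * 8 + [{"mouse"}] * 2 + [{"mat"}] * 1
           + [{"keyboard"}] * 89)
s, c, l = rule_metrics(baskets, {"mouse"}, {"mat"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# → support=0.08 confidence=0.80 lift=8.89
```

Because lift is symmetric in the two sides, the reversed rule Mouse Mat => Computer Mouse has the same lift (about 8.89) but a different confidence (8/9 ≈ 0.89).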
Number of Attributes: 7
Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png
First, we need to load the required libraries. Below, I briefly describe each library.
Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png
Next, we clean the data frame by removing missing values.
Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png
To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
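The cleaning and conversion steps can be illustrated with a minimal Python sketch (the tutorial itself does this in R). It assumes the rows are available as (invoice number, item name) pairs; the actual column names in Assignment-1_Data.xlsx are not given here, and the sample rows are made up. Rows with a missing value are dropped, then items are grouped into one basket per invoice.

```python
from collections import defaultdict


def to_baskets(rows):
    """Group invoice line items into one basket (set of items) per invoice.

    rows: iterable of (invoice_no, item_name) pairs; the real column names
    in the spreadsheet may differ. Rows with a missing invoice number
    or item (None or empty string) are skipped, mirroring the cleaning step.
    """
    baskets = defaultdict(set)
    for invoice, item in rows:
        if invoice in (None, "") or item in (None, ""):
            continue  # drop rows with missing values
        baskets[invoice].add(item)  # repeated items collapse into one
    return dict(baskets)


rows = [
    ("1001", "Computer Mouse"), ("1001", "Mouse Mat"),
    ("1002", "Computer Mouse"), ("1002", None),
    (None, "Keyboard"), ("1003", "Keyboard"),
]
print(to_baskets(rows))
```

Using sets means quantities are ignored, which matches how basket analysis treats an item as simply present or absent in a transaction.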
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).
Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of manual activities is required. Thus, an approach is presented for extracting execution-level information from videos in manual assembly. The goal is to generate a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable, consisting of an assembly workstation equipped with a single RGB camera recording only the worker’s hand movements from above. A neural-network-based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm that generates trajectories reflecting the movement paths of the hands. These trajectories are automatically assigned to work steps using the positions of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system was evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs were loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both the instructions and the ground truth using conformance checking. The results show that process mining delivers insights into the assembly process and the system’s precision. The data set contains the generated and the annotated logs based on the video material gathered during the user study.
In addition, the Petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
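Dynamic time warping lets trajectories that follow the same path at different speeds be recognized as similar. The sketch below computes a DTW distance between 2-D hand trajectories and groups them with single-linkage agglomerative clustering under a distance threshold. The linkage criterion, the threshold and the toy trajectories are assumptions for illustration; the paper's exact clustering configuration is not described here.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 2-D trajectories.

    a, b: lists of (x, y) points, e.g. hand positions per video frame.
    Uses Euclidean point distance and the classic O(len(a)*len(b)) recurrence.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def agglomerative(trajs, threshold):
    """Single-linkage hierarchical clustering with a distance cut-off.

    Repeatedly merges the two closest clusters (minimum pairwise DTW
    distance) until no pair is closer than `threshold`.
    Returns clusters as lists of trajectory indices.
    """
    dist = {(i, j): dtw_distance(trajs[i], trajs[j])
            for i in range(len(trajs)) for j in range(i + 1, len(trajs))}
    clusters = [[i] for i in range(len(trajs))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                link = min(dist[min(i, j), max(i, j)]
                           for i in clusters[x] for j in clusters[y])
                if best is None or link < best[0]:
                    best = (link, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] += clusters.pop(y)  # y > x, so index x is unaffected
    return clusters


# Toy trajectories: t0 and t1 reach toward the same box (t1 pauses midway),
# t2 reaches toward a different box.
t0 = [(0, 0), (1, 1), (2, 2)]
t1 = [(0, 0), (1, 1), (1, 1), (2, 2)]
t2 = [(0, 0), (-1, 1), (-2, 2)]
print(agglomerative([t0, t1, t2], threshold=1.0))
# → [[0, 1], [2]]
```

The pause in t1 costs nothing under DTW (its DTW distance to t0 is 0), whereas a frame-by-frame Euclidean comparison would not even be defined for sequences of different lengths.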
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The codings for the literature-based ethogram and the interview protocol of the paper titled "Understanding the Behavior of Process Mining Analysts: A Catalogue of Exploratory Process Mining Behaviors" can be found in this repository.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 500 tweets related to financial literacy and consumer behavior, designed for tasks such as sentiment analysis, emotion classification, and behavior prediction. The dataset was generated to support research in financial literacy education and consumer behavior modeling, incorporating realistic tweet structures and metadata.
Dataset Features
- tweet_content (string): The text of the tweets, reflecting various financial literacy topics and emotions.
- emotion (categorical): The emotion expressed in the tweet, selected from: Positive, Fear, Anticipation, Disgust, Surprise.
- sentiment_score (float): A numerical score representing the sentiment of the tweet, ranging from -1 (negative sentiment) to 1 (positive sentiment).
- likes (integer): Number of likes the tweet received (simulated).
- retweets (integer): Number of retweets the tweet received (simulated).
- replies (integer): Number of replies the tweet received (simulated).
- topic_tags (categorical): The main financial topic discussed in the tweet, selected from: Savings, Investment, Budgeting, Debt Management, Financial Planning, Credit Scores, Spending Habits.
- financial_behavior (categorical): A classification of the financial behavior implied by the tweet, categorized as: Good behavior, Moderate behavior, Risky behavior.
Potential Use Cases
- Sentiment analysis and emotion classification.
- Behavioral modeling for financial decision-making.
- Testing machine learning algorithms for financial literacy.
- Educational applications for personalized financial learning platforms.
- Simulating tweet analysis in social media mining studies.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains a collection of generated experiment results and event logs. The experiment comprises a job-shop scheduling problem implemented in a discrete-event simulation model. The raw experiment results are provided, from which event log files can be generated by following the steps described in this data paper or in the referenced academic paper. Both a collection of event log files and the raw files are given. The logs include the filtered part of the case study as presented in the paper "An agent-based process mining architecture for emergent behavior analysis" by Rob Bemthuis, Martijn Koot, Martijn Mes, Faiza Bukhsh, Maria-Eugenia Iacob, and Nirvana Meratnia.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Global Wildfire Database for GWIS (2021) is a database focused on individual fire events. It is derived by post-processing the MCD64A1 burned-area product and provides the geometry of each fire's final perimeter, including its initial and final dates, together with the corresponding daily active areas for each fire. This dataset is an update of the data related to GlobFire (https://doi.org/10.6084/m9.figshare.10284101). […]
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global Data Mining Software market size will be USD XX million in 2025. It will expand at a compound annual growth rate (CAGR) of XX% from 2025 to 2031.
North America held the largest market share, accounting for more than XX% of global revenue, with a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031. Europe accounted for over XX% of global revenue, with a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031. Asia Pacific held around XX% of global revenue, with a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031. Latin America held more than XX% of global revenue, with a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031. Middle East and Africa held around XX% of global revenue, with an estimated market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031.
KEY DRIVERS
Increasing Focus on Customer Satisfaction to Drive Data Mining Software Market Growth
In today’s hyper-competitive and digitally connected marketplace, customer satisfaction has emerged as a critical factor for business sustainability and growth. The growing focus on enhancing customer satisfaction is proving to be a significant driver in the expansion of the data mining software market. Organizations are increasingly leveraging data mining tools to sift through vast volumes of customer data (ranging from transactional records and website activity to social media engagement and call center logs) to uncover insights that directly influence customer experience strategies. Data mining software empowers companies to analyze customer behavior patterns, identify dissatisfaction triggers, and predict future preferences. Through techniques such as classification, clustering, and association rule mining, businesses can break down large datasets to understand what customers want, what they are likely to purchase next, and how they feel about the brand. These insights help not only in refining customer service but also in shaping product development, pricing strategies, and promotional campaigns. For instance, Netflix uses data mining to recommend personalized content by analyzing a user's viewing history, ratings, and preferences. This has led to increased user engagement and retention, highlighting how a deep understanding of customer preferences, made possible through data mining, can translate into competitive advantage. Moreover, companies are increasingly using these tools to create highly targeted and customer-specific marketing campaigns. By mining data from e-commerce transactions, browsing behavior, and demographic profiles, brands can tailor their offerings and communications to suit individual customer segments. For instance, Amazon continuously mines customer purchasing and browsing data to deliver personalized product recommendations, tailored promotions, and timely follow-ups.
This not only enhances customer satisfaction but also significantly boosts conversion rates and average order value. According to a report by McKinsey, personalization can deliver five to eight times the ROI on marketing spend and lift sales by 10% or more—a powerful incentive for companies to adopt data mining software as part of their customer experience toolkit. (Source: https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/personalizing-at-scale#/) The utility of data mining tools extends beyond e-commerce and streaming platforms. In the banking and financial services industry, for example, institutions use data mining to analyze customer feedback, call center transcripts, and usage data to detect pain points and improve service delivery. Bank of America, for instance, utilizes data mining and predictive analytics to monitor customer interactions and provide proactive service suggestions or fraud alerts, significantly improving user satisfaction and trust. (Source: https://futuredigitalfinance.wbresearch.com/blog/bank-of-americas-erica-client-interactions-future-ai-in-banking) Similarly, telecom companies like Vodafone use data mining to understand customer churn behavior and implement retention strategies based on insights drawn from service usage patterns and complaint histories. In addition to p...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive view of student performance and learning behavior, integrating academic, demographic, behavioral, and psychological factors.
It was created by merging two publicly available Kaggle datasets, resulting in a unified dataset of 14,003 student records with 16 attributes. All entries are anonymized, with no personally identifiable information.
Attributes (16): StudyHours, Attendance, Extracurricular, AssignmentCompletion, OnlineCourses, Discussions, Resources, Internet, EduTech, Motivation, StressLevel, Gender, Age (18–30 years), LearningStyle, ExamScore, FinalGrade
The dataset can be used for:
... ExamScore, FinalGrade)
The dataset was analyzed in Python using:
... LearningStyle categories & extracting insights for adaptive learning
merged_dataset.csv → 14,003 rows × 16 columns
Includes student demographics, behaviors, engagement, learning styles, and performance indicators.
This dataset is an excellent playground for educational data mining, from clustering and behavioral analytics to predictive modeling and personalized learning applications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the source code, the dataset and the result files used to validate the work entitled "A method to identify defensive assignments in team-based invasion sports using spatiotemporal trajectories", which is in the process of publication in the International Journal of Geographical Information Science. The complete reference of the published paper will be posted when available.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample data (five types of features of one participant)
https://www.ibisworld.com/about/termsofuse/
In the rapidly evolving US fraud detection software industry, developers invest significant capital in staying ahead of increasingly sophisticated cyber threats and fraud tactics. Over the past five years, accelerated digitalization, a surge in real-time payments and the adoption of e-commerce have fueled demand for industry solutions. Emerging trends such as behavioral biometrics, deepfake detection and real-time anomaly scoring have become essential, and developers now deliver cloud-based platforms able to address emerging threats. As businesses, banks, healthcare providers and public sector organizations face rising regulatory scrutiny and compliance demands, industry revenue has grown at a CAGR of 9.1% to an estimated $26.3 billion, including anticipated growth of 5.4% in 2025 alone. The widespread adoption of contactless payment technologies, such as mobile wallets and tap-to-pay cards enabled by Near Field Communication (NFC), has introduced a fresh set of vulnerabilities. Cybercriminals are now leveraging advanced techniques to exploit weaknesses that legacy systems are not designed to detect. These threats have required fraud detection software developers to integrate novel security measures into their offerings. Meanwhile, the rapid growth of e-commerce has been a significant driver of demand for fraud detection software among retail and wholesale companies. As more consumers migrate to online shopping platforms, transaction volumes have soared, exposing retailers and wholesalers to heightened risks. This has provided industry developers with a high-growth market where they often benefit from increased pricing power, which supports profit growth. Moving forward, the industry is set for further transformation as regulatory mandates around AI-enabled fraud prevention, deepfake detection and real-time compliance reporting become widespread. Continuous M&A activity and increased demand from high-growth market segments will strengthen revenue streams. 
Despite ongoing competitive pressures and rapidly shifting threat landscapes, these factors are forecast to support a robust industry revenue CAGR of 5.2% through 2030, reaching an estimated $33.8 billion.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Anonymised datafile, extracted from the YFCC100M archive, containing tags and keywords corresponding to risk-signalling and neutral environmental semantics.