Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set belongs to the paper "Video-to-Model: Unsupervised Trace Extraction from Videos for Process Discovery and Conformance Checking in Manual Assembly", submitted on March 24, 2020, to the 18th International Conference on Business Process Management (BPM).
Abstract: Manual activities are often hidden deep down in discrete manufacturing processes. For the elicitation and optimization of process behavior, complete information about the execution of manual activities is required. Thus, an approach is presented on how execution-level information can be extracted from videos in manual assembly. The goal is the generation of a log that can be used in state-of-the-art process mining tools. The test bed for the system was lightweight and scalable, consisting of an assembly workstation equipped with a single RGB camera recording only the hand movements of the worker from above. A neural-network-based real-time object classifier was trained to detect the worker’s hands. The hand detector delivers the input for an algorithm that generates trajectories reflecting the movement paths of the hands. Those trajectories are automatically assigned to work steps using the position of material boxes on the assembly shelf as reference points and hierarchical clustering of similar behaviors with dynamic time warping. The system has been evaluated in a task-based study with ten participants in a laboratory, but under realistic conditions. The generated logs have been loaded into the process mining toolkit ProM to discover the underlying process model and to detect deviations from both instructions and ground truth, using conformance checking. The results show that process mining delivers insights about the assembly process and the system’s precision.
The data set contains the generated and the annotated logs based on the video material gathered during the user study. In addition, the Petri nets from the process discovery and conformance checking conducted with ProM (http://www.promtools.org) and the reference nets modeled with Yasper (http://www.yasper.org/) are provided.
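The abstract above names dynamic time warping (DTW) as the measure used to compare hand trajectories before clustering. The following is only a minimal sketch of the textbook DTW distance between two 2-D point sequences; the function name `dtw_distance`, the Euclidean point cost, and the tuple representation of trajectories are assumptions made for illustration and are not taken from the paper, whose exact DTW variant and clustering setup are not described here.

```python
from math import hypot, inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two sequences of (x, y) points.

    Hypothetical helper: the point cost is the Euclidean distance, and no
    window constraint or normalization is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two trajectories that follow the same path at different speeds get a distance of zero, which is why DTW suits hand movements that vary in tempo between workers. A pairwise matrix of such distances could then be passed to a hierarchical clustering routine, which is not shown here.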
This dataset package includes two synthetic datasets with challenging features including varying density, local density differences, shared boundaries and irregular shapes.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this research, we have generated student retention alerts. The alerts are classified into two types: preventive and corrective. This classification varies according to the level of maturity of the data systematization process. Therefore, to systematize the data, data mining techniques have been applied. The experimental analytical method has been used, with a population of 13,715 students with 62 sociological, academic, family, personal, economic, psychological, and institutional variables, and factors such as academic follow-up and performance, financial situation, and personal information. In particular, information is collected on each of the problems or a combination of problems that could affect dropout rates. Following the methodology, the information has been generated through an abstract data model to reflect the profile of the dropout student. As an advance on previous research, this proposal will create preventive and corrective alternatives to avoid dropout in higher education. Also, in contrast to previous work, we generated corrective warnings by applying data mining techniques such as neural networks, reaching a precision of 97% and a loss of 0.1052. In conclusion, this study aims to analyze the behavior of students who drop out of university through the evaluation of predictive patterns. The overall objective is to predict the profile of student dropout, considering reasons such as admission to higher education and career changes. Consequently, using a data systematization process promotes the permanence of students in higher education. Once the profile of the dropout has been identified, student retention strategies have been approached according to the time of its appearance and the point of view of the institution.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract. Background: In oral health, little progress has been made on how to improve quality within dental care. Objective: The aim of this study was to identify the demographic factors affecting the satisfaction of users of the public dental service, using a strategic and highly consistent methodology. Method: Data mining was applied to a secondary database comprising 91 features, divided into 9 demographic factors, 17 facets, and 5 domains. Descriptive statistics were extracted for the demographic data and for user satisfaction by facet and domain, and relations were discovered using Decision Trees and Association Rules. Results: The analysis showed a relation between the demographic factor 'professional occupation' and satisfaction in all of the domains. The occupations of general assistant and home assistant with daily wage stood out in the Association Rules as representing the lower level of satisfaction in the facets that were evaluated worst. Also, the factor ‘health unit's name’ showed a relation with most of the investigated domains. The difference between health units was even more evident through the Association Rules. Conclusion: Data mining made it possible to identify complementary relations in users' perception of the quality of public oral health services, constituting a reliable tool to support the management of Brazilian public health and a basis for future plans.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The application of pattern mining algorithms to extract movement patterns from sports big data can improve training specificity by facilitating a more granular evaluation of movement. Since movement patterns can only occur as consecutive, non-consecutive, or non-sequential, this study aimed to identify the best set of movement patterns for player movement profiling in professional rugby league and to quantify the similarity among distinct movement patterns. Three pattern mining algorithms (l-length Closed Contiguous [LCCspm], Longest Common Subsequence [LCS] and AprioriClose) were used to extract patterns profiling the match movements of elite rugby league hookers (n = 22 players) and wingers (n = 28 players) across 319 matches. The Jaccard similarity score was used to quantify the similarity between the algorithms’ movement patterns, and machine learning classification modelling identified which algorithm’s movement patterns best separated playing positions. LCCspm and LCS movement patterns shared a 0.19 Jaccard similarity score. AprioriClose movement patterns shared no significant Jaccard similarity with LCCspm (0.008) and LCS (0.009) patterns. The closed contiguous movement patterns profiled by LCCspm best separated players into playing positions. The Multi-layered Perceptron classification algorithm achieved the highest accuracy of 91.02%, with precision, recall and F1 scores of 0.91 each. Therefore, we recommend the extraction of closed contiguous (consecutive) over non-consecutive and non-sequential movement patterns for separating groups of players.
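The Jaccard similarity score mentioned above is the size of the intersection of two sets divided by the size of their union. As a minimal sketch, assuming each algorithm's output is represented as a set of hashable patterns (the tuples of movement labels below are invented for illustration and do not come from the data set):

```python
def jaccard(patterns_a, patterns_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of patterns.

    Returning 0.0 when both collections are empty is a convention chosen
    for this sketch, not something stated in the study.
    """
    a, b = set(patterns_a), set(patterns_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pattern sets, each pattern being a tuple of movement labels.
algo_one = {("walk", "jog"), ("jog", "sprint")}
algo_two = {("jog", "sprint"), ("sprint", "walk")}
similarity = jaccard(algo_one, algo_two)  # one shared pattern out of three distinct ones
```

A score near 0, such as the values reported for AprioriClose against the other two algorithms, means the two pattern sets share almost no patterns.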
Since protein complexes play important biological roles in cells, many computational methods have been proposed to detect protein complexes from protein-protein interaction (PPI) data. In this paper, we first review four well-known protein-complex detection algorithms (MCODE[2], MCL[21], CPA[1] and DECAFF[14]) and then present a comprehensive evaluation of them on two popular yeast PPI data sets. We also discuss their relative strengths and disadvantages to guide interested researchers. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classifiers’ separation accuracies using sets of extracted movement patterns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
(a) Top 20 APR Patterns Importance Score. (b) Top 20 SMP Patterns Importance Score. (c) Top 20 LCC Patterns Importance Score.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Parameters used in performance evaluation for synthetic data.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Minerals associated with diamond source rocks found in glacial sediments indicate the presence of a kimberlite. To a certain extent, they also allow an evaluation of the potential presence of diamonds. Knowledge of regional ice advance and retreat patterns, combined with geophysical analysis, helps to find the kimberlites, the rock formations in which diamonds can be found. This map shows the location of surficial materials and gives the general direction of ice flow.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets were used in the paper 'Evaluation of Thematic Coherence in Microblogs' (ACL, 2021). The data is structured as follows: each file represents a cluster of tweets and contains the tweet IDs, the journalist annotations for quality evaluation and issue identification, and the metric evaluation scores. Note that a set of 50 clusters, equally split between the COVID-19 and Election domains, is shared between the 3 annotators and thus contains 3 labels. The complete annotation guidelines used for this task are attached. Due to the recent changes in the availability of the Twitter / X academic API, please reach out to iman.bilal@warwick.ac.uk if you consider using the dataset. License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The movement descriptors and threshold assignment values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The six data sets were created for an undergraduate course at the Babes-Bolyai University, Faculty of Mathematics and Computer Science, held for second-year students in the autumn semester. The course is taught in both Romanian and English, with the same content and evaluation rules in both languages.
The six data sets are the following:
- FirstCaseStudy_RO_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the Romanian language
- FirstCaseStudy_RO_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the Romanian language
- SecondCaseStudy_EN_traditional_2019-2020.txt - contains data about the grades from the 2019-2020 academic year (when the traditional face-to-face teaching method was used) for the English language
- SecondCaseStudy_EN_online_2020-2021.txt - contains data about the grades from the 2020-2021 academic year (when online teaching was used) for the English language
- ThirdCaseStudy_Both_traditional_2019-2020.txt - the concatenation of the two data sets for the 2019-2020 academic year (so all instances from FirstCaseStudy_RO_traditional_2019-2020 and SecondCaseStudy_EN_traditional_2019-2020 together)
- ThirdCaseStudy_Both_online_2020-2021.txt - the concatenation of the two data sets for the 2020-2021 academic year (so all instances from FirstCaseStudy_RO_online_2020-2021 and SecondCaseStudy_EN_online_2020-2021 together)
Instances from the data sets for the 2019-2020 academic year contain 12 attributes (in this order):
- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- the grades received by the student for 2 practical exams. If a student did not participate in a practical exam, the grade was 0. Possible values are between 0 and 10.
- the number of seminar activities that the student had. Possible values are between 0 and 7.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4
Instances from the data sets for the 2020-2021 academic year contain 10 attributes (in this order):
- the grades received by the student for 7 laboratory assignments that were presented during the semester. For assignments that were not turned in, a grade of 0 was given. Possible values are between 0 and 10.
- a seminar bonus computed based on the number of seminar activities the student had during the semester, which was added to the final grade. Possible values are between 0 and 0.5.
- the final grade the student received for the course. It is a value between 4 and 10.
- the category of the final grade:
  - E for grades 10 or 9
  - G for grades 8 or 7
  - S for grades 6 or 5
  - F for grade 4
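The mapping from final grade to category described above can be sketched as a small lookup, which may be handy for checking that the last two attributes of an instance agree. This is only an illustration: the function name `grade_category` is hypothetical, final grades are assumed to be whole numbers, and the on-disk layout of the .txt files (delimiter, header row) is not specified in the description, so no parsing is attempted.

```python
# Attribute counts per academic year, as stated in the description above.
EXPECTED_ATTRIBUTES = {"2019-2020": 12, "2020-2021": 10}


def grade_category(final_grade):
    """Map a final grade (4..10, assumed integer) to its category letter."""
    if not 4 <= final_grade <= 10:
        raise ValueError(f"final grade out of range 4..10: {final_grade}")
    if final_grade >= 9:
        return "E"  # grades 10 or 9
    if final_grade >= 7:
        return "G"  # grades 8 or 7
    if final_grade >= 5:
        return "S"  # grades 6 or 5
    return "F"      # grade 4
```

For example, an instance whose final grade is 8 should carry the category G; a mismatch between the two attributes would point to a parsing or data-entry problem.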
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Example of processing GPS data into movement sequence.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Definitions of common notations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of existing studies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the domain of adaptable educational environments, our study is dedicated to achieving three key objectives: forecasting the adaptability of student learning, predicting and evaluating student performance, and employing aspect-based sentiment analysis for nuanced insights into student feedback. Using a systematic approach, we commence with an extensive data preparation phase to ensure data quality, followed by efficient data balancing techniques to mitigate biases. With an emphasis on higher education and educational data mining, feature extraction methods are used to uncover significant patterns in the data. The basis of our classification method is the robust WideResNeXT architecture, which has been further improved for maximum efficiency by hyperparameter tuning using the simple Modified Jaya Optimization Method. The recommended WResNeXt-MJ model has emerged as a formidable contender, demonstrating exceptional performance measurements. The model has an average accuracy of 98%, a low log loss of 0.05%, and a precision score of 98.4% across all datasets, demonstrating its efficacy in enhancing predictive capacity and accuracy in flexible learning environments. This work presents a comprehensive and helpful approach and a contemporary model suitable for flexible learning environments. WResNeXt-MJ’s exceptional performance values underscore its capacity to significantly enhance pupil achievement in global higher education.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Evaluation of the WResNeXt-GMJ and current methods using statistical.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mapping of event identifiers to event names.
If you use this code or the plots generated from it, please cite/mention the following papers:
Pappalardo, L., Cintia, P., Rossi, A. et al. A public data set of spatio-temporal match events in soccer competitions. Sci Data 6, 236 (2019) doi:10.1038/s41597-019-0247-7, https://www.nature.com/articles/s41597-019-0247-7
Pappalardo, L., Cintia, P., Ferragina, P., Massucco, E., Pedreschi, D., Giannotti, F. (2019) PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technologies 10(5) Article 59, DOI: https://doi.org/10.1145/3343172, https://dl.acm.org/citation.cfm?id=3343172
and the data collection on figshare:
Pappalardo, Luca; Massucco, Emanuele (2019): Soccer match event dataset. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4415000
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance evaluation on student performance dataset.