Machine learning methods and agent-based models enable the optimization of the operation of high-capacity facilities. In this paper, we propose a method for automatically extracting and cleaning pedestrian traffic detector data for subsequent calibration of the ingress pedestrian model. The data were obtained from the waiting-room traffic of a vaccination center. The walking speed distribution, the number of stops, the distribution of waiting times, and the locations of waiting points were extracted. Of the nine machine learning algorithms tested, the random forest model achieved the highest accuracy in classifying valid data and noise. The proposed microscopic calibration allows for more accurate testing of capacity assessments, procedural changes, and geometric modifications in the parts of the facility adjacent to the calibrated parts. The results show that the proposed method achieves state-of-the-art performance on a violent-flows dataset. The proposed method has the potential to significantly improve the accuracy and efficiency of input model predictions and to optimize the operation of high-capacity facilities.
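As an illustration of the valid-vs-noise classification step, the sketch below trains a random forest on synthetic track features. The features (walking speed, stop count, waiting time) and the labeling rule are stand-ins for illustration; the real detector features and labels are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Illustrative per-track features: mean walking speed [m/s],
# number of stops, and total waiting time [s] (all synthetic).
n = 1000
X = np.column_stack([
    rng.normal(1.3, 0.3, n),   # walking speed
    rng.poisson(2, n),         # number of stops
    rng.exponential(60, n),    # waiting time
])
# Synthetic labels: 1 = valid track, 0 = noise (invented rule).
y = (X[:, 0] > 0.5).astype(int) & (X[:, 2] < 300).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(accuracy_score(y_te, clf.predict(X_te)))
```

On real detector data, the features and the ground-truth valid/noise labels would come from the annotated traffic recordings rather than a synthetic rule.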
Statistical analysis of classification method accuracy.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Introduction
Color classification is an important application used in many areas. For example, systems that perform daily-life analysis can benefit from this classification process. Many classification algorithms are available; among the most popular machine learning algorithms are neural networks, decision trees, k-nearest neighbors, Bayesian networks, and support vector machines (SVMs). In this work, SVMs are used for training to obtain a classifier model. The SVM algorithm is a supervised learning method that, like other supervised learning methods, addresses both regression and classification problems. It is typically trained to separate and classify differently labeled samples. Training with an SVM aims to create an optimal hyperplane that separates the data into different classes; this hyperplane is located as far away from the data as possible to reduce classification errors.

Figure: SVM classifier with an optimal hyperplane.
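A minimal illustration of this idea, using scikit-learn's `SVC` with a linear kernel on toy 2-D data (the data points are invented for the example):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters, labeled 0 and 1.
X = np.array([[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a linear SVM; the learned hyperplane maximizes the margin
# between the two classes.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.predict([[0.5, 0.5], [4.5, 4.5]]))  # → [0 1]
```

For color classification the inputs would be color features (e.g., RGB pixel values) rather than these toy coordinates.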
Dataset
The datasets contain about 80 training images covering all color classes and 90 images for the test set. The colors prepared for this application are yellow, black, white, green, red, orange, blue, and violet. Basic colors were preferred for this classification task, and a dataset containing images of these basic colors was created. The dataset also includes masks for all images. We created these masks by binarizing the collected images: pixels belonging to the class color were painted white and the remaining pixels black.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification accuracies obtained with the proposed hybrid model and the other state-of-the-art classifiers from the recent literature for the data sets under consideration.
The scores for the precision, recall, F1-score, and accuracy metrics are shown for all four classification methods. The method with the highest score for each metric is highlighted in bold. The (†) symbol by the thresholding methods indicates that these methods only classified 18 zones, whereas the neural network methods classified 21 zones.
MULTI-LABEL ASRS DATASET CLASSIFICATION USING SEMI-SUPERVISED SUBSPACE CLUSTERING
MOHAMMAD SALIM AHMED, LATIFUR KHAN, NIKUNJ OZA, AND MANDAVA RAJESWARI
Abstract. There has been a great deal of research targeting text classification. Much of it focuses on a particular characteristic of text data: multi-labelity. This arises from the fact that a document may be associated with multiple classes at the same time. The consequence of this characteristic is the low performance of traditional binary or multi-class classification techniques on multi-label text data. In this paper, we propose a text classification technique that takes this characteristic into account and provides very good performance. Our multi-label text classification approach is an extension of our previously formulated [3] multi-class text classification approach called SISC (Semi-supervised Impurity based Subspace Clustering). We call this new classification model SISC-ML (SISC Multi-Label). Empirical evaluation on the real-world multi-label NASA ASRS (Aviation Safety Reporting System) data set reveals that our approach outperforms state-of-the-art text classification as well as subspace clustering algorithms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The assessment results of the proposed model in comparison with all thirteen other methods on the two multi-class data sets, using the nine commonly applied performance evaluation criteria.
A hierarchically ordered distribution of 3D points was created with MATLAB. It contains 120,000 data points in five hierarchical levels with one to four child nodes per parent. Data values for the three axes range between 0 and 1. The structure can be seen in the attached figure. Different distributions of data points are implemented in each hierarchical level, which allows classifiers to be tested under various conditions. The most common distribution in the dataset is a simple Gaussian point cloud. Other sampled distributions are a spherical distribution (a sphere in 3D) and a circular (donut) distribution along different axes. XOR distributions are implemented in different patterns, e.g., four batches with crossed classes or eight batches with two or four classes. The most complex data distribution is the spring roll, where the data points are intertwined with one another. To create indistinguishable cases, where a classifier is expected to perform poorly, some data points are simply intermixed at random with another class.
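Two of the sampled distributions, the Gaussian cloud and the sphere, could be generated along these lines; the counts, centers, and radii below are illustrative assumptions, not the parameters used for the published dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_cloud(n, center, scale=0.05):
    """Simple Gaussian point cloud around a 3-D center."""
    return rng.normal(center, scale, size=(n, 3))

def sphere_shell(n, center, radius=0.1):
    """Points on a sphere: sample random directions and scale
    them to a fixed radius around the center."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.asarray(center) + radius * v

pts = np.vstack([gaussian_cloud(100, [0.5, 0.5, 0.5]),
                 sphere_shell(100, [0.5, 0.5, 0.5])])
```

The donut, XOR, and spring-roll distributions follow the same pattern with different geometric constructions.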
The .csv-file contains four columns: label | x-coordinate | y-coordinate | z-coordinate
The label for each sample provides all hierarchical information needed. Each label is composed of five digits, one for each hierarchical level. As an example:
Sample '11421':
Hierarchical level 1: class 1
Hierarchical level 2: class 1
Hierarchical level 3: class 4
Hierarchical level 4: class 2
Hierarchical level 5: class 1
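Decoding such a label can be sketched in a few lines; the helper name is hypothetical, not part of the dataset:

```python
def parse_label(label):
    """Split a five-digit hierarchical label into per-level classes,
    e.g. '11421' -> {1: 1, 2: 1, 3: 4, 4: 2, 5: 1}."""
    return {level + 1: int(digit) for level, digit in enumerate(str(label))}

print(parse_label("11421"))  # {1: 1, 2: 1, 3: 4, 4: 2, 5: 1}
```

The same function applies to the label column of the .csv file after reading it as strings (leading digits would be lost if labels were parsed as integers with leading zeros, which this dataset's 1-to-4 class range avoids).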
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context: Malware detection is a critical task in cybersecurity, aimed at identifying malicious software that can harm systems or exploit their vulnerabilities. With the increasing complexity and volume of malware, traditional detection methods are often inadequate. This project leverages machine learning to enhance the accuracy and efficiency of malware detection.
Source: The dataset used in this project is synthetically generated to simulate a realistic distribution of malware and benign samples. It includes various features that represent typical characteristics observed in malware behavior. The dataset has been preprocessed and scaled to ensure optimal performance of machine learning models.
Inspiration: The inspiration behind this project stems from the ongoing challenge in the cybersecurity field to stay ahead of evolving threats. By applying machine learning algorithms, we aim to develop more robust and adaptive detection mechanisms. This project is also inspired by the potential of AI to transform cybersecurity practices, making systems more secure and resilient against attacks. The ultimate goal is to contribute to the development of advanced tools that can better protect individuals and organizations from malicious software.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by YuanCHEN_AG
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Lions or Cheetahs - Image Classification" dataset is a collection of images downloaded from the Open Images Dataset V6, containing photographs of both lions and cheetahs. This dataset has been compiled for the purpose of training and evaluating image classification algorithms.
The dataset contains a total of 200 images. The images have been labeled as either "lion" or "cheetah" and are stored in separate directories within the dataset.
This dataset can be used for a variety of tasks related to image classification, including developing and testing deep learning algorithms, evaluating the effectiveness of different image features and classification techniques, and comparing the performance of different models.
Researchers and practitioners interested in using this dataset are encouraged to cite the original source, Open Images Dataset V6, and to acknowledge any modifications made to the dataset for their particular use.
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Fruit Classification Dataset is designed to classify different types of fruits based on their spatial coordinates. It includes data points with 'x' and 'y' coordinates and their corresponding fruit class labels (apple, banana, orange), facilitating the development and testing of classification models for simple geometric data.
2) Data Utilization
(1) Fruit Classification data has characteristics that:
• It contains detailed coordinates (x and y) for each fruit class, allowing for the visualization and analysis of fruit distribution in a two-dimensional space. This dataset is ideal for understanding basic classification algorithms and testing their performance.
(2) Fruit Classification data can be used to:
• Machine Learning Education: Supports the teaching and learning of classification techniques, data visualization, and feature extraction in an accessible and engaging manner.
• Algorithm Testing: Provides a straightforward dataset for evaluating and comparing the performance of various classification algorithms in distinguishing between different fruit types based on coordinates.
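A minimal example of the kind of coordinate-based classification this dataset supports, here with a k-nearest-neighbors classifier on made-up fruit coordinates (the actual dataset values are not reproduced):

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented (x, y) points for three fruit classes.
X = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [4.8, 5.1], [9.0, 1.0], [8.8, 1.2]]
y = ["apple", "apple", "banana", "banana", "orange", "orange"]

# Classify a new point by majority vote among its 3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1.1, 1.0], [5.0, 5.0]]))  # → ['apple' 'banana']
```

Any scikit-learn classifier with the same `fit`/`predict` interface (SVM, decision tree, etc.) can be swapped in for comparison.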
With the advent and expansion of social networking, the amount of generated text data has increased sharply. To handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult is multi-labelity. To build a robust and effective text classification method, which is an integral part of text mining research, we must consider this property more closely. This property is not unique to text data, as it can also be found in non-text (e.g., numeric) data; however, it is most prevalent in text data. It also places the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us perform better text classification, and under what circumstances. The high and sparse dimensionality of text data has also been taken into account during classification. Although we propose and evaluate a text classification technique here, our main focus is on handling the multi-labelity of text data while utilizing the correlation among the multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real-world multi-label data sets provide insight into how multi-labelity is handled in our classification process and show the effectiveness of our approach.
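The pseudo-label idea, treating combinations of existing class labels as classes in their own right, resembles the label-powerset transformation from the multi-label literature. The sketch below shows that transformation only; it is not the pseudo-LSC algorithm itself, and the label names are invented:

```python
def pseudo_labels(label_sets):
    """Map each instance's set of class labels to a single pseudo
    label: one new class per distinct observed label combination."""
    mapping = {}   # label combination -> pseudo-label id
    codes = []
    for labels in label_sets:
        key = tuple(sorted(labels))
        mapping.setdefault(key, len(mapping))
        codes.append(mapping[key])
    return codes, mapping

y = [{"weather", "mechanical"}, {"weather"}, {"mechanical", "weather"}]
codes, mapping = pseudo_labels(y)
print(codes)  # [0, 1, 0]
```

Instances sharing the same label combination receive the same pseudo label, so a conventional single-label classifier can then be trained on the transformed data.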
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Carrots vs Rockets - Image Classification" dataset is a collection of images downloaded from various sources, containing photographs of both carrots and rockets. This dataset has been compiled for the purpose of training and evaluating image classification algorithms.
The dataset contains a total of 306 images. The images have been labeled as either "carrot" or "rocket" and are stored in separate directories.
This dataset can be used for a variety of tasks related to image classification, including developing and testing deep learning algorithms, evaluating the effectiveness of different image features and classification techniques, and comparing the performance of different models.
Researchers and practitioners interested in using this dataset are encouraged to cite the original sources of the images and to acknowledge any modifications made to the dataset for their particular use. The dataset may be useful for tasks such as automated vegetable sorting or satellite image analysis.
We compared the predictive ability of 52 classification algorithms that were available in ShinyLearner and had been implemented across 4 open-source machine-learning libraries. The abbreviation for each algorithm contains a prefix indicating which machine-learning library implemented the algorithm (mlr = Machine learning in R, sklearn = scikit-learn, weka = WEKA: The workbench for machine learning; keras = Keras). For each algorithm, we provide a brief description of the algorithmic approach; we extracted these descriptions from the libraries that implemented the algorithms. In addition, we assigned high-level categories that characterize the algorithmic methodology used by each algorithm. In some cases, the individual machine-learning libraries aggregated algorithm implementations from third-party packages. In these cases, we cite the machine-learning library and the third-party package. When available, we also cite papers that describe the algorithmic methodologies used. Finally, for each algorithm, we indicate the number of unique hyperparameter combinations evaluated in Analysis 4.
The number of subjects in the binary task was 12, and the number of subjects in the multi-task BCIs was 9. The number in parentheses corresponds to the average rank of the algorithm across subjects. For each feature extraction method, the classifiers typed in bold are the recommended ones. The recommended classifiers are selected based on the results of the statistical tests.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
🇦🇹 Austria
Cor (Correspondence): correspondence number with the Bollet/Servant cohort. Scores: scores obtained with partial identity (PIS) or methylation (MS). Time: time elapsed between diagnosis of the PT and diagnosis of the recurrence. Classification: classification of the recurrence based on copy number (PIS), methylation (MS), or clinical features (clinical). Divergence: which method deviated from the others.
MULTI-TEMPORAL REMOTE SENSING IMAGE CLASSIFICATION - A MULTI-VIEW APPROACH
VARUN CHANDOLA AND RANGA RAJU VATSAVAI
Abstract. Multispectral remote sensing images have been widely used for automated land use and land cover classification tasks. Thematic classification is often done using a single-date image; however, in many instances a single-date image is not informative enough to distinguish between different land cover types. In this paper, we show how multiple images, collected at different times of the year (for example, during the crop growing season), can be used to learn a better classifier. We propose two approaches, an ensemble-of-classifiers approach and a co-training based approach, and show that both outperform the straightforward stacked-vector approach often used in multi-temporal image classification. Additionally, the co-training based method addresses the challenge of limited labeled training data in supervised classification, as this classification scheme utilizes a large number of unlabeled samples (which come for free) in conjunction with a small set of labeled training data.
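The ensemble-of-classifiers idea can be sketched as one classifier per acquisition date, combined by majority vote. The data below is synthetic and the decision-tree base learner is a simplified stand-in for the paper's setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: 3 acquisition dates, 4 spectral bands each.
n, dates, bands = 300, 3, 4
X = [rng.normal(size=(n, bands)) for _ in range(dates)]
# Toy land-cover label driven by two of the dates.
y = (X[0][:, 0] + X[1][:, 1] > 0).astype(int)

# Train one classifier per date on that date's bands only.
models = [DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xd, y)
          for Xd in X]

# Combine per-date predictions by majority vote (2 of 3).
votes = np.stack([m.predict(Xd) for m, Xd in zip(models, X)])
pred = (votes.sum(axis=0) >= 2).astype(int)
print((pred == y).mean())
```

The stacked-vector baseline mentioned in the abstract would instead concatenate all dates' bands into one feature vector and train a single classifier on it.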