The "Vietnamese Spam Post in Social Network" dataset contains textual data collected from social media platforms. This dataset is specifically designed for spam detection tasks and includes labeled posts categorized as either spam or non-spam. Each post is written in Vietnamese, making it a valuable resource for natural language processing (NLP) research focused on the Vietnamese language. The dataset is ideal for training and evaluating machine learning models in tasks such as spam classification and text filtering in social networking environments.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Models from experiments referenced in the paper "Training CNNs with Low-Rank Filters for Efficient Image Classification", https://arxiv.org/abs/1511.06744
Model names differ from those in the paper, but the CSV file for each set of experiments maps the paper's name for each model to the actual model name used here:
cifarma.csv: Network-in-Network CIFAR10 Models
mitma.csv: MIT Places Models
googlenetma.csv: GoogLeNet ILSVRC2012 Models
vggma.csv: VGG-11 ILSVRC2012 Models
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set accompanies the article 'Automatic plankton image classification - can capsules and filters help coping with data set shift?' published in 'Limnology and Oceanography: Methods' by Plonus et al. (2021).
The images belong to the training set used to train the models in the aforementioned paper (training_) and to three additional data sets used to evaluate the performance of the trained models in application mode (fs446_; fs466_; fs534_). The Python script 'separate_files.py' can be used to move the images into separate folders for each data set and class.
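As a rough illustration of what such a sorting step might look like (this is not the 'separate_files.py' shipped with the data set), the sketch below derives a target folder from a file name. It assumes that the data set prefix is followed by a class label and an underscore, as in 'fs446_copepod_0001.png'. That naming scheme and the helper names are hypothetical, so the real file names should be checked first.

```python
import shutil
from pathlib import Path

# Data set prefixes named in the description above.
PREFIXES = ("training_", "fs446_", "fs466_", "fs534_")


def target_folder(filename):
    """Return <data set>/<class> for a file name, or None if no prefix matches."""
    for prefix in PREFIXES:
        if filename.startswith(prefix):
            rest = filename[len(prefix):]
            # ASSUMPTION: the class label is the text up to the next underscore.
            class_label = rest.split("_", 1)[0]
            return Path(prefix.rstrip("_")) / class_label
    return None


def separate(image_dir, out_dir):
    """Move every recognised image in image_dir into out_dir/<data set>/<class>/."""
    # Snapshot the listing first so moving files does not disturb the iteration.
    for path in list(Path(image_dir).iterdir()):
        folder = target_folder(path.name) if path.is_file() else None
        if folder is None:
            continue
        dest = Path(out_dir) / folder
        dest.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest / path.name))
```

Keeping the name parsing in its own function makes it easy to adapt once the actual naming convention is known.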
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this study, a novel spatial filter design method is introduced. Spatial filtering is an important processing step for feature extraction in motor imagery-based brain-computer interfaces. This paper introduces a new motor imagery signal classification method combined with spatial filter optimization. We simultaneously train the spatial filter and the classifier using a neural network approach. The proposed spatial filter network (SFN) is composed of two layers: a spatial filtering layer and a classifier layer. These two layers are linked to each other with non-linear mapping functions. The proposed method addresses two shortcomings of the common spatial patterns (CSP) algorithm. First, CSP aims to maximize the between-classes variance while ignoring the minimization of within-classes variances. Consequently, the features obtained using the CSP method may have large within-classes variances. Second, the maximizing optimization function of CSP increases the classification accuracy indirectly because an independent classifier is used after the CSP method. With SFN, we aimed to maximize the between-classes variance while minimizing within-classes variances and simultaneously optimizing the spatial filter and the classifier. To classify motor imagery EEG signals, we modified the well-known feed-forward structure and derived forward and backward equations that correspond to the proposed structure. We tested our algorithm on simple toy data. Then, we compared the SFN with conventional CSP and its multi-class version, called one-versus-rest CSP, on two data sets from BCI competition III. The evaluation results demonstrate that SFN is a good alternative for classifying motor imagery EEG signals with increased classification accuracy.
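For reference, the two-class CSP criterion is commonly written as a Rayleigh quotient. The notation below is ours rather than the paper's: \(\mathbf{w}\) is a spatial filter and \(\Sigma_1\), \(\Sigma_2\) are the class-wise average spatial covariance matrices of the band-pass filtered EEG.

```latex
\[
\mathbf{w}^{*} \;=\; \arg\max_{\mathbf{w}}\;
\frac{\mathbf{w}^{\top}\Sigma_{1}\,\mathbf{w}}
     {\mathbf{w}^{\top}\left(\Sigma_{1}+\Sigma_{2}\right)\mathbf{w}}
\]
```

The quotient depends only on the class-average covariances, so the trial-to-trial spread of the filtered variance within a class does not enter the criterion. This appears to be the first shortcoming the abstract refers to.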
This dataset was created by Bilal Ahmad
Released under Other (specified in description)
K denotes the number of filters in the first stage of the convolutional layers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, a neural network is used to classify each \(k \times k\) unit cell design of metamaterial M1 and M2 into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.
NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, and is saved in a zipped numpy file type (.npz). It contains data for metamaterial M1 ("smiley_cube"), and metamaterial M2 classification (i) ("prek_xy") and (ii) ("unimodal_vs_oligomodal_inc_stripmodes").
CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on \(k \times k\) unit cells for metamaterial M2 classification (ii). Classification (i) is denoted by an additional M2ii in the file name. Metamaterial M1 is denoted by an extra M1 in the file name. Every hyperparameter (number of filters nf, number of hidden neurons nh, learning rate lr) combination is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically using the 'tf.keras.models.load_model' function.
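A minimal sketch of how the two archives might be inspected after unzipping is given below. The array names stored inside the .npz files and the exact file names are not stated in the description, so the helper only lists whatever it finds. The function names are ours, and whether a given TensorFlow version can read the saved model format should be checked separately.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in a .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def load_saved_cnn(model_path):
    """Load one saved network with the function named in the description."""
    # Imported lazily so the .npz helper works without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.models.load_model(model_path)
```

Calling `summarize_npz` on one of the unit-cell files first shows which arrays hold the designs and which hold the class labels, before any model is loaded.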
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Among the most common cancers, colorectal cancer (CRC) has a high death rate. The best way to screen for CRC is with a colonoscopy, which has been shown to lower the risk of the disease. Computer-aided polyp classification techniques are therefore applied to identify colorectal cancer. However, visually categorizing polyps is difficult because different polyps are imaged under different lighting conditions. Unlike previous work, this article presents the Enhanced Scattering Wavelet Convolutional Neural Network (ESWCNN), a polyp classification technique that combines a Convolutional Neural Network (CNN) with the Scattering Wavelet Transform (SWT) to improve polyp classification performance. This method concatenates simultaneously learnable image filters and wavelet filters on each input channel. The scattering wavelet filters can extract common spectral features with various scales and orientations, while the learnable filters can capture image spatial features that wavelet filters may miss. A network architecture for ESWCNN is designed based on these principles and trained and tested using colonoscopy datasets (two public datasets and one private dataset). An n-fold cross-validation experiment was conducted for three classes (adenoma, hyperplastic, serrated), achieving a classification accuracy of 96.4%, and 94.8% accuracy in two-class polyp classification (positive and negative). In the three-class classification, correct classification rates of 96.2% for adenomas, 98.71% for hyperplastic polyps, and 97.9% for serrated polyps were achieved. The proposed method in the two-class experiment reached an average sensitivity of 96.7% with 93.1% specificity. Furthermore, we compare the performance of our model with state-of-the-art general classification models and commonly used CNNs. Six end-to-end models based on CNNs were trained using two datasets of video sequences.
The experimental results demonstrate that the proposed ESWCNN method can effectively classify polyps with higher accuracy and efficacy compared to the state-of-the-art CNN models. These findings can provide guidance for future research in polyp classification.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary material for the ITC-Net-Blend-60. It includes the full methodology document, and Python scripts to filter background traffic and extract PCAP file properties.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Overview: This dataset contains a collection of emails, categorized into two classes: "Spam" and "Non-Spam" (often referred to as "Ham"). These emails have been carefully curated and labeled to aid in the development of spam email detection models. Whether you are interested in email filtering, natural language processing, or machine learning, this dataset can serve as a valuable resource for training and evaluation.
Context: Spam emails continue to be a significant issue, with malicious actors attempting to deceive users with unsolicited, fraudulent, or harmful messages. This dataset is designed to facilitate research, development, and testing of algorithms and models aimed at accurately identifying and filtering spam emails, helping protect users from various threats.
Content: The dataset includes the following features:
- Message: The content of the email, including the subject line and message body.
- Category: Categorizes each email as either "Spam" or "Ham" (Non-Spam).
Potential Use Cases:
- Email Filtering: Develop and evaluate email filtering systems that automatically classify incoming emails as spam or non-spam.
- Natural Language Processing (NLP): Use the email text for text classification, topic modeling, and sentiment analysis.
- Machine Learning: Create machine learning models for spam detection, potentially employing various algorithms and techniques.
- Feature Engineering: Explore email content features that contribute to spam classification accuracy.
- Data Analysis: Investigate patterns and trends in spam email content and characteristics.
License: Please note that this dataset is for research and analysis purposes only and may be subject to copyright and data use restrictions. Ensure compliance with relevant policies when using this data.
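To make the Message/Category layout concrete, here is a small, dependency-free sketch of a multinomial Naive Bayes baseline with add-one smoothing. It is one possible starting point, not a method prescribed by the dataset. The four training rows are invented for illustration, and the tokenizer is a deliberately crude assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def fit(self, messages, categories):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter(categories)    # category -> number of messages
        for message, category in zip(messages, categories):
            self.word_counts[category].update(tokenize(message))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)           # log prior
            for word in tokenize(message):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Invented toy rows using the two columns described above.
rows = [
    {"Message": "Win a free prize now", "Category": "Spam"},
    {"Message": "Free cash click now", "Category": "Spam"},
    {"Message": "Are we meeting for lunch tomorrow", "Category": "Ham"},
    {"Message": "Please send the meeting notes", "Category": "Ham"},
]
model = NaiveBayes().fit([r["Message"] for r in rows], [r["Category"] for r in rows])
```

With the real file, the same `rows` list could be produced by `csv.DictReader`, provided the header names match the two feature names given above.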
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.
The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.
If you share or use this dataset, please cite [4] and [5] in any relevant documentation.
In addition, an image dataset for crack classification has also been published at [6].
References:
[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873
[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605
[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434
[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678
[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044
[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Network of 43 papers and 85 citation links related to "Novel Approach to Evaluate Classification Algorithms and Feature Selection Filter Algorithms Using Medical Data".
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Wavelet methods are widely used to decompose fMRI, EEG, or MEG signals into time series representing neurophysiological activity in fixed frequency bands. Using these time series, one can estimate frequency-band specific functional connectivity between sensors or regions of interest, and thereby construct functional brain networks that can be examined from a graph theoretic perspective. Despite their common use, however, practical guidelines for the choice of wavelet method, filter, and length have remained largely undelineated. Here, we explicitly explore the effects of wavelet method (MODWT vs. DWT), wavelet filter (Daubechies Extremal Phase, Daubechies Least Asymmetric, and Coiflet families), and wavelet length (2 to 24), each an essential parameter in wavelet-based methods, on the estimated values of graph metrics and on their sensitivity to alterations in psychiatric disease. We observe that the MODWT method produces less variable estimates than the DWT method. We also observe that the length of the wavelet filter chosen has a greater impact on the estimated values of graph metrics than the type of wavelet chosen. Furthermore, wavelet length impacts the sensitivity of the method to detect differences between health and disease and tunes classification accuracy. Collectively, our results suggest that the choice of wavelet method and length significantly alters the reliability and sensitivity of these methods in estimating values of metrics drawn from graph theory. They furthermore demonstrate the importance of reporting the choices utilized in neuroimaging studies and support the utility of exploring wavelet parameters to maximize classification accuracy in the development of biomarkers of psychiatric disease and neurological disorders.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises network traffic collected from 24 Internet of Things (IoT) devices over a span of 119 days, capturing a total of over 110 million packets. The devices represent 19 distinct types and were monitored in a controlled environment under normal operating conditions, reflecting a variety of functions and behaviors typical of consumer IoT products (pcapIoT). The packet capture (pcap) files preserve complete packet information across all protocol layers, including ARP, TCP, HTTP, and various application-layer protocols. Raw pcap files (pcapFull) are also provided, which contain traffic from 36 non-IoT devices present in the network. To facilitate device-specific analysis, a CSV file is included that maps each IoT device to its unique MAC address. This mapping simplifies the identification and filtering of packets belonging to each device within the pcap files. Three additional CSV files (CSVs) provide metadata about the states that the devices were in at different times. Additionally, Python scripts (Scripts) are provided to assist in extracting and processing packets. These scripts include functionalities such as packet filtering based on MAC addresses and protocol-specific data extraction, serving as practical examples for data manipulation and analysis techniques. This dataset is valuable for researchers interested in network behavior analysis, anomaly detection, and the development of IoT-specific network policies. It enables the study and differentiation of network behaviors based on device functions and supports behavior-based profiling to identify irregular activities or potential security threats.
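The MAC-based filtering described above can be sketched with the standard library alone. This is not one of the dataset's own scripts. It assumes classic pcap files (not pcapng) with Ethernet link type, and the MAC address would be looked up in the provided device mapping CSV, whose column names are not given here.

```python
import struct

# First four bytes of a classic pcap file -> struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}
LINKTYPE_ETHERNET = 1


def format_mac(raw):
    """Render six raw bytes as a lower-case colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def packets_for_mac(stream, mac):
    """Yield (ts_sec, frame_bytes) for Ethernet frames sent or received by mac."""
    header = stream.read(24)  # global header
    endian = PCAP_MAGICS.get(header[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    (linktype,) = struct.unpack(endian + "I", header[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled in this sketch")
    mac = mac.lower()
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            break
        ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        frame = stream.read(incl_len)
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        dst, src = format_mac(frame[0:6]), format_mac(frame[6:12])
        if mac in (dst, src):
            yield ts_sec, frame
```

Because the function is a generator over an open binary stream, large capture files can be scanned without loading them into memory.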
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Firms in datasets after filtering steps.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of information about the compared network architectures.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks as used for the paper 'Machine Learning of Combinatorial Rules in Mechanical Metamaterials', as published in XXX.
In this paper, a neural network is used to classify each \(k \times k\) unit cell design into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.
NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, and is saved in a zipped numpy file type (.npz).
CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on \(k \times k\) unit cells. Every hyperparameter (number of filters nf, number of hidden neurons nh, learning rate lr) combination is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically using the 'tf.keras.models.load_model' function.
Source:
Generated to model psychological experiments reported by Siegler, R. S. (1976). Three Aspects of Cognitive Development. Cognitive Psychology, 8, 481-520.
Donor:
Tim Hume (hume '@' ics.uci.edu)
Data Set Information:
This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are the left weight, the left distance, the right weight, and the right distance. The class is found by comparing (left-distance * left-weight) with (right-distance * right-weight): the scale tips toward the side with the larger product. If the two products are equal, it is balanced.
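The labeling rule can be written as a few lines of Python. The function name and the returned strings are our own choices for illustration and are not necessarily the class codes used in the data file.

```python
def balance_class(left_weight, left_distance, right_weight, right_distance):
    """Classify a balance-scale example by comparing the two products."""
    left = left_weight * left_distance
    right = right_weight * right_distance
    if left > right:
        return "left"
    if right > left:
        return "right"
    return "balanced"
```

For example, a left weight of 2 at distance 3 and a right weight of 3 at distance 2 give equal products of 6, so the scale is balanced.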
Attribute Information:
Relevant Papers:
Klahr, D., & Siegler, R.S. (1978). The Representation of Children's Knowledge. In H. W. Reese & L. P. Lipsitt (Eds.), Advances in Child Development and Behavior, pp. 61-116. New York: Academic Press
Langley, P. (1987). A General Theory of Discrimination Learning. In D. Klahr, P. Langley, & R. Neches (Eds.), Production System Models of Learning and Development, pp. 99-161. Cambridge, MA: MIT Press
Newell, A. (1990). Unified Theories of Cognition. Cambridge, MA: Harvard University Press
McClelland, J.L. (1988). Parallel Distributed Processing: Implications for Cognition and Development. Technical Report AIP-47, Department of Psychology, Carnegie-Mellon University
Shultz, T., Mareschal, D., & Schmidt, W. (1994). Modeling Cognitive Development on Balance Scale Phenomena. Machine Learning, Vol. 16, pp. 59-88.
Papers That Cite This Data Set:
Zhi-Hua Zhou and Yuan Jiang and Shifu Chen. Extracting symbolic rules from trained neural network ensembles. AI Commun, 16. 2003.
Jianbin Tan and David L. Dowe. MML Inference of Decision Graphs with Multi-way Joins and Dynamic Attributes. Australian Conference on Artificial Intelligence. 2003.
Peter Sykacek and Stephen J. Roberts. Adaptive Classification by Variational Kalman Filtering. NIPS. 2002.
Remco R. Bouckaert. Accuracy bounds for ensembles under 0-1 loss. Xtal Mountain Information Technology & Computer Science Department, University of Waikato. 2002.
Nir Friedman and Moisés Goldszmidt and Thomas J. Lee. Bayesian Network Classification with Continuous Attributes: Getting the Best of Both Discretization and Parametric Fitting. ICML. 1998.
Hirotaka Inoue and Hiroyuki Narihisa. Experiments with an Ensemble Self-Generating Neural Network. Okayama University of Science.
Alexander K. Seewald. Meta-Learning for Stacked Classification. Austrian Research Institute for Artificial Intelligence.
Alexander K. Seewald. Dissertation Towards Understanding Stacking Studies of a General Ensemble Learning Scheme ausgefuhrt zum Zwecke der Erlangung des akademischen Grades eines Doktors der technischen Naturwissenschaften.
Original Source : https://archive.ics.uci.edu/ml/datasets/Balance+Scale
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Network of 27 papers and 58 citation links related to "Classification of Body Movements in Ambulatory ECG Using Wavelet Transform, Adaptive Filter and Artificial Neural Networks".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To cite the dataset, please reference it as “Stratosphere Laboratory. A labeled dataset with malicious and benign IoT network traffic. January 22th. Agustin Parmisano, Sebastian Garcia, Maria Jose Erquiaga. https://www.stratosphereips.org/datasets-iot23”
This dataset includes labels that explain the linkages between flows connected with harmful or possibly malicious activity to provide network malware researchers and analysts with more thorough information. These labels were painstakingly created at the Stratosphere labs using malware capture analysis.
We present a concise explanation of the labels used for the identification of malicious flows, based on manual network analysis, below:
Attack: This label signifies the occurrence of an attack originating from an infected device directed towards another host. Any flow that endeavors to exploit a vulnerable service, discerned through payload and behavioral analysis, falls under this classification. Examples include brute force attempts on telnet logins or header-based command injections in GET requests.
Benign: The "Benign" label denotes connections where no suspicious or malicious activities have been detected.
C&C (Command and Control): This label indicates that the infected device has established a connection with a Command and Control server. This observation is rooted in the periodic nature of connections or activities such as binary downloads or the exchange of IRC-like or decoded commands.
DDoS (Distributed Denial of Service): "DDoS" is assigned when the infected device is actively involved in a Distributed Denial of Service attack, identifiable by the volume of flows directed towards a single IP address.
FileDownload: This label signifies that a file is being downloaded to the infected device. It is determined by examining connections with response bytes exceeding a specified threshold (typically 3KB or 5KB), often in conjunction with known suspicious destination ports or IPs associated with Command and Control servers.
HeartBeat: "HeartBeat" designates connections where packets serve the purpose of tracking the infected host by the Command and Control server. Such connections are identified through response bytes below a certain threshold (typically 1B) and exhibit periodic similarities. This is often associated with known suspicious destination ports or IPs linked to Command and Control servers.
Mirai: This label is applied when connections exhibit characteristics resembling those of the Mirai botnet, based on patterns consistent with common Mirai attack profiles.
Okiru: Similar to "Mirai," the "Okiru" label is assigned to connections displaying characteristics of the Okiru botnet. The parameters for this label are the same as for Mirai, but Okiru is a less prevalent botnet family.
PartOfAHorizontalPortScan: This label is employed when connections are involved in a horizontal port scan aimed at gathering information for potential subsequent attacks. The labeling decision hinges on patterns such as shared ports, similar transmitted byte counts, and multiple distinct destination IPs among the connections.
Torii: The "Torii" label is used when connections exhibit traits indicative of the Torii botnet, with labeling criteria similar to those used for Mirai, albeit in the context of a less common botnet family.
| Field Name | Description | Type |
|---|---|---|
| ts | The timestamp of the connection event. | time |
| uid | A unique identifier for the connection. | string |
| id.orig_h | The source IP address. | addr |
| id.orig_p | The source port. | port |
| id.resp_h | The destination IP address. | addr |
| id.resp_p | The destination port. | port |
| proto | The network protocol used (e.g., 'tcp'). | enum |
| service | The service associated with the connection. | string |
| duration | The duration of the connection. | interval |
| orig_bytes | The number of bytes sent from the source to the destination. | count |
| resp_bytes | The number of bytes sent from the destination to the source. | count |
| conn_state | The state of the connection. | string |
| local_orig | Indicates whether the connection is considered local or not. | bool |
| local_resp | Indicates whether the connection is considered... |
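The fields listed in the table can be read into typed records with a short parser. The sketch below assumes tab-separated Zeek-style ASCII log lines with the columns in the order shown, where "-" marks an unset value and booleans are written as T or F. It covers only the fields listed above, and the sample line is synthetic rather than taken from the dataset.

```python
# Field names in the order given in the table above.
FIELDS = [
    "ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
    "proto", "service", "duration", "orig_bytes", "resp_bytes",
    "conn_state", "local_orig", "local_resp",
]


def _to_bool(value):
    return value == "T"


# Converters for non-string types; fields not listed here stay strings.
CONVERTERS = {
    "ts": float, "duration": float,
    "id.orig_p": int, "id.resp_p": int,
    "orig_bytes": int, "resp_bytes": int,
    "local_orig": _to_bool, "local_resp": _to_bool,
}


def parse_conn_line(line):
    """Parse one tab-separated log line into a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    record = {}
    for name, value in zip(FIELDS, values):
        if value == "-":  # unset field
            record[name] = None
        else:
            record[name] = CONVERTERS.get(name, str)(value)
    return record


# Synthetic example line (documentation IP ranges, made-up uid).
sample = "\t".join([
    "1525879831.015811", "Cexample1", "192.0.2.10", "51524",
    "198.51.100.7", "23", "tcp", "-", "2.999051", "0", "0", "S0", "-", "-",
])
record = parse_conn_line(sample)
```

Lines beginning with "#" in such logs are header metadata and would need to be skipped before calling the parser.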