Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This evaluation dataset contains example utterances for the "change order" intent from Bitext's pre-built Customer Service domain (which covers common intents shared across Bitext's 20 pre-built domains). The data can be used to evaluate intent recognition models and Natural Language Understanding (NLU) platforms.
The dataset contains 10,000 utterances, extracted from a larger dataset of over 1,000,000 utterances, and includes language register variations such as politeness, colloquial language, swearing, and indirect style. To select the utterances, we used stratified sampling to generate a dataset with a general user language register profile.
The dataset also reflects commonly occurring linguistic phenomena of real-life chatbots, such as:
- spelling mistakes
- run-on words
- missing punctuation
Each entry in the dataset contains an example utterance along with its corresponding intent, category, and additional linguistic information. Each line contains the following four fields:
- flags: the applicable linguistic flags
- utterance: an example user utterance
- category: the high-level intent category
- intent: the intent corresponding to the user utterance
The dataset contains annotations for linguistic phenomena, which can be used to adapt bot training to different user language profiles. These flags are:
- B: Basic syntactic structure
- L: Lexical variation (synonyms)
- M: Morphological variation (plurals, tenses…)
- C: Complex/Coordinated syntactic structure
- E: Expanded abbreviations (I'm -> I am, I'd -> I would…)
- I: Interrogative structure
- K: Keyword only
- P: Politeness variation
- Q: Colloquial variation
- W: Offensive language
- Z: Noise (spelling, punctuation…)
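As a rough sketch of working with this four-field layout, the snippet below parses an invented two-row sample (the utterances and flag combinations are hypothetical, not taken from the real file) and expands each flags string into its individual phenomenon codes:

```python
import csv
import io

# Hypothetical sample in the four-field layout described above
# (flags, utterance, category, intent); real rows will differ.
sample = """flags,utterance,category,intent
BIP,could you help me change my order?,ORDER,change_order
BLZ,i wanna modify my order,ORDER,change_order
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Each character in the flags field marks one phenomenon, e.g. P = politeness.
    phenomena = sorted(set(row["flags"]))
    print(row["intent"], phenomena)
```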
These phenomena make the training dataset more effective and make bots more accurate and robust.
The intent categories covered by the dataset are: ORDER
The intents covered by the dataset are: change_order
(c) Bitext Innovations, 2022
http://rightsstatements.org/vocab/InC/1.0/
The dataset and source code for paper "Automating Intention Mining".
The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.
By default, the code uses TensorFlow 0.12. Other TensorFlow versions may report errors due to API incompatibilities.
Running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences from the four GitHub projects as training data.
Running 'play.py', you can get the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset; in this setting, the categories 'aspect evaluation' and 'others' are dropped since the DECA dataset does not contain them.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The proposed research intends to improve the current service desk model by embedding Conversational Language Understanding (CLU) processes in the chatbot, so that it can understand the user's input, automate the ticket resolution process, and improve customer service experience and efficiency. The CLU model will be trained to cover the expected range of user inputs. The chatbot will then be designed around five main dialogue flows: changing the user's current password, checking the user's mobile number listed in Azure Active Directory (AAD), updating the user's mobile number in AAD, creating a new ticket in the ticketing system, and creating a follow-up ticket in the ticketing system. A trained CLU model with a high prediction score on the proposed dialogue flows will then be embedded in the chatbot design. The result would be a next-level chatbot that can understand and classify the user's intent, automate the user's Level 1 (L1) requests without any human technician's interaction, and create a ticket in the ticketing system for any request that the chatbot does not yet cover.
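The routing described above can be sketched as a mapping from classified intents to the five dialogue flows, with ticket creation as the fallback for uncovered or low-confidence requests. All intent labels and handler names below are hypothetical placeholders, not the proposal's actual CLU schema:

```python
# Minimal sketch: route a classified intent to one of five dialogue flows.
# Intent labels, handler names, and the confidence threshold are assumptions.

def change_password(user): return f"Password-change flow started for {user}"
def check_mobile(user): return f"Looking up mobile number for {user} in AAD"
def update_mobile(user): return f"Updating mobile number for {user} in AAD"
def create_ticket(user): return f"New ticket created for {user}"
def follow_up_ticket(user): return f"Follow-up ticket created for {user}"

FLOWS = {
    "change_password": change_password,
    "check_mobile_number": check_mobile,
    "update_mobile_number": update_mobile,
    "create_ticket": create_ticket,
    "create_followup_ticket": follow_up_ticket,
}

def route(intent, user, score=1.0, threshold=0.8):
    # Intents the bot does not cover, or low-confidence predictions,
    # fall back to creating a ticket, as the proposal describes.
    handler = FLOWS.get(intent) if score >= threshold else None
    return (handler or create_ticket)(user)
```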
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.
This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.
The dataset contains intents. An "intent" is the intention behind a user's message. For instance, if a user says "I am sad" to the chatbot, the intent would be "sad". Each intent has an associated set of Patterns and Responses: Patterns are example user messages that align with the intent, while Responses are the replies the chatbot provides for that intent. Various intents are defined, and their patterns and responses are used as the model's training data for identifying a particular intent.
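The intent/patterns/responses structure described above is typically stored as a list of tagged entries; a minimal sketch (with an invented "sad" entry, not actual rows from this dataset) and the derivation of (pattern, tag) training pairs might look like this:

```python
# Hypothetical entry in the intents format described above: an intent tag
# with its example patterns and candidate responses.
intents = [
    {
        "tag": "sad",
        "patterns": ["I am sad", "I feel down", "I'm feeling low"],
        "responses": [
            "I'm sorry to hear that. Do you want to talk about it?",
            "That sounds hard. I'm here to listen.",
        ],
    }
]

# Training pairs for an intent classifier: (pattern, tag)
training_pairs = [(p, item["tag"]) for item in intents for p in item["patterns"]]
```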
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Online Shoppers Purchasing Intention Dataset
Dataset Summary
This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.
NOTE: The information below is from the original dataset description from UCI's website.
Overview
Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1,908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.
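The class balance quoted above can be verified directly from the stated counts:

```python
# Reproducing the class balance quoted above: of 12,330 sessions,
# 10,422 are negative (no purchase); the rest are positive.
total = 12330
negative = 10422
positive = total - negative

negative_share = negative / total
print(positive, round(100 * negative_share, 1))
```

With this imbalance, accuracy alone is a weak metric: a model predicting "no purchase" for every session already scores 84.5%.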
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Job Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/HRAnalyticRepository/job-classification-dataset on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This is a dataset containing fictional job class spec information. Typically, job class specs contain information that characterizes the job class (its features) and a label to predict from those features, in this case a pay grade.
The data is a static snapshot. The contents are:
- ID: a sequential number
- Job Family ID
- Job Family Description
- Job Class ID
- Job Class Description
- PayGrade: numeric
- Education Level
- Experience
- Organizational Impact
- Problem Solving
- Supervision
- Contact Level
- Financial Budget
- PG: alpha label for PayGrade
This data is purely fictional
The intent is to use machine learning classification algorithms to predict PG from Educational level through to Financial budget information.
Typically, job classification in HR is time-consuming and cumbersome as a manual activity. The intent is to show how machine learning and People Analytics can be brought to bear on this task.
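A toy sketch of the prediction task: encode the ordinal job-spec features and classify the pay grade. The rows, encodings, and the 1-nearest-neighbour rule below are invented for illustration; the description does not prescribe a particular algorithm:

```python
# Invented ordinal encodings of (education, experience, org_impact,
# problem_solving, supervision, contact_level, financial_budget) -> PG.
train = [
    ((1, 1, 1, 1, 1, 1, 1), "A"),
    ((2, 2, 2, 2, 2, 2, 2), "B"),
    ((4, 4, 4, 4, 4, 4, 4), "D"),
]

def predict(features):
    # 1-nearest neighbour on squared distance; a simple stand-in for the
    # classification algorithms the description has in mind.
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(features, row[0]))
    return min(train, key=dist)[1]
```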
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The data was obtained from the following website: https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
Sakar, C. and Kastro, Y. (2018). Online Shoppers Purchasing Intention Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5F88Q.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quantitative analysis of intent classification module.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset consists of 400 text-only multi-turn conversations in the English language, based on 10 categories and 19 use cases and prepared for fine-tuning. It has been generated with ethically sourced human-in-the-loop data methods and aligned with supervised fine-tuning, direct preference optimization, and reinforcement learning from human feedback.
The human-annotated data is focused on data quality and precision to enhance the generative response of models used for AI chatbots, thereby improving their recall memory and recognition ability for continued assistance.
Key Features
- Prompts focused on user intent, devised using natural language processing techniques.
- Multi-turn prompts with up to 5 turns to enhance the responsive memory of large language models for pretraining.
- Conversational interactions for queries related to varied aspects of writing, coding, knowledge assistance, data manipulation, reasoning, and classification.
Dataset Source
Subject matter expert annotators @SoftAgeAI have annotated the data at simple and complex levels, focusing on quality factors such as content accuracy, clarity, coherence, grammar, depth of information, and overall usefulness.
Structure & Fields
The dataset is organized into different columns, which are detailed below:
- P1, R1, P2, R2, P3, R3, P4, R4, P5 (object): The sequence of prompts (P) and responses (R) within a single interaction. Each interaction can have up to 5 prompts and 5 corresponding responses, capturing the flow of a conversation. The prompts are user inputs, and the responses are the model's outputs.
- Use Case (object): Specifies the primary application or scenario for which the interaction is designed, such as "Q&A helper" or "Writing assistant." This classification helps in identifying the purpose of the dialogue.
- Type (object): Indicates the complexity of the interaction, with entries labeled as "Complex" in this dataset. This denotes that the dialogues involve more intricate and multi-layered exchanges.
- Category (object): Broadly categorizes the interaction type, such as "Open-ended QA" or "Writing." This provides context on the nature of the conversation, whether it is for generating creative content, providing detailed answers, or engaging in complex problem-solving.

Intended Use Cases
The dataset can enhance query-assistance models for shopping, coding, creative writing, travel assistance, marketing, citation, academic writing, language assistance, research topics, specialized knowledge, reasoning, and STEM-based queries. The dataset intends to aid generative models for e-commerce, customer assistance, marketing, education, suggestive user queries, and generic chatbots. It can pre-train large language models with supervision-based fine-tuned annotated data and support retrieval-augmented generative models. The dataset is free of violence-based interactions that could lead to harm, conflict, discrimination, brutality, or misinformation.

Potential Limitations & Biases
This is a static dataset, so the information is dated May 2024.
Note
If you have any questions related to our data annotation and human review services for large language model training and fine-tuning, please contact us at SoftAge Information Technology Limited at info@softage.ai.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
One of the primary challenges in using deep learning models is the scarcity of datasets of sufficient size to effectively train these networks, and the hurdles in acquiring them. This is particularly significant in object detection, shape completion, and fracture assembly. Instead of scanning a large number of real-world fragments, it is possible to generate massive datasets with synthetic pieces. However, realistic fragmentation is computationally intensive in both the preparation (e.g., pre-fractured models) and the generation. In contrast, simpler algorithms such as Voronoi diagrams provide faster processing speeds at the expense of realism. Hence, a balance between computational efficiency and realism is required when generating large datasets for machine learning.
We propose a GPU-based fragmentation method that improves on the baseline Discrete Voronoi Chain to complete this dataset generation task. The dataset in this repository includes voxelized fragments from high-resolution 3D models, curated to be used as training sets for machine learning models. More specifically, these models come from an archaeological dataset, which led to more than 1M fragments from 1,052 Iberian vessels. In this dataset, fragments are not stored individually; instead, the fragmented voxelizations are provided in a compressed binary file (.rle.zip). Once uncompressed, each fragment is represented by a different number in the grid. The class to which each vessel belongs is also included in class.csv. The GPU-based pipeline that generated this dataset is explained at https://doi.org/10.1016/j.cag.2024.104104.
Please note that this dataset originally provided voxel data, point clouds, and triangle meshes. However, we opted to include only voxel data because (1) the original dataset is too large to be uploaded to Zenodo and (2) the original intent of our paper is to generate implicit data in the form of voxels. If you are interested in the whole dataset (450 GB), please visit the web page of our research institute.
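Since each fragment is a distinct integer in the uncompressed grid, fragment ids and sizes can be tallied with a simple pass over the voxels. The tiny grid below is an invented stand-in for a real uncompressed .rle file, and the assumption that 0 marks empty space is ours; consult the linked paper for the actual encoding:

```python
from collections import Counter

# Toy stand-in for an uncompressed fragment grid: each nonzero integer
# labels one fragment; 0 is assumed to be empty space.
grid = [
    [0, 1, 1, 0],
    [2, 2, 1, 0],
    [2, 2, 3, 3],
]

voxels_per_fragment = Counter(v for row in grid for v in row if v != 0)
fragment_ids = sorted(voxels_per_fragment)
print(fragment_ids, dict(voxels_per_fragment))
```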
snips
This is a text classification dataset. It is intended for machine learning research and experimentation. This dataset is obtained via formatting another publicly available data to be compatible with our AutoIntent Library.
Usage
It is intended to be used with our AutoIntent Library: from autointent import Dataset
snips = Dataset.from_hub("AutoIntent/snips")
Source
This dataset is taken from benayas/snips and formatted with our AutoIntent Library:… See the full description on the dataset page: https://huggingface.co/datasets/DeepPavlov/snips.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset containing the images and labels for the MultNIST data used in the CVPR NAS workshop Unseen-data challenge under the codename "Mateo". The MultNIST dataset is constructed from MNIST images. The intention of this dataset is to require machine learning models to do more than image classification: they must also perform a calculation, in this case multiplication followed by a mod operation. For each image, three MNIST images were randomly chosen and combined through the colour channels, resulting in a three-colour-channel image in which each MNIST image occupies one colour channel. The data is in channels-first format with a shape of (n, 3, 28, 28), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing). There are ten classes in the dataset, with 7,000 examples of each, distributed evenly between the three subsets. The label of each image is generated using the formula "(r * g * b) % 10", where r, g, and b are the red, green, and blue colour channels respectively. For example, an RGB configuration of 3, 7, and 4 would result in a label of 4 ((3 * 7 * 4) % 10).
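The label rule is simple enough to state in one line of code, using the worked example from the description:

```python
# The MultNIST label rule quoted above: multiply the three channel digits,
# then take the result mod 10.
def multnist_label(r, g, b):
    return (r * g * b) % 10

# The description's example: channels (3, 7, 4) -> (3 * 7 * 4) % 10 = 4
print(multnist_label(3, 7, 4))  # 4
```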
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
**************** NTU Dataset ReadMe file *******************
Please consider the latest version.

Attached files contain our data, collected inside the Nanyang Technological University campus, for pedestrian intention prediction. The dataset is particularly designed to capture spontaneous vehicle influences on pedestrian crossing/not-crossing intention. We utilize this dataset in our paper "Context Model for Pedestrian Intention Prediction using Factored Latent-Dynamic Conditional Random Fields", submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

The dataset consists of 35 crossing and 35 stopping* (not-crossing) scenarios. The image sequences are in the 'Image_sequences' folder. The 'stopping_instants.csv' and 'crossing_instants.csv' files provide the stopping and crossing instants respectively, used for labeling the data and providing ground truth for evaluation. Camera1 and Camera2 images are synchronized; two cameras were used to capture the whole scene of interest.

We provide pedestrian and vehicle bounding boxes obtained from [1]. Occlusions and mis-detections are linearly interpolated. All necessary detections are stored in the 'Object_detector_pedestrians_vehicles' folder. Each column within the csv files ('car_bndbox_..') corresponds to a unique tracked car within each image sequence. Each of the pedestrian csv files ('ped_bndbox_..') contains only one column, as we consider each pedestrian in the scene separately.

Additional details:
* [xmin xmax ymin ymax] = left right top down
* Dataset frequency: 15 fps
* Camera parameters (in pixels): f = 1135, principal point = (960, 540)

Additionally, we provide semantic segmentation output [2] and our depth parameters.
As the data were collected in two phases, there are two files in each folder, highlighting the sequences in each phase. Crossing sequences 1-28 and stopping sequences 1-24 were collected in Phase 1, while crossing sequences 29-35 and stopping sequences 25-35 were collected in Phase 2. We obtained the optical flow from [3]. Our model (FLDCRF and LSTM) codes are available in the 'Models' folder.

If you use our dataset in your research, please cite our paper:
"S. Neogi, M. Hoy, W. Chaoqun, J. Dauwels, 'Context Based Pedestrian Intention Prediction Using Factored Latent Dynamic Conditional Random Fields', IEEE SSCI-2017."

Please email us if you have any questions:
1. Satyajit Neogi, PhD Student, Nanyang Technological University @ satyajit001@e.ntu.edu.sg
2. Justin Dauwels, Associate Professor, Nanyang Technological University @ jdauwels@ntu.edu.sg

Our other group members include:
3. Dr. Michael Hoy @ mch.hoy@gmail.com
4. Dr. Kang Dang @ kangdang@gmail.com
5. Ms. Lakshmi Prasanna Kachireddy
6. Mr. Mok Bo Chuan Lance
7. Dr. Hang Yu @ fhlyhv@gmail.com

References:
1. S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS 2015.
2. A. Kendall, V. Badrinarayanan, R. Cipolla, "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding", BMVC 2017.
3. C. Liu, "Beyond Pixels: Exploring New Representations and Applications for Motion Analysis", Doctoral Thesis, Massachusetts Institute of Technology, May 2009.

* Please note, we had to remove sequence Stopping-33 for privacy reasons.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 4,992 rows of structured information derived from a triage system designed for managing and prioritizing comments in collaborative environments. Using advanced machine learning techniques, such as GEMMA-2B for intent classification, Hugging Face models for sentiment analysis, and Latent Dirichlet Allocation (LDA) for topic modeling, each comment is analyzed across six dimensions: urgency, importance, sentiment, actionability, resolution status, and thematic relevance.
The dataset can support tasks in comment triage and prioritization, sentiment analysis, intent classification, and topic modeling.
Key Features:
- Hierarchical Labels: Multi-level classifications (level_0 to level_4) for each comment.
- Priority Scores: Aggregated values representing the criticality of each comment.
- Sentiment Analysis: Positive, neutral, and negative sentiment scoring.
- LDA Topics: Thematic insights for comment context.
Metadata:
- Rows: 4,992
- Columns: 49
- Tags: NLP, Machine Learning, Sentiment Analysis, Comment Triage, Topic Modeling, Collaboration
File Details:
- Filename: triaged_comments_with_priority_and_labels_hierarchy.csv
- License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The sensitivity of hyperspectral imaging (imaging spectroscopy) to haemoglobin derivatives makes it a promising tool for detection and classification of blood. However, due to complexity and high dimensionality of hyperspectral images, the development of hyperspectral blood detection algorithms is challenging. To facilitate their development, we present a new hyperspectral blood detection dataset. This dataset consists of 14 hyperspectral images (ENVI format) of a mock-up scene containing blood and visually similar substances (e.g. artificial blood or tomato concentrate). Images were taken over a period of three weeks and differ in terms of background composition and lighting intensity. To facilitate the use of data, the dataset includes an annotation of classes: pixels where blood and similar substances are visible have been marked by the authors. The main intention behind the dataset is to serve as testing data for Machine Learning methods for hyperspectral target detection and classification.
Deep Learning Hard (DL-HARD) is an annotated dataset designed to more effectively evaluate neural ranking models on complex topics. It builds on TREC Deep Learning (DL) questions extensively annotated with query intent categories, answer types, wikified entities, topic categories, and result type metadata from a leading web search engine. DL-HARD contains 50 queries from the official 2019/2020 evaluation benchmark, half of which are newly and independently assessed. Overall, DL-HARD is a new resource that promotes research on neural ranking methods by focusing on challenging and complex queries.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This is a lab-based dataset with videos recording volunteers (medical students) washing their hands as part of a hand-washing monitoring and feedback experiment. The dataset is collected in the Medical Education Technology Center (METC) of Riga Stradins University, Riga, Latvia. In total, 72 participants took part in the experiments, each washing their hands three times, in a randomized order, going through three different hand-washing feedback approaches (user interfaces of a mobile app). The data was annotated in real time by a human operator, in order to give the experiment participants real-time feedback on their performance. There are 212 hand washing episodes in total, each of which is annotated by a single person. The annotations classify the washing movements according to the World Health Organization's (WHO) guidelines by marking each frame in each video with a certain movement code.
This dataset is part of a series of three datasets, all following the same format:
https://zenodo.org/record/4537209 - data collected in Pauls Stradins Clinical University Hospital
https://zenodo.org/record/5808764 - data collected in Jurmala Hospital
https://zenodo.org/record/5808789 - data collected in the Medical Education Technology Center (METC) of Riga Stradins University
Note #1: we recommend that when using this dataset for machine learning, allowances are made for the reaction speed of the human operator labeling the data. For example, the annotations can be expected to be incorrect a short while after the person in the video switches their washing movements.
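One way to implement the allowance suggested in Note #1 is to mark frames shortly after each annotated movement change as unreliable, absorbing the operator's reaction delay. The window length below is an illustrative assumption, not a value recommended by the dataset authors:

```python
# Build a reliability mask over per-frame movement labels: frames within
# `window` frames after a label change are flagged as unreliable, since the
# human operator needed time to react to the movement switch.
def mask_transition_frames(labels, window=8):
    mask = [True] * len(labels)
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            for j in range(i, min(i + window, len(labels))):
                mask[j] = False
    return mask
```

Such a mask can then be used to exclude the flagged frames from training or to down-weight them in the loss.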
Application: The intention of this dataset is to serve as a basis for training machine learning classifiers for automated hand washing movement recognition and quality control.
Statistics:
Frame rate: ~16 FPS (slightly variable, as the videos are reconstructed from sequences of JPG images taken at the maximum framerate supported by the capturing devices).
Resolution: 640x480
Number of videos: 212
Number of annotation files: 212
Movement codes (in JSON files):
1: Hand washing movement — Palm to palm
2: Hand washing movement — Palm over dorsum, fingers interlaced
3: Hand washing movement — Palm to palm, fingers interlaced
4: Hand washing movement — Backs of fingers to opposing palm, fingers interlocked
5: Hand washing movement — Rotational rubbing of the thumb
6: Hand washing movement — Fingertips to palm
0: Other hand washing movement
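The movement codes above translate directly into a lookup table for decoding the annotation files. The JSON field name below ("labels") is a hypothetical example; the actual schema of the annotation files may differ:

```python
import json

# Movement codes as listed above, mapped to the WHO movement names.
MOVEMENTS = {
    0: "Other hand washing movement",
    1: "Palm to palm",
    2: "Palm over dorsum, fingers interlaced",
    3: "Palm to palm, fingers interlaced",
    4: "Backs of fingers to opposing palm, fingers interlocked",
    5: "Rotational rubbing of the thumb",
    6: "Fingertips to palm",
}

# Hypothetical per-frame annotation snippet; the real JSON schema may differ.
annotation = json.loads('{"labels": [1, 1, 3, 3, 0]}')
names = [MOVEMENTS[c] for c in annotation["labels"]]
```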
Note #2: The original dataset of JPG images is available upon request. There are 13 annotation classes in the original dataset: for each of the six washing movements defined by the WHO, "correct" and "incorrect" execution is marked with two different labels. In this published dataset, all incorrect executions are marked with code 0, as "other" washing movement.
Acknowledgments: The dataset collection was funded by the Latvian Council of Science project: "Automated hand washing quality control and quality evaluation system with real-time feedback", No: lzp - Nr. 2020/2-0309.
References: For more detailed information, see this article, describing a similar dataset collected in a different project:
M. Lulla, A. Rutkovskis, A. Slavinska, A. Vilde, A. Gromova, M. Ivanovs, A. Skadins, R. Kadikis, A. Elsts. Hand-Washing Video Dataset Annotated According to the World Health Organization’s Hand-Washing Guidelines. Data. 2021; 6(4):38. https://doi.org/10.3390/data6040038
Contact information: atis.elsts@edi.lv
https://creativecommons.org/publicdomain/zero/1.0/
By SoLID (From Huggingface) [source]
The dataset consists of multiple files for different purposes. The validation.csv file contains a set of carefully selected assembly shellcodes that serve the purpose of validation. These shellcodes are used to ensure the accuracy and integrity of any models or algorithms trained on this dataset.
The train.csv file contains both the intent column, which describes the purpose or objective behind each specific shellcode, and its corresponding assembly code snippets in order to facilitate supervised learning during training procedures. This file proves to be immensely valuable for researchers, practitioners, and developers seeking to study or develop effective techniques for dealing with malicious code analysis or security-related tasks.
For testing purposes, the test.csv file provides yet another collection of assembly shellcodes that can be employed as test cases to assess the performance, robustness, and generalization capability of various models or methodologies developed within this domain.
Understanding the Dataset
The dataset consists of multiple files that serve different purposes:
train.csv: This file contains the intent and corresponding assembly code snippets for training purposes. It can be used to train machine learning models or develop algorithms based on shellcode analysis.

test.csv: The test.csv file in the dataset contains a collection of assembly shellcodes specifically designed for testing purposes. You can use these shellcodes to evaluate and validate your models or analysis techniques.

validation.csv: The validation.csv file includes a set of assembly shellcodes that are specifically reserved for validation purposes. These shellcodes can be used separately to ensure the accuracy and reliability of your models.

Columns in the Dataset
The columns available in each CSV file are as follows:
intent: The intent column describes the purpose or objective of each specific shellcode entry. It provides information regarding what action or achievement is intended by using that particular piece of code.
snippet: The snippet column contains the actual assembly code corresponding to each intent entry in its respective row. It includes all necessary instructions and data required to execute the desired action specified by that intent.
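Loading the two columns is a standard CSV read; the sketch below uses an invented two-row sample in place of the real train.csv, and the quoted snippet shows how a multi-line assembly snippet would sit inside one CSV field:

```python
import csv
import io

# Invented two-row sample in the intent/snippet layout described above.
sample = """intent,snippet
exit the program,"mov eax, 1
int 0x80"
clear a register,"xor eax, eax"
"""

with io.StringIO(sample) as f:
    rows = list(csv.DictReader(f))

# (intent, snippet) pairs ready for supervised training.
pairs = [(r["intent"], r["snippet"]) for r in rows]
```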
Utilizing the Dataset
To effectively utilize this dataset, follow these general steps:
Familiarize yourself with assembly language: Assembly language is essential when working with shellcodes since they consist of low-level machine instructions understood by processors directly.
Explore intents: Start by analyzing and understanding different intents present in the dataset entries thoroughly. Each intent represents a specific goal or purpose behind creating an individual piece of code.
Examine snippets: Review the assembly code snippets corresponding to each intent entry. Carefully study the instructions and data used in the shellcode, as they directly influence their intended actions.
Train your models: If you are working on machine learning or algorithm development, utilize the train.csv file to train your models based on the labeled intent and snippet data provided. This step will enable you to build powerful tools for analyzing or detecting shellcodes automatically.

Evaluate using test datasets: Use the various assembly shellcodes present in test.csv to evaluate and validate your trained models or analysis techniques. This evaluation will help assess the performance, robustness, and generalization capability of your approach.
- Malware analysis: The dataset can be used for studying and analyzing various shellcode techniques used in malware attacks. Researchers and security professionals can use this dataset to develop detection and prevention mechanisms against such attacks.
- Penetration testing: Security experts can use this dataset to simulate real-world attack scenarios and test the effectiveness of their defensive measures. By having access to a diverse range of shellcodes, they can identify vulnerabilities in systems and patch them before malicious actors exploit them.
- Machine learning training: This dataset can be used to train machine learning models for automatic detection or classification of shellcodes. By combining the intent column (which describes the objective of each shellcode) with the corresponding assembly code snippets, researchers can develop algorithms that automatically identify the purpose or ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically