45 datasets found
  1. Online Shoppers Purchasing Intention Dataset

    • ieee-dataport.org
    Updated Jan 9, 2025
    Cite
    Dinesh Vishwakarma (2025). Online Shoppers Purchasing Intention Dataset [Dataset]. https://ieee-dataport.org/documents/online-shoppers-purchasing-intention-dataset
    Explore at:
    Dataset updated
    Jan 9, 2025
    Authors
    Dinesh Vishwakarma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    84.5% (10

  2. Evaluation Dataset for Chatbot/Virtual Assistants

    • kaggle.com
    Updated Mar 17, 2022
    Cite
    Bitext (2022). Evaluation Dataset for Chatbot/Virtual Assistants [Dataset]. https://www.kaggle.com/datasets/bitext/evaluation-dataset-chatbot-virtual-assistants/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bitext
    Description

    Bitext Sample Pre-built Customer Service Evaluation Dataset for English

    Overview

    This Evaluation dataset contains example utterances taken from the "change order" intent from Bitext's pre-built Customer Service domain (which itself covers common intents present across Bitext's 20 pre-built domains). The data can be used to evaluate intent recognition models and Natural Language Understanding (NLU) platforms.

    Utterances

    The dataset contains 10,000 utterances, extracted from a larger dataset of over 1,000,000 utterances, and includes language register variations such as politeness, colloquial speech, swearing, and indirect style. To select the utterances, we used stratified sampling to generate a dataset with a general user language register profile.

    The dataset also reflects commonly occurring linguistic phenomena of real-life chatbots, such as:
    - spelling mistakes
    - run-on words
    - missing punctuation

    Contents

    Each entry in the dataset contains an example utterance along with its corresponding intent, category and additional linguistic information. Each line contains the following four fields:
    - flags: the applicable linguistic flags
    - utterance: an example user utterance
    - category: the high-level intent category
    - intent: the intent corresponding to the user utterance

    Linguistic flags

    The dataset contains annotations for linguistic phenomena, which can be used to adapt bot training to different user language profiles. These flags are:
    - B - Basic syntactic structure
    - L - Lexical variation (synonyms)
    - M - Morphological variation (plurals, tenses…)
    - C - Complex/Coordinated syntactic structure
    - E - Expanded abbreviations (I'm -> I am, I'd -> I would…)
    - I - Interrogative structure
    - K - Keyword only
    - P - Politeness variation
    - Q - Colloquial variation
    - W - Offensive language
    - Z - Noise (spelling, punctuation…)

    These phenomena make the training dataset more effective and make bots more accurate and robust.
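    The four-field format above lends itself to simple slicing by linguistic flag. A minimal sketch; the rows are invented for illustration, not taken from the actual Bitext file:

```python
# Toy rows mimicking the four fields described above (flags, utterance,
# category, intent); the flag letters follow the list in the card.
rows = [
    {"flags": "BI", "utterance": "can you change my order?",
     "category": "ORDER", "intent": "change_order"},
    {"flags": "BQP", "utterance": "hiya, mind swapping an item in my order pls?",
     "category": "ORDER", "intent": "change_order"},
    {"flags": "BKZ", "utterance": "change ordr",
     "category": "ORDER", "intent": "change_order"},
]

def with_flag(rows, flag):
    """Select utterances annotated with a given linguistic flag."""
    return [r["utterance"] for r in rows if flag in r["flags"]]

colloquial = with_flag(rows, "Q")  # Q - Colloquial variation
noisy = with_flag(rows, "Z")       # Z - Noise (spelling, punctuation)
```

    This kind of slicing is how the flags support evaluating a model against a specific user language profile.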

    Categories and Intents

    The intent categories covered by the dataset are: ORDER

    The intents covered by the dataset are: change_order

    (c) Bitext Innovations, 2022

  3. Data and source code for "Automating Intention Mining"

    • researchdata.smu.edu.sg
    zip
    Updated Jun 4, 2023
    Cite
    Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY (2023). Data and source code for "Automating Intention Mining" [Dataset]. http://doi.org/10.25440/smu.21261408.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    Qiao HUANG; Xin XIA; David LO; Gail C. MURPHY
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    The dataset and source code for paper "Automating Intention Mining".

    The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.

    By default, the code uses TensorFlow 0.12. Other versions of TensorFlow may report errors due to incompatible APIs.

    Running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four GitHub projects as training data.

    Running 'play.py', you can get the evaluation result of cross-project prediction; please check the code for configuration details. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset. In this setting, the categories 'aspect evaluation' and 'others' are dropped, since the DECA dataset does not contain them.

  4. Intent, Entity, and Labelled Data List.docx

    • acquire.cqu.edu.au
    docx
    Updated Apr 14, 2025
    Cite
    Kenneth Puspowidjono (2025). Intent, Entity, and Labelled Data List.docx [Dataset]. http://doi.org/10.25946/28578827.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Apr 14, 2025
    Dataset provided by
    CQUniversity
    Authors
    Kenneth Puspowidjono
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The proposed research intends to improve the current service desk model by embedding Conversational Language Understanding (CLU) processes in the chatbot model to understand the user's input, automate the ticket resolution process, and improve customer service experience and efficiency. The CLU data will be trained so that it can cover all possible user inputs. The chatbot will then be designed with five main dialogue flows: changing the user's current password, checking the user's mobile number listed in Azure Active Directory (AAD), updating the user's mobile number in AAD, creating a new ticket in the ticketing system, and creating a follow-up ticket in the ticketing system. Trained CLU data with a high prediction score for the proposed dialogue flows will then be embedded in the chatbot design. The result is a next-level chatbot that can understand and classify the user's intent, automate the user's Level 1 (L1) requests without any human technician's interaction, and create a ticket in the ticketing system for any request not yet covered by the chatbot.

  5. VIS-iTrack Dataset for Visual Intention detection through Eye Gaze

    • ieee-dataport.org
    Updated Feb 10, 2022
    Cite
    Ridwan Kabir (2022). VIS-iTrack Dataset for Visual Intention detection through Eye Gaze [Dataset]. https://ieee-dataport.org/documents/vis-itrack-dataset-visual-intention-detection-through-eye-gaze
    Explore at:
    Dataset updated
    Feb 10, 2022
    Authors
    Ridwan Kabir
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fixation Count

  6. Mental Health Conversational Data

    • kaggle.com
    Updated Oct 31, 2022
    Cite
    elvis (2022). Mental Health Conversational Data [Dataset]. https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    elvis
    Description

    A dataset containing basic conversations, mental health FAQ, classical therapy conversations, and general advice provided to people suffering from anxiety and depression.

    This dataset can be used to train a model for a chatbot that can behave like a therapist in order to provide emotional support to people with anxiety & depression.

    The dataset contains intents. An "intent" is the intention behind a user's message. For instance, if I were to say "I am sad" to the chatbot, the intent in this case would be "sad". Each intent has a set of Patterns and Responses: Patterns are examples of user messages that align with the intent, while Responses are the replies the chatbot provides for that intent. Various intents are defined, and their patterns and responses are used as the model's training data to identify a particular intent.
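    A minimal sketch of the intent/Patterns/Responses structure described above, with hypothetical entries and a naive exact-match lookup (a real chatbot would train a classifier on the patterns instead):

```python
# Hypothetical intent entries in the style described above: an intent tag,
# example user Patterns, and canned Responses.
intents = [
    {
        "tag": "sad",
        "patterns": ["I am sad", "I feel down", "I'm unhappy"],
        "responses": ["I'm sorry to hear that. Do you want to talk about it?"],
    },
    {
        "tag": "greeting",
        "patterns": ["Hi", "Hello", "Hey there"],
        "responses": ["Hello! How are you feeling today?"],
    },
]

def match_intent(message, intents):
    """Naive case-insensitive exact match against each intent's patterns."""
    msg = message.lower()
    for intent in intents:
        if any(p.lower() == msg for p in intent["patterns"]):
            return intent["tag"]
    return None

print(match_intent("I am sad", intents))  # sad
```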

  7. uci-shopper

    • huggingface.co
    Updated Aug 4, 2023
    Cite
    John Henning (2023). uci-shopper [Dataset]. https://huggingface.co/datasets/jlh/uci-shopper
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2023
    Authors
    John Henning
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for Online Shoppers Purchasing Intention Dataset

      Dataset Summary
    

    This dataset is a reupload of the Online Shoppers Purchasing Intention Dataset from the UCI Machine Learning Repository.

    NOTE: The information below is from the original dataset description from UCI's website.

      Overview
    

    Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1908) were positive class samples… See the full description on the dataset page: https://huggingface.co/datasets/jlh/uci-shopper.
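    The class split quoted above can be checked with a couple of lines:

```python
# Verify the class balance stated in the description: 12,330 sessions,
# of which 84.5% (10,422) are negative and the rest positive.
total = 12330
negative = 10422
positive = total - negative          # 1908, matching the description
negative_pct = round(100 * negative / total, 1)  # 84.5
```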

  8. ‘Job Classification Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Job Classification Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-job-classification-dataset-151c/03ce55a1/?iid=038-911&v=presentation
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Job Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/HRAnalyticRepository/job-classification-dataset on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    This is a dataset containing some fictional job class spec information. Typically, job class specs contain information that characterizes the job class (its features) and a label to predict from those features, in this case a pay grade.

    Content

    The data is a static snapshot. The contents are:
    - ID column - a sequential number
    - Job Family ID
    - Job Family Description
    - Job Class ID
    - Job Class Description
    - PayGrade - numeric
    - Education Level
    - Experience
    - Organizational Impact
    - Problem Solving
    - Supervision
    - Contact Level
    - Financial Budget
    - PG - Alpha label for PayGrade

    Acknowledgements

    This data is purely fictional

    Inspiration

    The intent is to use machine learning classification algorithms to predict PG from Educational level through to Financial budget information.

    Typically job classification in HR is time consuming and cumbersome as a manual activity. The intent is to show how machine learning and People Analytics can be brought to bear on this task.
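    As a toy illustration of that task, here is a nearest-neighbour sketch predicting PG from numeric job-spec features; the rows and feature encodings are invented, not taken from the actual Kaggle file:

```python
# Made-up training rows: numeric job-spec features -> PG (alpha pay grade).
train = [
    # (Education, Experience, OrgImpact, ProblemSolving, Supervision), PG
    ((1, 1, 1, 1, 1), "A"),
    ((2, 2, 2, 2, 2), "B"),
    ((4, 5, 4, 4, 4), "C"),
]

def predict_pg(features, train):
    """Classify by the closest training row (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda row: dist(features, row[0]))[1]

print(predict_pg((2, 1, 2, 2, 2), train))  # closest to the "B" row
```

    Any standard classifier (decision trees, logistic regression, etc.) would slot into the same features-to-PG shape.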

    --- Original source retains full ownership of the source dataset ---

  9. Online Shoppers Intention

    • kaggle.com
    Updated Aug 20, 2024
    Cite
    Julio Ortega (2024). Online Shoppers Intention [Dataset]. https://www.kaggle.com/datasets/julioortegagimenez/online-shoppers-intention/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Julio Ortega
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The data was obtained from the following website: https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset

    Sakar, C. and Kastro, Yomi (2018). Online Shoppers Purchasing Intention Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5F88Q.

  10. Quantitative analysis of intent classification module.

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Tulika Saha; Sriparna Saha; Pushpak Bhattacharyya (2023). Quantitative analysis of intent classification module. [Dataset]. http://doi.org/10.1371/journal.pone.0235367.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Tulika Saha; Sriparna Saha; Pushpak Bhattacharyya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Quantitative analysis of intent classification module.

  11. Multi-turn Prompts Dataset

    • kaggle.com
    Updated Oct 25, 2024
    Cite
    SoftAge.AI (2024). Multi-turn Prompts Dataset [Dataset]. https://www.kaggle.com/datasets/softageai/multi-turn-prompts-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SoftAge.AI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset consists of 400 text-only, fine-tuned multi-turn conversations in English, based on 10 categories and 19 use cases. It was generated with ethically sourced human-in-the-loop data methods and aligned with supervised fine-tuning, direct preference optimization, and reinforcement learning from human feedback.

    The human-annotated data is focused on data quality and precision to enhance the generative response of models used for AI chatbots, thereby improving their recall memory and recognition ability for continued assistance.

    Key Features
    - Prompts focused on user intent, devised using natural language processing techniques.
    - Multi-turn prompts with up to 5 turns to enhance the responsive memory of large language models for pretraining.
    - Conversational interactions covering varied aspects of writing, coding, knowledge assistance, data manipulation, reasoning, and classification.

    Dataset Source Subject matter expert annotators @SoftAgeAI have annotated the data at simple and complex levels, focusing on quality factors such as content accuracy, clarity, coherence, grammar, depth of information, and overall usefulness.

    Structure & Fields The dataset is organized into different columns, which are detailed below:

    - P1, R1, P2, R2, P3, R3, P4, R4, P5 (object): These columns represent the sequence of prompts (P) and responses (R) within a single interaction. Each interaction can have up to 5 prompts and 5 corresponding responses, capturing the flow of a conversation. The prompts are user inputs, and the responses are the model's outputs.
    - Use Case (object): Specifies the primary application or scenario for which the interaction is designed, such as "Q&A helper" or "Writing assistant." This classification helps in identifying the purpose of the dialogue.
    - Type (object): Indicates the complexity of the interaction, with entries labeled as "Complex" in this dataset. This denotes that the dialogues involve more intricate and multi-layered exchanges.
    - Category (object): Broadly categorizes the interaction type, such as "Open-ended QA" or "Writing." This provides context on the nature of the conversation, whether it is for generating creative content, providing detailed answers, or engaging in complex problem-solving.

    Intended Use Cases

    The dataset can enhance query-assistance models for shopping, coding, creative writing, travel assistance, marketing, citation, academic writing, language assistance, research topics, specialized knowledge, reasoning, and STEM topics. It is intended to aid generative models for e-commerce, customer assistance, marketing, education, suggestive user queries, and generic chatbots. It can pre-train large language models with supervision-based fine-tuned annotated data and support retrieval-augmented generative models. The dataset is free of violence-based interactions that could lead to harm, conflict, discrimination, brutality, or misinformation.

    Potential Limitations & Biases

    This is a static dataset, so the information is dated May 2024.
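    The P1..P5 / R1..R4 column layout described under Structure & Fields can be walked turn by turn; a minimal sketch with a hypothetical row (not taken from the actual file):

```python
# Hypothetical row using the P/R column layout: up to five prompt/response
# turns per interaction, plus Use Case / Type / Category metadata.
row = {
    "P1": "Plan a 3-day trip to Rome.", "R1": "Day 1: Colosseum...",
    "P2": "Add a food tour.",           "R2": "Added to Day 2...",
    "P3": None, "R3": None, "P4": None, "R4": None, "P5": None,
    "Use Case": "Travel assistant", "Type": "Complex",
    "Category": "Open-ended QA",
}

def turns(row, max_turns=5):
    """Collect (prompt, response) pairs in order, stopping at the first empty turn."""
    out = []
    for i in range(1, max_turns + 1):
        p, r = row.get(f"P{i}"), row.get(f"R{i}")
        if not p:
            break
        out.append((p, r))
    return out

print(len(turns(row)))  # 2 completed turns in this toy row
```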

    Note If you have any questions related to our data annotation and human review services for large language model training and fine-tuning, please contact us at SoftAge Information Technology Limited at info@softage.ai.

  12. Data from: Voxelized fragment dataset for machine learning

    • zenodo.org
    • investigacion.ujaen.es
    csv, text/x-python +1
    Updated Oct 23, 2024
    Cite
    Alfonso López Ruiz; Alfonso López Ruiz; Antonio Jesús Rueda Ruiz; Antonio Jesús Rueda Ruiz; Rafael Segura; Rafael Segura; Carlos Javier Ogayar Anguita; Carlos Javier Ogayar Anguita; Pablo Navarro; Pablo Navarro; José Manuel Fuertes García; José Manuel Fuertes García (2024). Voxelized fragment dataset for machine learning [Dataset]. http://doi.org/10.5281/zenodo.13899699
    Explore at:
    Available download formats: zip, csv, text/x-python
    Dataset updated
    Oct 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alfonso López Ruiz; Alfonso López Ruiz; Antonio Jesús Rueda Ruiz; Antonio Jesús Rueda Ruiz; Rafael Segura; Rafael Segura; Carlos Javier Ogayar Anguita; Carlos Javier Ogayar Anguita; Pablo Navarro; Pablo Navarro; José Manuel Fuertes García; José Manuel Fuertes García
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Oct 2024
    Description

    One of the primary challenges inherent in utilizing deep learning models is the scarcity and accessibility hurdles associated with acquiring datasets of sufficient size to facilitate effective training of these networks. This is particularly significant in object detection, shape completion, and fracture assembly. Instead of scanning a large number of real-world fragments, it is possible to generate massive datasets with synthetic pieces. However, realistic fragmentation is computationally intensive in both preparation (e.g., pre-fractured models) and generation, while simpler algorithms such as Voronoi diagrams provide faster processing at the expense of realism. Hence, generating large datasets for machine learning requires balancing computational efficiency and realism.

    We proposed a GPU-based fragmentation method to improve the baseline Discrete Voronoi Chain aimed at completing this dataset generation task. The dataset in this repository includes voxelized fragments from high-resolution 3D models, curated to be used as training sets for machine learning models. More specifically, these models come from an archaeological dataset, which led to more than 1M fragments from 1,052 Iberian vessels. In this dataset, fragments are not stored individually; instead, the fragmented voxelizations are provided in a compressed binary file (.rle.zip). Once uncompressed, each fragment is represented by a different number in the grid. The class to which each vessel belongs is also included in class.csv. The GPU-based pipeline that generated this dataset is explained at https://doi.org/10.1016/j.cag.2024.104104.

    Please note that this dataset originally provided voxel data, point clouds and triangle meshes. However, we opted to include only voxel data because 1) the original dataset is too large to upload to Zenodo and 2) the original intent of our paper is to generate implicit data in the form of voxels. If you are interested in the whole dataset (450 GB), please visit the web page of our research institute.
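    Once the grid is uncompressed, per-fragment masks follow directly from the "each fragment is a different number in the grid" convention. A sketch with a toy grid (assuming 0 marks empty voxels; the actual .rle.zip encoding is not specified in the card):

```python
import numpy as np

# Toy fragmented voxel grid: each fragment is a distinct integer label,
# 0 assumed to mean empty space.
grid = np.zeros((4, 4, 4), dtype=np.int32)
grid[:2, :2, :2] = 1   # fragment 1 occupies a 2x2x2 corner
grid[2:, 2:, 2:] = 2   # fragment 2 occupies the opposite corner

# Extract one boolean mask per fragment via the unique labels.
labels = np.unique(grid)
fragments = {int(k): (grid == k) for k in labels if k != 0}

print(sorted(fragments))        # [1, 2]
print(int(fragments[1].sum()))  # 8 voxels in fragment 1
```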

  13. snips

    • huggingface.co
    Updated Dec 2, 2024
    Cite
    DeepPavlov (2024). snips [Dataset]. https://huggingface.co/datasets/DeepPavlov/snips
    Explore at:
    Dataset updated
    Dec 2, 2024
    Dataset authored and provided by
    DeepPavlov
    Description

    snips

    This is a text classification dataset intended for machine learning research and experimentation. It was obtained by formatting another publicly available dataset to be compatible with our AutoIntent Library.

      Usage
    

    It is intended to be used with our AutoIntent Library:

    from autointent import Dataset

    snips = Dataset.from_hub("AutoIntent/snips")

      Source
    

    This dataset is taken from benayas/snips and formatted with our AutoIntent Library:… See the full description on the dataset page: https://huggingface.co/datasets/DeepPavlov/snips.

  14. MultNIST Dataset

    • data.ncl.ac.uk
    json
    Updated Nov 30, 2023
    Cite
    David Towers; Rob Geada; Amir Atapour Abarghouei; Andrew Stephen McGough (2023). MultNIST Dataset [Dataset]. http://doi.org/10.25405/data.ncl.24574678.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Newcastle University
    Authors
    David Towers; Rob Geada; Amir Atapour Abarghouei; Andrew Stephen McGough
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset containing the images and labels for the MultNIST data used in the CVPR NAS workshop Unseen-data challenge under the codename "Mateo".

    The MultNIST dataset is constructed from MNIST images. The intention of this dataset is to require machine learning models to do more than just image classification: they must also perform a calculation, in this case multiplication followed by a mod operation. For each image, three MNIST images were randomly chosen and combined through the colour channels, resulting in a three-colour-channel image in which each MNIST image occupies one colour channel.

    The data is in a channels-first format with a shape of (n, 3, 28, 28), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing). There are ten classes in the dataset, with 7,000 examples of each, distributed evenly between the three subsets.

    The label of each image is generated using the formula "(r * b * g) % 10", where r, g, and b are the red, green, and blue colour channels respectively. For example, an RGB configuration of 3, 7, and 4 would result in a label of 4 ((3 * 7 * 4) % 10).
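    The labeling rule can be written out directly; the worked example from the description checks out:

```python
def multnist_label(r, g, b):
    """Label of a MultNIST image whose colour channels hold MNIST digits
    r, g, b: multiply the three digits, then take mod 10."""
    return (r * g * b) % 10

print(multnist_label(3, 7, 4))  # (3 * 7 * 4) % 10 = 84 % 10 = 4
```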

  15. NTU Dataset

    • figshare.com
    7z
    Updated May 31, 2023
    Cite
    Satyajit Neogi (2023). NTU Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.7890764.v2
    Explore at:
    Available download formats: 7z
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Satyajit Neogi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    **************** NTU Dataset ReadMe file *******************

    Please consider the latest version.

    The attached files contain our data, collected inside the Nanyang Technological University campus, for pedestrian intention prediction. The dataset is particularly designed to capture spontaneous vehicle influences on pedestrian crossing/not-crossing intention. We utilize this dataset in our paper "Context Model for Pedestrian Intention Prediction using Factored Latent-Dynamic Conditional Random Fields", submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.

    The dataset consists of 35 crossing and 35 stopping* (not-crossing) scenarios. The image sequences are in the 'Image_sequences' folder. The 'stopping_instants.csv' and 'crossing_instants.csv' files provide the stopping and crossing instants respectively, utilized for labeling the data and providing ground truth for evaluation. Camera1 and Camera2 images are synchronized; two cameras were used to capture the whole scene of interest.

    We provide pedestrian and vehicle bounding boxes obtained from [1]. Occlusions and mis-detections are linearly interpolated. All necessary detections are stored in the 'Object_detector_pedestrians_vehicles' folder. Each column within the csv files ('car_bndbox_..') corresponds to a unique tracked car within each image sequence. Each of the pedestrian csv files ('ped_bndbox_..') contains only one column, as we consider each pedestrian in the scene separately.

    Additional details:
    * [xmin xmax ymin ymax] = left right top down
    * Dataset frequency: 15 fps
    * Camera parameters (in pixels): f = 1135, principal point = (960, 540)

    Additionally, we provide semantic segmentation output [2] and our depth parameters. As the data were collected in two phases, there are two files in each folder, highlighting the sequences in each phase. Crossing sequences 1-28 and stopping sequences 1-24 were collected in Phase 1, while crossing sequences 29-35 and stopping sequences 25-35 were collected in Phase 2. We obtained the optical flow from [3]. Our model (FLDCRF and LSTM) codes are available in the 'Models' folder.

    If you use our dataset in your research, please cite our paper: "S. Neogi, M. Hoy, W. Chaoqun, J. Dauwels, 'Context Based Pedestrian Intention Prediction Using Factored Latent Dynamic Conditional Random Fields', IEEE SSCI-2017."

    Please email us if you have any questions:
    1. Satyajit Neogi, PhD Student, Nanyang Technological University @ satyajit001@e.ntu.edu.sg
    2. Justin Dauwels, Associate Professor, Nanyang Technological University @ jdauwels@ntu.edu.sg

    Our other group members include:
    3. Dr. Michael Hoy @ mch.hoy@gmail.com
    4. Dr. Kang Dang @ kangdang@gmail.com
    5. Ms. Lakshmi Prasanna Kachireddy
    6. Mr. Mok Bo Chuan Lance
    7. Dr. Hang Yu @ fhlyhv@gmail.com

    References:
    1. S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS 2015.
    2. A. Kendall, V. Badrinarayanan, R. Cipolla, "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding", BMVC 2017.
    3. C. Liu, "Beyond Pixels: Exploring New Representations and Applications for Motion Analysis", Doctoral Thesis, Massachusetts Institute of Technology, May 2009.

    * Please note, we had to remove sequence Stopping-33 for privacy reasons.
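    The bounding-box convention in the readme is [xmin xmax ymin ymax] (left, right, top, down), while many detection tools expect (x, y, width, height) instead, so a tiny converter helps. Illustrative only, not part of the dataset's code:

```python
def to_xywh(box):
    """Convert an [xmin, xmax, ymin, ymax] box (the dataset's convention)
    to (x, y, width, height) with (x, y) the top-left corner."""
    xmin, xmax, ymin, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)

print(to_xywh((100, 180, 50, 210)))  # (100, 50, 80, 160)
```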

  16. Triage System Comments with Priority and Labels

    • kaggle.com
    Updated Dec 20, 2024
    Cite
    Vamsi Pasam (2024). Triage System Comments with Priority and Labels [Dataset]. https://www.kaggle.com/datasets/vamsipasam2k/triaged-comments-with-priority-and-labels-hierarchy/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vamsi Pasam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains 4,992 rows of structured information derived from a triage system designed for managing and prioritizing comments in collaborative environments. Using advanced machine learning techniques, such as GEMMA-2B for intent classification, Hugging Face models for sentiment analysis, and Latent Dirichlet Allocation (LDA) for topic modeling, each comment is analyzed across six dimensions: urgency, importance, sentiment, actionability, resolution status, and thematic relevance.

    The dataset can support tasks in:

    • Natural Language Processing (NLP)
    • Hierarchical comment classification
    • Feedback prioritization strategies
    • Training machine learning models for dynamic comment management

    Key Features:
    - Hierarchical Labels: Multi-level classifications (level_0 to level_4) for each comment.
    - Priority Scores: Aggregated values representing the criticality of each comment.
    - Sentiment Analysis: Positive, neutral, and negative sentiment scoring.
    - LDA Topics: Thematic insights for comment context.

    Metadata:
    - Rows: 4,992
    - Columns: 49
    - Tags: NLP, Machine Learning, Sentiment Analysis, Comment Triage, Topic Modeling, Collaboration

    File Details:
    - Filename: triaged_comments_with_priority_and_labels_hierarchy.csv
    - License: Creative Commons Attribution 4.0 International (CC BY 4.0)
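    A minimal sketch of the feedback-prioritization use the dataset is meant to support: rank comments by an aggregated priority score. The column names below are assumptions for illustration, not the file's actual 49-column schema:

```python
import csv
import io

# Tiny in-memory CSV standing in for the real file; column names are
# hypothetical, chosen only to illustrate priority-based ranking.
sample = """comment,priority_score,sentiment
Server is down again,0.95,negative
Typo in the docs,0.20,neutral
Great release!,0.10,positive
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Highest-priority comments first.
rows.sort(key=lambda r: float(r["priority_score"]), reverse=True)
print([r["comment"] for r in rows][:2])
```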

  17. Data from: A Dataset for Evaluating Blood Detection in Hyperspectral Images

    • zenodo.org
    zip
    Updated Jul 9, 2021
    Cite
    Michał Romaszewski; Michał Romaszewski; Przemysław Głomb; Przemysław Głomb; Arkadiusz Sochan; Arkadiusz Sochan; Michał Cholewa; Michał Cholewa (2021). A Dataset for Evaluating Blood Detection in Hyperspectral Images [Dataset]. http://doi.org/10.5281/zenodo.3984905
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 9, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michał Romaszewski; Michał Romaszewski; Przemysław Głomb; Przemysław Głomb; Arkadiusz Sochan; Arkadiusz Sochan; Michał Cholewa; Michał Cholewa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The sensitivity of hyperspectral imaging (imaging spectroscopy) to haemoglobin derivatives makes it a promising tool for detection and classification of blood. However, due to complexity and high dimensionality of hyperspectral images, the development of hyperspectral blood detection algorithms is challenging. To facilitate their development, we present a new hyperspectral blood detection dataset. This dataset consists of 14 hyperspectral images (ENVI format) of a mock-up scene containing blood and visually similar substances (e.g. artificial blood or tomato concentrate). Images were taken over a period of three weeks and differ in terms of background composition and lighting intensity. To facilitate the use of data, the dataset includes an annotation of classes: pixels where blood and similar substances are visible have been marked by the authors. The main intention behind the dataset is to serve as testing data for Machine Learning methods for hyperspectral target detection and classification.
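One common baseline for hyperspectral target detection is a per-pixel normalized difference between two reflectance bands. The sketch below is only a toy illustration of that idea; the band indices and spectrum are hypothetical, and real detection on this dataset would work on the full ENVI cubes with proper methods.

```python
# Illustrative only: a normalized band-difference index as a naive
# per-pixel score. Band indices and reflectance values are made up;
# they do not correspond to the dataset's actual wavelengths.
def band_index(pixel_spectrum, band_a, band_b):
    """Normalized difference between two reflectance bands, in [-1, 1]."""
    a, b = pixel_spectrum[band_a], pixel_spectrum[band_b]
    return (a - b) / (a + b) if (a + b) else 0.0

spectrum = [0.12, 0.08, 0.30, 0.45]  # toy 4-band pixel
print(round(band_index(spectrum, 3, 1), 3))  # → 0.698
```

Indices like this are cheap to compute and serve as sanity-check baselines before training the machine-learning detectors the dataset is intended for.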

  18. DL-HARD (Deep Learning Hard)

    • opendatalab.com
    zip
    Updated Sep 22, 2022
    + more versions
    Cite
    Max Planck Institute for Informatics (2022). DL-HARD (Deep Learning Hard) [Dataset]. https://opendatalab.com/OpenDataLab/DL-HARD
    Explore at:
    Available download formats: zip (51,413,940 bytes)
    Dataset updated
    Sep 22, 2022
    Dataset provided by
    University of Glasgow
    Max Planck Institute for Informatics
    Description

    Deep Learning Hard (DL-HARD) is an annotated dataset designed to more effectively evaluate neural ranking models on complex topics. It builds on TREC Deep Learning (DL) questions extensively annotated with query intent categories, answer types, wikified entities, topic categories, and result type metadata from a leading web search engine. DL-HARD contains 50 queries from the official 2019/2020 evaluation benchmark, half of which are newly and independently assessed. Overall, DL-HARD is a new resource that promotes research on neural ranking methods by focusing on challenging and complex queries.

  19. Hand Washing Video Dataset Annotated According to the World Health Organization's Handwashing Guidelines - METC Subset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 3, 2022
    Cite
    Zemlanuhina, Olga (2022). Hand Washing Video Dataset Annotated According to the World Health Organization's Handwashing Guidelines - METC Subset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5808788
    Explore at:
    Dataset updated
    Jan 3, 2022
    Dataset provided by
    Elsts, Atis
    Ivanovs, Maksims
    Vilde, Aija
    Lulla, Martins
    Sabelnikovs, Olegs
    Rutkovskis, Aleksejs
    Slavinska, Andreta
    Melbārde-Kelmere, Agita
    Zemlanuhina, Olga
    License

    Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: This is a lab-based dataset with videos recording volunteers (medical students) washing their hands as part of a hand-washing monitoring and feedback experiment. The dataset is collected in the Medical Education Technology Center (METC) of Riga Stradins University, Riga, Latvia. In total, 72 participants took part in the experiments, each washing their hands three times, in a randomized order, going through three different hand-washing feedback approaches (user interfaces of a mobile app). The data was annotated in real time by a human operator, in order to give the experiment participants real-time feedback on their performance. There are 212 hand washing episodes in total, each of which is annotated by a single person. The annotations classify the washing movements according to the World Health Organization's (WHO) guidelines by marking each frame in each video with a certain movement code.

    This dataset is part of a series of three datasets, all following the same format:

    https://zenodo.org/record/4537209 - data collected in Pauls Stradins Clinical University Hospital

    https://zenodo.org/record/5808764 - data collected in Jurmala Hospital

    https://zenodo.org/record/5808789 - data collected in the Medical Education Technology Center (METC) of Riga Stradins University

    Note #1: we recommend that when using this dataset for machine learning, allowances are made for the reaction speed of the human operator labeling the data. For example, the annotations can be expected to be incorrect a short while after the person in the video switches their washing movements.

    Application: The intention of this dataset is to serve as a basis for training machine learning classifiers for automated hand washing movement recognition and quality control.

    Statistics:

    Frame rate: ~16 FPS (slightly variable, as the videos are reconstructed from sequences of JPG images captured at the maximum frame rate supported by the capturing devices).

    Resolution: 640x480

    Number of videos: 212

    Number of annotation files: 212

    Movement codes (in JSON files):

    1: Hand washing movement — Palm to palm

    2: Hand washing movement — Palm over dorsum, fingers interlaced

    3: Hand washing movement — Palm to palm, fingers interlaced

    4: Hand washing movement — Backs of fingers to opposing palm, fingers interlocked

    5: Hand washing movement — Rotational rubbing of the thumb

    6: Hand washing movement — Fingertips to palm

    0: Other hand washing movement

    Note #2: The original dataset of JPG images is available upon request. The original dataset has 13 annotation classes: for each of the six washing movements defined by the WHO, "correct" and "incorrect" execution is marked with two different labels. In this published dataset, all incorrect executions are marked with code 0, as "other" washing movement.
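Note #1's allowance for annotator reaction delay can be implemented by discarding a short window of frames after every movement transition before computing per-movement statistics. The sketch below assumes a simple per-frame list of movement codes; the real annotation files' JSON schema may differ, so treat the layout as hypothetical.

```python
FPS = 16  # approximate frame rate reported for this dataset

# Hypothetical annotation layout: a plain list of per-frame movement
# codes (the real JSON files may be structured differently). Frames
# just after a transition are skipped to allow for the human
# annotator's reaction delay (Note #1).
def movement_durations(codes, reaction_frames=8):
    """Seconds spent per movement code, skipping frames after transitions."""
    durations, skip, prev = {}, 0, None
    for code in codes:
        if code != prev:
            skip = reaction_frames  # annotator may lag behind the video
            prev = code
        if skip:
            skip -= 1
            continue
        durations[code] = durations.get(code, 0) + 1
    return {c: n / FPS for c, n in durations.items()}

codes = [1] * 32 + [3] * 16  # 2 s of movement 1, then 1 s of movement 3
print(movement_durations(codes))  # → {1: 1.5, 3: 0.5}
```

With an 8-frame (~0.5 s) window, the counted durations come out shorter than the raw segments, which is the intended trade-off: fewer mislabeled frames at segment boundaries.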

    Acknowledgments: The dataset collection was funded by the Latvian Council of Science project: "Automated hand washing quality control and quality evaluation system with real-time feedback", No: lzp - Nr. 2020/2-0309.

    References: For more detailed information, see this article, describing a similar dataset collected in a different project:

    M. Lulla, A. Rutkovskis, A. Slavinska, A. Vilde, A. Gromova, M. Ivanovs, A. Skadins, R. Kadikis, A. Elsts. Hand-Washing Video Dataset Annotated According to the World Health Organization’s Hand-Washing Guidelines. Data. 2021; 6(4):38. https://doi.org/10.3390/data6040038

    Contact information: atis.elsts@edi.lv

  20. Assembly Shellcode Dataset

    • kaggle.com
    Updated Dec 5, 2023
    Cite
    The Devastator (2023). Assembly Shellcode Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/assembly-shellcode-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Assembly Shellcode Dataset

    The Largest Collection of Linux Assembly Shellcodes

    By SoLID (From Huggingface) [source]

    About this dataset

    The dataset consists of multiple files for different purposes. The validation.csv file contains a set of carefully selected assembly shellcodes that serve the purpose of validation. These shellcodes are used to ensure the accuracy and integrity of any models or algorithms trained on this dataset.

    The train.csv file contains both the intent column, which describes the purpose or objective behind each specific shellcode, and its corresponding assembly code snippets in order to facilitate supervised learning during training procedures. This file proves to be immensely valuable for researchers, practitioners, and developers seeking to study or develop effective techniques for dealing with malicious code analysis or security-related tasks.

    For testing purposes, the test.csv file provides yet another collection of assembly shellcodes that can be employed as test cases to assess the performance, robustness, and generalization capability of various models or methodologies developed within this domain.

    How to use the dataset

    Understanding the Dataset

    The dataset consists of multiple files that serve different purposes:

    • train.csv: This file contains the intent and corresponding assembly code snippets for training purposes. It can be used to train machine learning models or develop algorithms based on shellcode analysis.

    • test.csv: The test.csv file in the dataset contains a collection of assembly shellcodes specifically designed for testing purposes. You can use these shellcodes to evaluate and validate your models or analysis techniques.

    • validation.csv: The validation.csv file includes a set of assembly shellcodes that are specifically reserved for validation purposes. These shellcodes can be used separately to ensure the accuracy and reliability of your models.

    Columns in the Dataset

    The columns available in each CSV file are as follows:

    • intent: The intent column describes the purpose or objective of each specific shellcode entry. It provides information regarding what action or achievement is intended by using that particular piece of code.

    • snippet: The snippet column contains the actual assembly code corresponding to each intent entry in its respective row. It includes all necessary instructions and data required to execute the desired action specified by that intent.
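Since each CSV file exposes just these two columns, a first exploratory step is to load the file and count how many snippets exist per intent. The sketch below uses a toy in-memory stand-in for train.csv; the real file has the same (intent, snippet) columns but different contents.

```python
import csv
import io
from collections import Counter

# Toy stand-in for train.csv; the real file shares the two-column
# (intent, snippet) layout but contains the actual shellcode corpus.
sample = io.StringIO(
    "intent,snippet\n"
    "spawn a shell,xor eax eax\n"
    "spawn a shell,push 0x68732f2f\n"
    "read a file,mov eax 5\n"
)
rows = list(csv.DictReader(sample))
by_intent = Counter(row["intent"] for row in rows)
print(by_intent.most_common())  # → [('spawn a shell', 2), ('read a file', 1)]
```

For the real file, replace the `io.StringIO` object with `open("train.csv", newline="")`; the intent distribution tells you how balanced the classes are before training a classifier.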

    Utilizing the Dataset

    To effectively utilize this dataset, follow these general steps:

    • Familiarize yourself with assembly language: Assembly language is essential when working with shellcodes since they consist of low-level machine instructions understood by processors directly.

    • Explore intents: Start by analyzing and understanding different intents present in the dataset entries thoroughly. Each intent represents a specific goal or purpose behind creating an individual piece of code.

    • Examine snippets: Review the assembly code snippets corresponding to each intent entry. Carefully study the instructions and data used in the shellcode, as they directly influence their intended actions.

    • Train your models: If you are working on machine learning or algorithm development, utilize the train.csv file to train your models based on the labeled intent and snippet data provided. This step will enable you to build powerful tools for analyzing or detecting shellcodes automatically.

    • Evaluate using test datasets: Use the various assembly shellcodes present in test.csv to evaluate and validate your trained models or analysis techniques. This evaluation will help you gauge their performance, robustness, and generalization capability.

    Research Ideas

    • Malware analysis: The dataset can be used for studying and analyzing various shellcode techniques used in malware attacks. Researchers and security professionals can use this dataset to develop detection and prevention mechanisms against such attacks.
    • Penetration testing: Security experts can use this dataset to simulate real-world attack scenarios and test the effectiveness of their defensive measures. By having access to a diverse range of shellcodes, they can identify vulnerabilities in systems and patch them before malicious actors exploit them.
    • Machine learning training: This dataset can be used to train machine learning models for automatic detection or classification of shellcodes. By combining the intent column (which describes the objective of each shellcode) with the corresponding assembly code snippets, researchers can develop algorithms that automatically identify the purpose or ...