97 datasets found
  1. IMDB Dataset For Machine Learning

    • kaggle.com
    Updated Sep 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    KHUSHI YADAV (2023). IMDB Dataset For Machine Learning [Dataset]. https://www.kaggle.com/datasets/khushiyadav2022/imdb-dataset-for-machine-learning
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 25, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    KHUSHI YADAV
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    "Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning" is an exciting project focused on leveraging the IMDB Dataset for developing an advanced movie recommendation system. This project aims to explore the vast potential of machine learning techniques in providing personalized movie recommendations to users.

    The IMDB Dataset, comprising a wealth of movie information including genres, ratings, and user reviews, serves as the foundation for this project. By harnessing the power of machine learning algorithms and data analysis, the project seeks to build a recommendation system that can accurately suggest movies tailored to each individual's preferences.

  2. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidences of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide "NICHE.csv" file that contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE

  3. Deep learning term project

    • kaggle.com
    Updated Mar 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sampreeth R S (2024). Deep learning term project [Dataset]. https://www.kaggle.com/sampreethrs/deep-learning-term-project/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sampreeth R S
    Description

    Dataset

    This dataset was created by Sampreeth R S

    Contents

  4. buds-lab/building-data-genome-project-2: v1.0

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 2, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers; Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers (2020). buds-lab/building-data-genome-project-2: v1.0 [Dataset]. http://doi.org/10.5281/zenodo.3887306
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 2, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers; Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BDG2 open data set consists of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters are collected from 19 sites across North America and Europe, and they measure electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data was used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. This subset includes data from 2,380 meters from 1,448 buildings that were used in the GEPIII, a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.

  5. Machine Learning Projects.

    • kaggle.com
    Updated Oct 22, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Avinash Shan Monteiro (2020). Machine Learning Projects. [Dataset]. https://www.kaggle.com/avinashshanmonteiro/machine-learning-porjects/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 22, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Avinash Shan Monteiro
    Description

    Dataset

    This dataset was created by Avinash Shan Monteiro

    Released under Data files © Original Authors

    Contents

  6. Machine Learning End-to-End Projects

    • kaggle.com
    Updated May 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    vamsi kamatham (2023). Machine Learning End-to-End Projects [Dataset]. https://www.kaggle.com/datasets/vamsikrishnakamatham/end-to-end-machine-learning-projects
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 12, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    vamsi kamatham
    Description

    Dataset

    This dataset was created by vamsi kamatham

    Contents

  7. P

    EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Explore at:
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers, including, Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as, ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, ...etc. The IoT data are generated from various IoT devices (more than 10 types) such as Low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, ...etc.). However, we identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats, including, DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, network traffic, and propose new 61 features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the leader author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

    The details of the Edge-IIoT dataset were published in following the paper. For the academic/public use of these datasets, the authors have to cities the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 13 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific for each attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 13 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific for each attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading The Edge-IIoTset dataset From the Kaggle platform from google.colab import files

    !pip install -q kaggle

    files.upload()

    !mkdir ~/.kaggle

    !cp kaggle.json ~/.kaggle/

    !chmod 600 ~/.kaggle/kaggle.json

    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"

    !unzip DNN-EdgeIIoT-dataset.csv.zip

    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the Datasets' CSV file to a Pandas DataFrame: import pandas as pd

    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3 : Exploring some of the DataFrame's contents: df.head(5)

    print(df['Attack_type'].value_counts())

    Step 4: Dropping data (Columns, duplicated rows, NAN, Null..): from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4","arp.dst.proto_ipv4",

     "http.file_data","http.request.full_uri","icmp.transmit_timestamp",
    
     "http.request.uri.query", "tcp.options","tcp.payload","tcp.srcport",
    
     "tcp.dstport", "udp.port", "mqtt.msg"]
    

    df.drop(drop_columns, axis=1, inplace=True)

    df.dropna(axis=0, how='any', inplace=True)

    df.drop_duplicates(subset=None, keep="first", inplace=True)

    df = shuffle(df)

    df.isna().sum()

    print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (Dummy Encoding): import numpy as np

    from sklearn.model_selection import train_test_split

    from sklearn.preprocessing import StandardScaler

    from sklearn import preprocessing

    def encode_text_dummy(df, name):

    dummies = pd.get_dummies(df[name])

    for x in dummies.columns:

    dummy_name = f"{name}-{x}"
    
    df[dummy_name] = dummies[x]
    

    df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df,'http.request.method')

    encode_text_dummy(df,'http.referer')

    encode_text_dummy(df,"http.request.version")

    encode_text_dummy(df,"dns.qry.name.len")

    encode_text_dummy(df,"mqtt.conack.flags")

    encode_text_dummy(df,"mqtt.protoname")

    encode_text_dummy(df,"mqtt.topic")

    Step 6: Creation of the preprocessed dataset df.to_csv('preprocessed_DNN.csv', encoding='utf-8')

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  8. Personal skills dataset

    • kaggle.com
    Updated Oct 15, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hanan Abbas (2020). Personal skills dataset [Dataset]. https://www.kaggle.com/hanaanabbas/personal-skills-dataset/metadata
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 15, 2020
    Dataset provided by
    Kaggle
    Authors
    Hanan Abbas
    Description

    Dataset

    This dataset was created by Hanan Abbas

    Contents

  9. f

    Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image...

    • figshare.com
    application/x-rar
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mahmoud Elbattah (2023). Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.7073087.v1
    Explore at:
    application/x-rarAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Mahmoud Elbattah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide a dataset that includes visualizations of eye-tracking scanpaths with a particular focus Autism Spectrum Disorder (ASD). The key idea is to transform the dynamics of eye motion into visual patterns, and hence diagnosis-related tasks could be approached using image analysis techniques. The image dataset is publicly available to be used by other studies aiming to experiment the usability of eye-tracking within the ASD context. It is believed that the dataset can allow for the development of further interesting applications using Machine Learning or image processing techniques. For more info, please refer to the publication below and the project website.Original Publication:Carette, R., Elbattah, M., Dequen, G., Guérin, J, & Cilia, F. (2019, February). Learning to predict autism spectrum disorder based on the visual patterns of eye-tracking scanpaths. In Proceedings of the 12th International Conference on Health Informatics (HEALTHINF 2019).Project Website:https://www.researchgate.net/project/Predicting-Autism-Spectrum-Disorder-Using-Machine-Learning-and-Eye-Trackinghttps://mahmoud-elbattah.github.io/ML4Autism/

  10. Datasets for Crime Hotspot Prediction Project

    • figshare.com
    application/x-rar
    Updated Sep 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tuğrul Cabir Hakyemez; Bertan Badur (2023). Datasets for Crime Hotspot Prediction Project [Dataset]. http://doi.org/10.6084/m9.figshare.24171672.v1
    Explore at:
    application/x-rarAvailable download formats
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Tuğrul Cabir Hakyemez; Bertan Badur
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets: This repository includes a folder titled Datasets that contains all crime and park event tables. These tables consist of event observations across street segments (columns) with time steps (rows). This is a crucial aspect of the project. Additionally, the folder contains Python pickles that store network information, street segment details, and crime location information. Each file in this folder is required by the script to generate predictions.Predictions: The folder also contains predictions across all the models (e.g., theft daily, robbery shift). These files have two columns: actual and predicted values, respectively. The structure is 2459 rows by test days. This means that the first 2459 rows represent predictions for each segment on the first day, the next 2459 rows represent predictions for the second day, and so on.

  11. Female facial image dataset

    • kaggle.com
    Updated Apr 6, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alpha Lu (2019). Female facial image dataset [Dataset]. https://www.kaggle.com/urnotalphalu711/female-facial-image-dataset/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2019
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Alpha Lu
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Alpha Lu

    Released under CC0: Public Domain

    Contents

  12. P

    Materials Project Dataset

    • paperswithcode.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jain (2023). Materials Project Dataset [Dataset]. https://paperswithcode.com/dataset/materials-project
    Explore at:
    Authors
    Jain
    Description

    The Materials Project is a collection of chemical compounds labelled with different attributes. The labelling is performed by different simulations, most of them at DFT level of theory.

    The dataset links:

    MP 2018.6.1 (69,239 materials) MP 2019.4.1 (133,420 materials)

  13. deep_learning_abracadata

    • kaggle.com
    Updated Oct 29, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    khmiri iheb (2024). deep_learning_abracadata [Dataset]. https://www.kaggle.com/datasets/khmiriiheb/deep-learning-abracadata
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    khmiri iheb
    Description

    Dataset

    This dataset was created by khmiri iheb

    Contents

  14. o

    Tweet Emotion Dataset

    • opendatabay.com
    .undefined
    Updated Jul 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Tweet Emotion Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/06728d6d-f212-4aed-a1e1-ba02c5b16cf5
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset is a collection of microblog entries, such as tweets, paired with their corresponding sentiment or emotional labels. It serves as a valuable resource for developing and testing artificial intelligence models capable of predicting emotions from textual content. The dataset's purpose is to provide rich, real-world examples of text and associated human sentiments, making it ideal for tasks like sentiment analysis and emotion detection in Natural Language Processing (NLP) systems.

    Columns

    • tweet_id: A unique numerical identifier for each individual microblog entry.
    • sentiment: The categorised emotion or mood associated with the content. Examples of sentiments found include 'empty', 'sadness', 'enthusiasm', 'neutral', 'worry', 'love', 'fun', and 'surprise'.
    • content: The actual text of the microblog entry.

    Distribution

    The dataset is structured in a tabular format, typically stored as a CSV file. Specific numbers for rows or records are not available in the provided details, but a sample of the data showcases its structure. Further details on the complete file size and record count would be updated separately.

    Usage

    This dataset is ideally suited for various applications, including: * Sentiment Analysis: Training machine learning models to identify the emotional tone of text. * Emotion Detection: Building AI software capable of predicting specific emotions from written content. * Natural Language Processing (NLP) Research: Exploring and developing new algorithms for text understanding and classification. * Academic Projects and Theses: Providing a practical dataset for research and development in text-based AI. * Social Media Monitoring: Analysing public sentiment on various topics based on microblog data.

    Coverage

    The dataset primarily covers textual content from microblogs. While a specific geographic region is not stated for the collected tweets, such data is typically global in nature. No explicit time range for the original data collection is provided. The demographic scope is broad, reflecting general microblog users, with no specific notes on availability for certain groups or years.

    License

    CCO

    Who Can Use It

    This dataset is intended for a wide range of users, including: * Data Scientists: For building and evaluating sentiment and emotion prediction models. * Machine Learning Engineers: For training and fine-tuning text classification algorithms. * Academic Researchers and Students: For use in theses, projects, and scientific studies related to NLP and AI. * Developers: Those looking to integrate sentiment analysis capabilities into their applications. * Anyone interested in: Natural Language Processing, text analytics, and understanding emotional patterns in digital communication.

    Dataset Name Suggestions

    • Microblog Sentiment Compendium
    • Tweet Emotion Dataset
    • Social Text Moods
    • Sentiment Classification Microblogs
    • Digital Moods Dataset

    Attributes

    Original Data Source: Emotion Prediction with Quantum5 Neural Network AI

  15. A

    ‘Precipitation Prediction in LA’ analyzed by Analyst-2

    • analyst-2.ai
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Precipitation Prediction in LA’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-precipitation-prediction-in-la-8cce/f3c83692/?iid=002-283&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Precipitation Prediction in LA’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunnagpalspyz/precipitation-prediction-in-la on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.

    Content

    The dataset is fairly simple and contains various features regarding precipitation. PRCP = Precipitation (tenths of mm) TMAX = Maximum temperature (tenths of degrees C) TMIN = Minimum temperature (tenths of degrees C) PGTM = Peak gust time (hours and minutes, i.e., HHMM) AWND = Average daily wind speed (tenths of meters per second) TAVG = Average temperature (tenths of degrees C) WDFx = Direction of fastest x-minute wind (degrees) WSFx = Fastest x-minute wind speed (tenths of meters per second) WT = Weather Type

    Acknowledgements

    All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/

    Inspiration

    Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.

    --- Original source retains full ownership of the source dataset ---

  16. agile project dataset 2024

    • kaggle.com
    Updated Feb 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    digro k (2025). agile project dataset 2024 [Dataset]. https://www.kaggle.com/datasets/digrok/agile-project-dataset-2024
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    digro k
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset Description: 200 Agile Software Projects Overview This dataset contains records of 200 Agile software development projects. It includes various performance metrics related to Agile methodologies, measuring their effectiveness in project success, risk mitigation, time efficiency, and cost savings. The dataset is designed for analysis of AI-driven automation in Agile software teams.

    Dataset Variables Agile Effectiveness (Likert scale: 2 to 5)

    1. Measures how well Agile methodologies enhance project management processes. Risk Mitigation (Likert scale: 2 to 5)

    2. Captures the effectiveness of Agile in identifying and reducing risks throughout the project lifecycle. Management Satisfaction (Likert scale: 2 to 5)

    3. Represents how satisfied management is with the outcomes of Agile-implemented projects. Supply Chain Improvement (Likert scale: 2 to 5)

    4. Evaluates the impact of Agile practices on optimizing supply chain processes. Time Efficiency (Likert scale: 2 to 5)

    5. Measures improvements in time management within Agile projects. Cost Savings (%) (Range: 10% to 48%)

    6. Quantifies the percentage of cost savings achieved due to Agile methodologies. Project Success (Binary: 0 = Failure, 1 = Success)

    Indicates whether the project was considered successful. Usage This dataset is useful for: ✅ Evaluating the impact of AI automation on Agile workflows. ✅ Understanding factors contributing to Agile project success. ✅ Analyzing cost savings and efficiency improvements in Agile teams. ✅ Building machine learning models to predict project success based on Agile metrics.

  17. Predictive Maintenance Dataset

    • kaggle.com
    Updated Nov 7, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 7, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Himanshu Agarwal
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

    The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to Predict is called failure with binary value 0 for non-failure and 1 for failure.

  18. i

    Data from: Disease Prediction Dataset

    • ieee-dataport.org
    Updated Feb 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ayush Nautiyal (2025). Disease Prediction Dataset [Dataset]. https://ieee-dataport.org/documents/disease-prediction-dataset
    Explore at:
    Dataset updated
    Feb 20, 2025
    Authors
    Ayush Nautiyal
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains symptoms and disease information. It contains total of 1325 symptoms covered with 391 disease.This dataset is refernced from website MedLinePlus. This dataset have training and testing dataset and can be used to train disease prediction algorithm . It is created on own for project disease prediction and do not involves any funding or promotional terms.

  19. Project

    • kaggle.com
    zip
    Updated Mar 5, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    AgrasenShah (2022). Project [Dataset]. https://www.kaggle.com/agrasenshah/project
    Explore at:
    zip(1240318687 bytes)Available download formats
    Dataset updated
    Mar 5, 2022
    Authors
    AgrasenShah
    Description

    Dataset

    This dataset was created by Omega

    Contents

  20. meme project raw

    • kaggle.com
    zip
    Updated Apr 25, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zacchaeus (2021). meme project raw [Dataset]. https://www.kaggle.com/zacchaeus/meme-project-raw
    Explore at:
    zip(99797452 bytes)Available download formats
    Dataset updated
    Apr 25, 2021
    Authors
    Zacchaeus
    Description

    Dataset

    This dataset was created by Zacchaeus

    Contents

    It contains the following files:

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
KHUSHI YADAV (2023). IMDB Dataset For Machine Learning [Dataset]. https://www.kaggle.com/datasets/khushiyadav2022/imdb-dataset-for-machine-learning
Organization logo

IMDB Dataset For Machine Learning

Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 25, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
KHUSHI YADAV
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

"Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning" is an exciting project focused on leveraging the IMDB Dataset for developing an advanced movie recommendation system. This project aims to explore the vast potential of machine learning techniques in providing personalized movie recommendations to users.

The IMDB Dataset, comprising a wealth of movie information including genres, ratings, and user reviews, serves as the foundation for this project. By harnessing the power of machine learning algorithms and data analysis, the project seeks to build a recommendation system that can accurately suggest movies tailored to each individual's preferences.

Search
Clear search
Close search
Google apps
Main menu