79 datasets found
  1. Machine Learning Projects.

    • kaggle.com
    Updated Oct 22, 2020
    Cite
    Avinash Shan Monteiro (2020). Machine Learning Projects. [Dataset]. https://www.kaggle.com/avinashshanmonteiro/machine-learning-porjects/code
    Dataset updated
    Oct 22, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Avinash Shan Monteiro
    Description

    Dataset

    This dataset was created by Avinash Shan Monteiro

    Released under Data files © Original Authors

    Contents

  2. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE
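
    Since the description suggests NICHE as a classifier benchmark, a quick sanity check on the quoted label counts gives the accuracy any classifier would need to beat. This is only a sketch using the numbers stated above; it does not read NICHE.csv, and the majority-class baseline is an illustrative choice rather than something the authors prescribe.

```python
# Label counts quoted in the NICHE description above.
engineered = 441
non_engineered = 131
total = engineered + non_engineered

# Accuracy of a trivial classifier that always predicts the majority label.
majority_baseline = max(engineered, non_engineered) / total

print(total)                        # 572
print(round(majority_baseline, 3))  # 0.771
```

    Because the labels are imbalanced, a classifier evaluated on NICHE should probably be compared against this baseline rather than against 50%.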

  3. IMDB Dataset For Machine Learning

    • kaggle.com
    Updated Sep 25, 2023
    Cite
    KHUSHI YADAV (2023). IMDB Dataset For Machine Learning [Dataset]. https://www.kaggle.com/datasets/khushiyadav2022/imdb-dataset-for-machine-learning
    Dataset updated
    Sep 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    KHUSHI YADAV
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    "Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning" is an exciting project focused on leveraging the IMDB Dataset for developing an advanced movie recommendation system. This project aims to explore the vast potential of machine learning techniques in providing personalized movie recommendations to users.

    The IMDB Dataset, comprising a wealth of movie information including genres, ratings, and user reviews, serves as the foundation for this project. By harnessing the power of machine learning algorithms and data analysis, the project seeks to build a recommendation system that can accurately suggest movies tailored to each individual's preferences.

  4. kaggle-airline-delay

    • huggingface.co
    Updated Jun 26, 2025
    Cite
    kaggle-airline-delay [Dataset]. https://huggingface.co/datasets/ml-project-group-3/kaggle-airline-delay
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    ML Project Group 3
    Description

    ml-project-group-3/kaggle-airline-delay dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. buds-lab/building-data-genome-project-2: v1.0

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 2, 2020
    Cite
    Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers (2020). buds-lab/building-data-genome-project-2: v1.0 [Dataset]. http://doi.org/10.5281/zenodo.3887306
    Available download formats: zip
    Dataset updated
    Sep 2, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BDG2 open data set consists of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters are collected from 19 sites across North America and Europe, and they measure electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data was used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. This subset includes data from 2,380 meters from 1,448 buildings that were used in the GEPIII, a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
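
    The per-meter and total measurement counts quoted above follow from the hourly frequency over 2016 and 2017. The short check below reproduces that arithmetic; it assumes one reading per hour with no gaps, which is how the description states the totals.

```python
from datetime import datetime

# Hourly readings covering all of 2016 and 2017 (2016 is a leap year).
start = datetime(2016, 1, 1)
end = datetime(2018, 1, 1)
hours_per_meter = int((end - start).total_seconds() // 3600)

meters = 3053  # meter count quoted in the description
total_measurements = meters * hours_per_meter

print(hours_per_meter)     # 17544
print(total_measurements)  # 53561832, i.e. about 53.6 million
```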

  6. ml project

    • kaggle.com
    Updated May 5, 2023
    Cite
    kiratjot singh (2023). ml project [Dataset]. https://www.kaggle.com/datasets/kiratjotsingh/ml-project/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    May 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    kiratjot singh
    Description

    Dataset

    This dataset was created by kiratjot singh

    Contents

  7. Machine Learning End-to-End Projects

    • kaggle.com
    Updated May 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    vamsi kamatham (2023). Machine Learning End-to-End Projects [Dataset]. https://www.kaggle.com/datasets/vamsikrishnakamatham/end-to-end-machine-learning-projects
    Dataset updated
    May 12, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    vamsi kamatham
    Description

    Dataset

    This dataset was created by vamsi kamatham

    Contents

  8. EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT: In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, etc. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, etc. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

    The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, authors must cite the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform

        from google.colab import files

        !pip install -q kaggle
        files.upload()
        !mkdir ~/.kaggle
        !cp kaggle.json ~/.kaggle/
        !chmod 600 ~/.kaggle/kaggle.json
        !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
        !unzip DNN-EdgeIIoT-dataset.csv.zip
        !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the dataset's CSV file into a Pandas DataFrame:

        import pandas as pd
        import numpy as np

        df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3: Exploring some of the DataFrame's contents:

        df.head(5)
        print(df['Attack_type'].value_counts())

    Step 4: Dropping data (columns, duplicated rows, NaN, null, ...):

        from sklearn.utils import shuffle

        drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                        "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                        "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                        "tcp.dstport", "udp.port", "mqtt.msg"]

        df.drop(drop_columns, axis=1, inplace=True)
        df.dropna(axis=0, how='any', inplace=True)
        df.drop_duplicates(subset=None, keep="first", inplace=True)
        df = shuffle(df)
        df.isna().sum()
        print(df['Attack_type'].value_counts())
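
    The effect of the Step 4 cleaning can be tried on a toy frame before running it on the full CSV. The sketch below is illustrative only: the rows are made up, and `DataFrame.sample(frac=1)` is swapped in for `sklearn.utils.shuffle` so the example needs only pandas.

```python
import pandas as pd

# Made-up rows: one duplicate pair and one row with a missing value.
toy = pd.DataFrame({
    "frame.time": ["t1", "t2", "t3", "t4"],
    "tcp.len": [10.0, 10.0, None, 20.0],
    "Attack_type": ["Normal", "Normal", "MITM", "MITM"],
})

cleaned = (
    toy.drop(columns=["frame.time"])       # drop an identifying column
       .dropna(axis=0, how="any")          # remove rows with missing values
       .drop_duplicates(keep="first")      # remove exact duplicate rows
       .sample(frac=1, random_state=0)     # shuffle the remaining rows
       .reset_index(drop=True)
)

print(len(cleaned))
print(cleaned["Attack_type"].value_counts().to_dict())
```

    Dropping the timestamp column first matters here: rows that differ only by `frame.time` become exact duplicates and are then removed.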

    Step 5: Categorical data encoding (Dummy Encoding):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler
        from sklearn import preprocessing

        def encode_text_dummy(df, name):
            dummies = pd.get_dummies(df[name])
            for x in dummies.columns:
                dummy_name = f"{name}-{x}"
                df[dummy_name] = dummies[x]
            df.drop(name, axis=1, inplace=True)

        encode_text_dummy(df,'http.request.method')
        encode_text_dummy(df,'http.referer')
        encode_text_dummy(df,"http.request.version")
        encode_text_dummy(df,"dns.qry.name.len")
        encode_text_dummy(df,"mqtt.conack.flags")
        encode_text_dummy(df,"mqtt.protoname")
        encode_text_dummy(df,"mqtt.topic")

    Step 6: Creation of the preprocessed dataset

        df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
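
    To see what the dummy encoding in Step 5 produces, the helper can be run on a small frame. This is a minimal sketch: the function body mirrors the one in the instructions, while the toy rows and the `tcp.len` column are assumptions made up for illustration.

```python
import pandas as pd

def encode_text_dummy(df, name):
    """Replace column `name` with one indicator column per distinct value."""
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        df[f"{name}-{x}"] = dummies[x]
    df.drop(name, axis=1, inplace=True)

# Toy frame; the values are invented and not taken from the dataset.
toy = pd.DataFrame({
    "http.request.method": ["GET", "POST", "GET"],
    "tcp.len": [10, 20, 30],
})

encode_text_dummy(toy, "http.request.method")
print(list(toy.columns))
```

    Each distinct value becomes its own column named `<column>-<value>`, so a column with many distinct values (for example a free-text topic) can widen the frame considerably.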

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  9. agile project dataset 2024

    • kaggle.com
    Updated Feb 20, 2025
    Cite
    digro k (2025). agile project dataset 2024 [Dataset]. https://www.kaggle.com/datasets/digrok/agile-project-dataset-2024
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    digro k
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset Description: 200 Agile Software Projects

    Overview

    This dataset contains records of 200 Agile software development projects. It includes various performance metrics related to Agile methodologies, measuring their effectiveness in project success, risk mitigation, time efficiency, and cost savings. The dataset is designed for analysis of AI-driven automation in Agile software teams.

    Dataset Variables

    1. Agile Effectiveness (Likert scale: 2 to 5): Measures how well Agile methodologies enhance project management processes.
    2. Risk Mitigation (Likert scale: 2 to 5): Captures the effectiveness of Agile in identifying and reducing risks throughout the project lifecycle.
    3. Management Satisfaction (Likert scale: 2 to 5): Represents how satisfied management is with the outcomes of Agile-implemented projects.
    4. Supply Chain Improvement (Likert scale: 2 to 5): Evaluates the impact of Agile practices on optimizing supply chain processes.
    5. Time Efficiency (Likert scale: 2 to 5): Measures improvements in time management within Agile projects.
    6. Cost Savings (%) (Range: 10% to 48%): Quantifies the percentage of cost savings achieved due to Agile methodologies.
    7. Project Success (Binary: 0 = Failure, 1 = Success): Indicates whether the project was considered successful.

    Usage

    This dataset is useful for:

    ✅ Evaluating the impact of AI automation on Agile workflows.
    ✅ Understanding factors contributing to Agile project success.
    ✅ Analyzing cost savings and efficiency improvements in Agile teams.
    ✅ Building machine learning models to predict project success based on Agile metrics.
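
    The stated ranges make it easy to check a record before modelling. The sketch below encodes the ranges from the variable list; the key names are hypothetical, since the actual column headers of the CSV are not shown here, and the sample records are invented.

```python
# Likert-scaled variables (2 to 5). Key names are hypothetical placeholders.
LIKERT_FIELDS = [
    "agile_effectiveness",
    "risk_mitigation",
    "management_satisfaction",
    "supply_chain_improvement",
    "time_efficiency",
]

def validate_row(row):
    """Return a list of problems found in one record (empty if it is valid)."""
    problems = []
    for field in LIKERT_FIELDS:
        if not 2 <= row[field] <= 5:
            problems.append(f"{field} outside 2..5: {row[field]}")
    if not 10 <= row["cost_savings_pct"] <= 48:
        problems.append(f"cost_savings_pct outside 10..48: {row['cost_savings_pct']}")
    if row["project_success"] not in (0, 1):
        problems.append(f"project_success not binary: {row['project_success']}")
    return problems

# Invented records for illustration only.
good = {f: 4 for f in LIKERT_FIELDS} | {"cost_savings_pct": 25, "project_success": 1}
bad = {f: 4 for f in LIKERT_FIELDS} | {"cost_savings_pct": 60, "project_success": 2}

print(validate_row(good))
print(validate_row(bad))
```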

  10. Predictive Maintenance Dataset

    • kaggle.com
    Updated Nov 7, 2022
    Cite
    Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
    Dataset updated
    Nov 7, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Himanshu Agarwal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

    The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to predict is called failure, with binary value 0 for non-failure and 1 for failure.
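
    Because the task asks for both false positives and false negatives to be kept low, it helps to count them explicitly rather than rely on accuracy alone. The following is a minimal sketch with made-up labels; the helper name `confusion_counts` is an assumption and the numbers are not taken from the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Invented labels for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted failures that were real
recall = tp / (tp + fn)     # share of real failures that were caught

print(tp, fp, fn, tn)
print(round(precision, 3), round(recall, 3))
```

    Device failures are usually rare in this kind of data, so a model that always predicts 0 can score high accuracy while catching no failures; precision and recall expose that.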

  11. PERSONAL ML PROJECT

    • kaggle.com
    Updated Sep 24, 2020
    Cite
    Sameeda Siddiqua (2020). PERSONAL ML PROJECT [Dataset]. https://www.kaggle.com/sameedasiddiqua/personal-ml-project/metadata
    Dataset updated
    Sep 24, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sameeda Siddiqua
    Description

    Dataset

    This dataset was created by Sameeda Siddiqua

    Contents

  12. Materials Project Dataset

    • paperswithcode.com
    Cite
    Jain (2023). Materials Project Dataset [Dataset]. https://paperswithcode.com/dataset/materials-project
    Authors
    Jain
    Description

    The Materials Project is a collection of chemical compounds labelled with different attributes. The labelling is performed by different simulations, most of them at DFT level of theory.

    The dataset links:

    • MP 2018.6.1 (69,239 materials)
    • MP 2019.4.1 (133,420 materials)

  13. ‘Precipitation Prediction in LA’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Precipitation Prediction in LA’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-precipitation-prediction-in-la-8cce/f3c83692/?iid=002-283&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Precipitation Prediction in LA’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunnagpalspyz/precipitation-prediction-in-la on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.

    Content

    The dataset is fairly simple and contains various features regarding precipitation.

    • PRCP = Precipitation (tenths of mm)
    • TMAX = Maximum temperature (tenths of degrees C)
    • TMIN = Minimum temperature (tenths of degrees C)
    • PGTM = Peak gust time (hours and minutes, i.e., HHMM)
    • AWND = Average daily wind speed (tenths of meters per second)
    • TAVG = Average temperature (tenths of degrees C)
    • WDFx = Direction of fastest x-minute wind (degrees)
    • WSFx = Fastest x-minute wind speed (tenths of meters per second)
    • WT = Weather Type
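
    Several of these fields are stored in tenths of a unit, so they usually need rescaling before plotting or modelling. The sketch below converts the tenths-scaled fields named above and splits an HHMM gust time; the helper names and the sample record are assumptions for illustration, and the WSFx fields are left out because the "x" in their names is not specified here.

```python
# Fields the description lists as stored in tenths of a unit.
TENTHS_COLUMNS = {"PRCP", "TMAX", "TMIN", "TAVG", "AWND"}

def to_whole_units(record):
    """Return a copy of `record` with tenths-scaled fields divided by 10."""
    return {
        key: (value / 10 if key in TENTHS_COLUMNS and value is not None else value)
        for key, value in record.items()
    }

def parse_pgtm(hhmm):
    """Split an HHMM peak gust time such as 1435 into (hour, minute)."""
    return divmod(int(hhmm), 100)

# Invented record for illustration only.
sample = {"PRCP": 25, "TMAX": 183, "TMIN": 94, "PGTM": 1435}
converted = to_whole_units(sample)

print(converted)                # PRCP in mm, temperatures in degrees C
print(parse_pgtm(sample["PGTM"]))  # (14, 35)
```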

    Acknowledgements

    All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/

    Inspiration

    Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.

    --- Original source retains full ownership of the source dataset ---

  14. Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image...

    • figshare.com
    application/x-rar
    Updated May 30, 2023
    Cite
    Mahmoud Elbattah (2023). Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.7073087.v1
    Available download formats: application/x-rar
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Mahmoud Elbattah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide a dataset that includes visualizations of eye-tracking scanpaths, with a particular focus on Autism Spectrum Disorder (ASD). The key idea is to transform the dynamics of eye motion into visual patterns, so that diagnosis-related tasks can be approached using image analysis techniques. The image dataset is publicly available to be used by other studies aiming to experiment with the usability of eye-tracking within the ASD context. It is believed that the dataset can allow for the development of further interesting applications using Machine Learning or image processing techniques. For more info, please refer to the publication below and the project website.

    Original Publication: Carette, R., Elbattah, M., Dequen, G., Guérin, J, & Cilia, F. (2019, February). Learning to predict autism spectrum disorder based on the visual patterns of eye-tracking scanpaths. In Proceedings of the 12th International Conference on Health Informatics (HEALTHINF 2019).

    Project Website: https://www.researchgate.net/project/Predicting-Autism-Spectrum-Disorder-Using-Machine-Learning-and-Eye-Tracking and https://mahmoud-elbattah.github.io/ML4Autism/

  15. UCI ML Drug Review dataset

    • kaggle.com
    Updated Dec 13, 2018
    Cite
    Jessica Li (2018). UCI ML Drug Review dataset [Dataset]. https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018/home
    Dataset updated
    Dec 13, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jessica Li
    Description

    This dataset was used for the Winter 2018 Kaggle University Club Hackathon and is now publicly available. See the Acknowledgments section for citation and licensing. Note: The types of data and recommendation-based solutions provided by the contestants are purely for NLP learning purposes. They are not suitable for real-world drug recommendation solutions.

    Welcome to the Kaggle University Club Hackathon!

    If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com

    This Hackathon is open to all undergraduate, master, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.

    Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.

    Prompt

    The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.

    Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, ex. sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).

    The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.

    Here are just a couple ideas as to what you could do with the data:

    • Classification: Can you predict the patient's condition based on the review?
    • Regression: Can you predict the rating of the drug based on the review?
    • Sentiment analysis: What elements of a review make it more helpful to others? Which patients tend to have more negative reviews? Can you determine if a review is positive, neutral, or negative?
    • Data visualizations: What kind of drugs are there? What sorts of conditions do these patients have?

    Top Submissions

    There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:

    1. Complex: How many domains of analysis and topics does this Kernel cover? Does it attempt machine learning methods? Does the Kernel offer a variety of unique analyses and interesting conclusions or solutions?
    2. Original: What is the subject matter of this Kernel? Does it have a well-defined and interesting project scope, narrative or problem? Could the results make an impact? Is it thought provoking?
    3. Approachable: How easy is it to understand this Kernel? Are all thought processes clear? Is the code clean, with useful comments? Are visualizations and processes articulated and self-explanatory?

    Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.

    IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.

    Submission Styling

    The final Kernel submission for the Hackathon must contain the following information:

    • All team members added as collaborators to the Kernel
    • Somewhere at the top of your Kernel, find a space to put down all team member names, university name, club name, and team name (as specified whe...
  16. Data from: Disease Prediction Dataset

    • ieee-dataport.org
    Updated Feb 20, 2025
    Cite
    Ayush Nautiyal (2025). Disease Prediction Dataset [Dataset]. https://ieee-dataport.org/documents/disease-prediction-dataset
    Dataset updated
    Feb 20, 2025
    Authors
    Ayush Nautiyal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains symptom and disease information. It covers a total of 1325 symptoms across 391 diseases. The dataset is referenced from the MedlinePlus website. It includes training and testing sets and can be used to train disease prediction algorithms. It was created independently for a disease prediction project and does not involve any funding or promotional terms.

  17. ml_project_training

    • kaggle.com
    Updated Jun 25, 2023
    Cite
    Dorrin Sotoudeh (2023). ml_project_training [Dataset]. https://www.kaggle.com/datasets/dorrinsotoudeh/ml-project-training/discussion
    Dataset updated
    Jun 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dorrin Sotoudeh
    Description

    Dataset

    This dataset was created by Dorrin Sotoudeh

    Contents

  18. Students drugs Addiction Dataset 2024

    • kaggle.com
    Updated May 23, 2024
    Cite
    Sheema Zain (2024). Students drugs Addiction Dataset 2024 [Dataset]. https://www.kaggle.com/datasets/sheemazain/students-drugs-addiction-dataset-2024
    Dataset updated
    May 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sheema Zain
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    For those interested in data on student drug addiction in 2024, several sources offer valuable datasets and statistics.

    1. Kaggle Dataset: Kaggle hosts a specific dataset on student drug addiction. This dataset includes various attributes related to student demographics, substance use patterns, and associated behavioral factors. It's a useful resource for data analysis and machine learning projects focused on understanding drug addiction among students.

    2. National Survey on Drug Use and Health (NSDUH): This comprehensive survey provides detailed annual data on substance use and mental health across the United States, including among students. It covers a wide range of substances and demographic details, helping to track trends and the need for treatment services.

    3. Monitoring the Future (MTF) Survey: Conducted by the National Institute on Drug Abuse (NIDA), this survey tracks drug and alcohol use and attitudes among American adolescents. It provides annual updates and is an excellent source for understanding trends in substance use among high school and college students.

    4. Australian Institute of Health and Welfare (AIHW): For those interested in a more global perspective, the AIHW offers data from the National Drug Strategy Household Survey, which includes information on youth and young adult drug use in Australia. This can be useful for comparative studies.

    For detailed datasets and further analysis, you can explore these resources directly:

  19. ml_project_dataset

    • kaggle.com
    zip
    Updated Mar 26, 2021
    Cite
    Sayf Hussain (2021). ml_project_dataset [Dataset]. https://www.kaggle.com/sayfhussain/ml-project-dataset
    Available download formats: zip (6423728307 bytes)
    Dataset updated
    Mar 26, 2021
    Authors
    Sayf Hussain
    Description

    Dataset

    This dataset was created by Sayf Hussain

    Contents

  20. Banana Classification

    • kaggle.com
    Updated Apr 23, 2024
    Cite
    Atri Thakar (2024). Banana Classification [Dataset]. https://www.kaggle.com/datasets/atrithakar/banana-classification/code
    Dataset updated
    Apr 23, 2024
    Dataset provided by
    Kaggle
    Authors
    Atri Thakar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is intended for detecting banana quality using ML. It contains four categories: Unripe, Ripe, Overripe and Rotten. It includes a large number of images, which should make it easier to train an ML model.

    NOTE: THIS DATASET WAS TAKEN FROM https://universe.roboflow.com/roboflow-universe-projects/banana-ripeness-classification. I WAS FACING DIFFICULTIES DOWNLOADING THE DATASET DIRECTLY TO GOOGLE COLAB TO TRAIN MY CNN MODEL AS PART OF A UNIVERSITY PROJECT. ALL CREDIT FOR THIS DATASET, AS FAR AS I KNOW, GOES TO ROBOFLOW. I DO NOT INTEND TO TAKE ANY CREDIT MYSELF OR UNETHICALLY CLAIM OWNERSHIP; I UPLOADED THE DATASET HERE ONLY FOR MY CONVENIENCE. THANK YOU.

Machine Learning Projects.

This repository will give you all my projects and ML algorithms.

