97 datasets found

IMDB Dataset For Machine Learning
kaggle.com
Updated Sep 25, 2023
Cite
KHUSHI YADAV (2023). IMDB Dataset For Machine Learning [Dataset]. https://www.kaggle.com/datasets/khushiyadav2022/imdb-dataset-for-machine-learning
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 25, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
KHUSHI YADAV
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
"Movie Recommendation on the IMDB Dataset: A Journey into Machine Learning" is an exciting project focused on leveraging the IMDB Dataset for developing an advanced movie recommendation system. This project aims to explore the vast potential of machine learning techniques in providing personalized movie recommendations to users.

The IMDB Dataset, comprising a wealth of movie information including genres, ratings, and user reviews, serves as the foundation for this project. By harnessing the power of machine learning algorithms and data analysis, the project seeks to build a recommendation system that can accurately suggest movies tailored to each individual's preferences.
Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...
figshare.com
txt
Updated May 30, 2023
Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21967265.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects and curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE
Deep learning term project
kaggle.com
Updated Mar 30, 2024
Cite
Sampreeth R S (2024). Deep learning term project [Dataset]. https://www.kaggle.com/sampreethrs/deep-learning-term-project/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 30, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sampreeth R S
Description
Dataset

This dataset was created by Sampreeth R S

Contents
buds-lab/building-data-genome-project-2: v1.0
zenodo.org
data.niaid.nih.gov
zip
Updated Sep 2, 2020
Cite
Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers (2020). buds-lab/building-data-genome-project-2: v1.0 [Dataset]. http://doi.org/10.5281/zenodo.3887306
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.3887306
Dataset updated
Sep 2, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Clayton Miller; Anjukan Kathirgamanathan; Bianca Picchetti; Pandarasamy Arjunan; June Young Park; Zoltan Nagy; Paul Raftery; Brodie W. Hobson; Zixiao Shi; Forrest Meggers
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The BDG2 open data set consists of 3,053 energy meters from 1,636 non-residential buildings with a range of two full years (2016 and 2017) at an hourly frequency (17,544 measurements per meter resulting in approximately 53.6 million measurements). These meters are collected from 19 sites across North America and Europe, and they measure electrical, heating and cooling water, steam, and solar energy as well as water and irrigation meters. Part of these data was used in the Great Energy Predictor III (GEPIII) competition hosted by the ASHRAE organization in October-December 2019. This subset includes data from 2,380 meters from 1,448 buildings that were used in the GEPIII, a machine learning competition for long-term prediction with an application to measurement and verification. This paper describes the process of data collection, cleaning, and convergence of time-series meter data, the meta-data about the buildings, and complementary weather data. This data set can be used for further prediction benchmarking and prototyping as well as anomaly detection, energy analysis, and building type classification.
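The measurement counts quoted in the BDG2 description can be checked with a little arithmetic. The sketch below assumes both years are covered in full at one reading per hour (2016 is a leap year); the variable names are illustrative and do not come from the data set.

```python
import calendar

# Hours in each of the two full years covered by the data set.
hours_per_year = {
    year: (366 if calendar.isleap(year) else 365) * 24 for year in (2016, 2017)
}
measurements_per_meter = sum(hours_per_year.values())

meters = 3053  # number of energy meters stated in the description
total_measurements = meters * measurements_per_meter

print(measurements_per_meter)  # 17544
print(total_measurements)      # 53561832, i.e. about 53.6 million
```

The totals are upper bounds for a complete hourly series; any gaps in the actual meter data would lower the real count.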
Machine Learning Projects.
kaggle.com
Updated Oct 22, 2020
Cite
Avinash Shan Monteiro (2020). Machine Learning Projects. [Dataset]. https://www.kaggle.com/avinashshanmonteiro/machine-learning-porjects/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 22, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Avinash Shan Monteiro
Description
Dataset

This dataset was created by Avinash Shan Monteiro

Released under Data files © Original Authors

Contents
Machine Learning End-to-End Projects
kaggle.com
Updated May 12, 2023
Cite
vamsi kamatham (2023). Machine Learning End-to-End Projects [Dataset]. https://www.kaggle.com/datasets/vamsikrishnakamatham/end-to-end-machine-learning-projects
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 12, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
vamsi kamatham
Description
Dataset

This dataset was created by vamsi kamatham

Contents
EDGE-IIOTSET Dataset
paperswithcode.com
Updated Oct 16, 2023
Cite
(2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
Explore at:
Dataset updated
Oct 16, 2023
Description
ABSTRACT In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes: centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, an ultrasonic sensor, a water level detection sensor, a pH sensor meter, a soil moisture sensor, a heart rate sensor, and a flame sensor. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

Instructions:

Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, authors must cite the following paper:

Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

Link to paper : https://ieeexplore.ieee.org/document/9751703

The directories of the Edge-IIoTset dataset include the following:

•File 1 (Normal traffic)

-File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

-File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

-File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.

-File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

-File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

-File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

-File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

-File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

-File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

-File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

•File 2 (Attack traffic):

-File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.

-File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.

•File 3 (Selected dataset for ML and DL):

-File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

-File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform

    from google.colab import files

    !pip install -q kaggle
    files.upload()
    !mkdir ~/.kaggle
    !cp kaggle.json ~/.kaggle/
    !chmod 600 ~/.kaggle/kaggle.json
    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
    !unzip DNN-EdgeIIoT-dataset.csv.zip
    !rm DNN-EdgeIIoT-dataset.csv.zip

Step 2: Reading the dataset's CSV file into a Pandas DataFrame

    import pandas as pd
    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

Step 3: Exploring some of the DataFrame's contents

    df.head(5)
    print(df['Attack_type'].value_counts())

Step 4: Dropping data (columns, duplicated rows, NaN, null)

    from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                    "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp", "http.request.uri.query",
                    "tcp.options", "tcp.payload", "tcp.srcport", "tcp.dstport", "udp.port", "mqtt.msg"]

    df.drop(drop_columns, axis=1, inplace=True)
    df.dropna(axis=0, how='any', inplace=True)
    df.drop_duplicates(subset=None, keep="first", inplace=True)
    df = shuffle(df)
    df.isna().sum()
    print(df['Attack_type'].value_counts())

Step 5: Categorical data encoding (dummy encoding)

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn import preprocessing

    def encode_text_dummy(df, name):
        dummies = pd.get_dummies(df[name])
        for x in dummies.columns:
            dummy_name = f"{name}-{x}"
            df[dummy_name] = dummies[x]
        df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df, 'http.request.method')
    encode_text_dummy(df, 'http.referer')
    encode_text_dummy(df, "http.request.version")
    encode_text_dummy(df, "dns.qry.name.len")
    encode_text_dummy(df, "mqtt.conack.flags")
    encode_text_dummy(df, "mqtt.protoname")
    encode_text_dummy(df, "mqtt.topic")

Step 6: Creation of the preprocessed dataset

    df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
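The dummy encoding in Step 5 may be easier to follow on a small table. The sketch below reuses the `encode_text_dummy` helper from the instructions on a made-up three-row DataFrame; the toy values are assumptions for illustration, and the `.astype(int)` cast is an addition (not in the original steps) so the indicator columns print as 0/1 regardless of pandas version.

```python
import pandas as pd

def encode_text_dummy(df, name):
    """Replace column `name` with one indicator column per distinct value."""
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        # Cast added for this sketch: newer pandas returns booleans here.
        df[dummy_name] = dummies[x].astype(int)
    df.drop(name, axis=1, inplace=True)

# Toy stand-in for the real table; these rows are invented.
toy = pd.DataFrame({
    "mqtt.protoname": ["MQTT", "0", "MQTT"],
    "Attack_type": ["Normal", "MITM", "Normal"],
})

encode_text_dummy(toy, "mqtt.protoname")
print(list(toy.columns))
print(toy)
```

Each distinct value of the encoded column becomes its own 0/1 column named `<column>-<value>`, and the original text column is dropped in place.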

For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

More information about Dr. Mohamed Amine Ferrag is available at:

https://www.linkedin.com/in/Mohamed-Amine-Ferrag

https://dblp.uni-trier.de/pid/142/9937.html

https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

https://www.scopus.com/authid/detail.uri?authorId=56115001200

https://publons.com/researcher/1322865/mohamed-amine-ferrag/

https://orcid.org/0000-0002-0632-3172

Last Updated: 27 Mar. 2023
Personal skills dataset
kaggle.com
Updated Oct 15, 2020
Cite
Hanan Abbas (2020). Personal skills dataset [Dataset]. https://www.kaggle.com/hanaanabbas/personal-skills-dataset/metadata
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 15, 2020
Dataset provided by
Kaggle
Authors
Hanan Abbas
Description
Dataset

This dataset was created by Hanan Abbas

Contents
Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image...
figshare.com
application/x-rar
Updated May 30, 2023
Cite
Mahmoud Elbattah (2023). Visualization of Eye-Tracking Scanpaths in Autism Spectrum Disorder: Image Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.7073087.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.7073087.v1
Dataset updated
May 30, 2023
Dataset provided by
figshare
Authors
Mahmoud Elbattah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We provide a dataset that includes visualizations of eye-tracking scanpaths, with a particular focus on Autism Spectrum Disorder (ASD). The key idea is to transform the dynamics of eye motion into visual patterns, so that diagnosis-related tasks can be approached using image analysis techniques. The image dataset is publicly available to be used by other studies aiming to experiment with the usability of eye-tracking within the ASD context. It is believed that the dataset can allow for the development of further interesting applications using Machine Learning or image processing techniques. For more info, please refer to the publication below and the project website.

Original Publication: Carette, R., Elbattah, M., Dequen, G., Guérin, J, & Cilia, F. (2019, February). Learning to predict autism spectrum disorder based on the visual patterns of eye-tracking scanpaths. In Proceedings of the 12th International Conference on Health Informatics (HEALTHINF 2019).

Project Website:
https://www.researchgate.net/project/Predicting-Autism-Spectrum-Disorder-Using-Machine-Learning-and-Eye-Tracking
https://mahmoud-elbattah.github.io/ML4Autism/
Datasets for Crime Hotspot Prediction Project
figshare.com
application/x-rar
Updated Sep 20, 2023
Cite
Tuğrul Cabir Hakyemez; Bertan Badur (2023). Datasets for Crime Hotspot Prediction Project [Dataset]. http://doi.org/10.6084/m9.figshare.24171672.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.24171672.v1
Dataset updated
Sep 20, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tuğrul Cabir Hakyemez; Bertan Badur
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets: This repository includes a folder titled Datasets that contains all crime and park event tables. These tables consist of event observations across street segments (columns) with time steps (rows). This is a crucial aspect of the project. Additionally, the folder contains Python pickles that store network information, street segment details, and crime location information. Each file in this folder is required by the script to generate predictions.

Predictions: The folder also contains predictions across all the models (e.g., theft daily, robbery shift). These files have two columns: actual and predicted values, respectively. The structure is 2459 rows by test days. This means that the first 2459 rows represent predictions for each segment on the first day, the next 2459 rows represent predictions for the second day, and so on.
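The row layout of the prediction files can be sketched as follows. This is a minimal illustration that assumes each file is read into a flat list of (actual, predicted) pairs; the function name `split_by_day` and the stand-in data are hypothetical, not part of the repository.

```python
SEGMENTS = 2459  # street segments per test day, as stated in the description

def split_by_day(rows, segments=SEGMENTS):
    """Group a flat list of (actual, predicted) rows into one list per test day."""
    if len(rows) % segments != 0:
        raise ValueError("row count is not a multiple of the segment count")
    return [rows[i:i + segments] for i in range(0, len(rows), segments)]

# Made-up stand-in data: 3 test days of (actual, predicted) pairs.
rows = [(0, 0.0)] * (3 * SEGMENTS)
days = split_by_day(rows)

print(len(days), len(days[0]))  # 3 2459
```

With this grouping, `days[d][s]` is the (actual, predicted) pair for segment `s` on test day `d`, counting both from zero.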
Female facial image dataset
kaggle.com
Updated Apr 6, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Alpha Lu (2019). Female facial image dataset [Dataset]. https://www.kaggle.com/urnotalphalu711/female-facial-image-dataset/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 6, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alpha Lu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Alpha Lu

Released under CC0: Public Domain

Contents
Materials Project Dataset
paperswithcode.com
Cite
Jain (2023). Materials Project Dataset [Dataset]. https://paperswithcode.com/dataset/materials-project
Explore at:
Authors
Jain
Description
The Materials Project is a collection of chemical compounds labelled with different attributes. The labelling is performed by different simulations, most of them at DFT level of theory.

The dataset links:

MP 2018.6.1 (69,239 materials)
MP 2019.4.1 (133,420 materials)
deep_learning_abracadata
kaggle.com
Updated Oct 29, 2024
Cite
khmiri iheb (2024). deep_learning_abracadata [Dataset]. https://www.kaggle.com/datasets/khmiriiheb/deep-learning-abracadata
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
khmiri iheb
Description
Dataset

This dataset was created by khmiri iheb

Contents
Tweet Emotion Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Tweet Emotion Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/06728d6d-f212-4aed-a1e1-ba02c5b16cf5
Explore at:
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset is a collection of microblog entries, such as tweets, paired with their corresponding sentiment or emotional labels. It serves as a valuable resource for developing and testing artificial intelligence models capable of predicting emotions from textual content. The dataset's purpose is to provide rich, real-world examples of text and associated human sentiments, making it ideal for tasks like sentiment analysis and emotion detection in Natural Language Processing (NLP) systems.

Columns

tweet_id: A unique numerical identifier for each individual microblog entry.

sentiment: The categorised emotion or mood associated with the content. Examples of sentiments found include 'empty', 'sadness', 'enthusiasm', 'neutral', 'worry', 'love', 'fun', and 'surprise'.

content: The actual text of the microblog entry.

Distribution

The dataset is structured in a tabular format, typically stored as a CSV file. Specific numbers for rows or records are not available in the provided details, but a sample of the data showcases its structure. Further details on the complete file size and record count would be updated separately.
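A minimal sketch of reading the three documented columns and counting the sentiment labels is shown below. The sample rows are invented for illustration, and the in-memory text stands in for the real CSV file, whose name and size are not given in the description.

```python
import csv
import io
from collections import Counter

# Made-up sample rows using the three documented columns.
sample = io.StringIO(
    "tweet_id,sentiment,content\n"
    "1,worry,running late again\n"
    "2,love,best day with friends\n"
    "3,worry,exam tomorrow\n"
)

# Count how often each sentiment label occurs.
counts = Counter(row["sentiment"] for row in csv.DictReader(sample))

print(counts.most_common())  # [('worry', 2), ('love', 1)]
```

For a real file, the same loop would work with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.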

Usage

This dataset is ideally suited for various applications, including:
* Sentiment Analysis: Training machine learning models to identify the emotional tone of text.
* Emotion Detection: Building AI software capable of predicting specific emotions from written content.
* Natural Language Processing (NLP) Research: Exploring and developing new algorithms for text understanding and classification.
* Academic Projects and Theses: Providing a practical dataset for research and development in text-based AI.
* Social Media Monitoring: Analysing public sentiment on various topics based on microblog data.

Coverage

The dataset primarily covers textual content from microblogs. While a specific geographic region is not stated for the collected tweets, such data is typically global in nature. No explicit time range for the original data collection is provided. The demographic scope is broad, reflecting general microblog users, with no specific notes on availability for certain groups or years.

License

CC0

Who Can Use It

This dataset is intended for a wide range of users, including:
* Data Scientists: For building and evaluating sentiment and emotion prediction models.
* Machine Learning Engineers: For training and fine-tuning text classification algorithms.
* Academic Researchers and Students: For use in theses, projects, and scientific studies related to NLP and AI.
* Developers: Those looking to integrate sentiment analysis capabilities into their applications.
* Anyone interested in: Natural Language Processing, text analytics, and understanding emotional patterns in digital communication.

Dataset Name Suggestions

Microblog Sentiment Compendium

Tweet Emotion Dataset

Social Text Moods

Sentiment Classification Microblogs

Digital Moods Dataset

Attributes

Original Data Source: Emotion Prediction with Quantum5 Neural Network AI
‘Precipitation Prediction in LA’ analyzed by Analyst-2
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Precipitation Prediction in LA’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-precipitation-prediction-in-la-8cce/f3c83692/?iid=002-283&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Precipitation Prediction in LA’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunnagpalspyz/precipitation-prediction-in-la on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.

Content

The dataset is fairly simple and contains various features regarding precipitation.

PRCP = Precipitation (tenths of mm)
TMAX = Maximum temperature (tenths of degrees C)
TMIN = Minimum temperature (tenths of degrees C)
PGTM = Peak gust time (hours and minutes, i.e., HHMM)
AWND = Average daily wind speed (tenths of meters per second)
TAVG = Average temperature (tenths of degrees C)
WDFx = Direction of fastest x-minute wind (degrees)
WSFx = Fastest x-minute wind speed (tenths of meters per second)
WT = Weather Type
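Because several fields are stored in tenths of a unit, a conversion step is usually needed before the values are read as mm, degrees C, or m/s. The sketch below assumes each reading is held as a dictionary keyed by the field codes listed above; the helper name and the sample values are hypothetical, and the WSFx fields are left out because their exact column names are not given.

```python
# Fields stored in tenths of a unit, per the field list in the description.
TENTHS_FIELDS = {"PRCP", "TMAX", "TMIN", "TAVG", "AWND"}

def to_natural_units(record):
    """Convert tenths-of-unit fields to mm, degrees C, or m/s; leave others as-is."""
    return {
        key: value / 10 if key in TENTHS_FIELDS else value
        for key, value in record.items()
    }

# Made-up reading for illustration.
raw = {"PRCP": 25, "TMAX": 211, "TMIN": 106, "AWND": 31}
converted = to_natural_units(raw)

print(converted)  # {'PRCP': 2.5, 'TMAX': 21.1, 'TMIN': 10.6, 'AWND': 3.1}
```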

Acknowledgements

All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/

Inspiration

Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.

--- Original source retains full ownership of the source dataset ---
agile project dataset 2024
kaggle.com
Updated Feb 20, 2025
Cite
digro k (2025). agile project dataset 2024 [Dataset]. https://www.kaggle.com/datasets/digrok/agile-project-dataset-2024
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 20, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
digro k
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Dataset Description: 200 Agile Software Projects

Overview

This dataset contains records of 200 Agile software development projects. It includes various performance metrics related to Agile methodologies, measuring their effectiveness in project success, risk mitigation, time efficiency, and cost savings. The dataset is designed for analysis of AI-driven automation in Agile software teams.

Dataset Variables

Agile Effectiveness (Likert scale: 2 to 5): Measures how well Agile methodologies enhance project management processes.

Risk Mitigation (Likert scale: 2 to 5): Captures the effectiveness of Agile in identifying and reducing risks throughout the project lifecycle.

Management Satisfaction (Likert scale: 2 to 5): Represents how satisfied management is with the outcomes of Agile-implemented projects.

Supply Chain Improvement (Likert scale: 2 to 5): Evaluates the impact of Agile practices on optimizing supply chain processes.

Time Efficiency (Likert scale: 2 to 5): Measures improvements in time management within Agile projects.

Cost Savings (%) (Range: 10% to 48%): Quantifies the percentage of cost savings achieved due to Agile methodologies.

Project Success (Binary: 0 = Failure, 1 = Success): Indicates whether the project was considered successful.

Usage

This dataset is useful for:
✅ Evaluating the impact of AI automation on Agile workflows.
✅ Understanding factors contributing to Agile project success.
✅ Analyzing cost savings and efficiency improvements in Agile teams.
✅ Building machine learning models to predict project success based on Agile metrics.
Predictive Maintenance Dataset
kaggle.com
Updated Nov 7, 2022
Cite
Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 7, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Himanshu Agarwal
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
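To make the false-positive and false-negative requirement concrete, the sketch below counts both for a set of binary predictions and derives precision and recall from them. The labels are made up, and the helper name `confusion_counts` is an illustrative assumption, not part of the dataset or task statement.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed failures
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

# Made-up labels: 1 = failure, 0 = non-failure.
actual = [0, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)  # share of predicted failures that were real
recall = tp / (tp + fn)     # share of real failures that were caught

print(tp, fp, fn, tn)  # 2 1 1 2
```

Since device failures are typically rare compared with normal readings, these counts tend to be more informative than plain accuracy when comparing models.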
Data from: Disease Prediction Dataset
ieee-dataport.org
Updated Feb 20, 2025
Cite
Ayush Nautiyal (2025). Disease Prediction Dataset [Dataset]. https://ieee-dataport.org/documents/disease-prediction-dataset
Explore at:
Dataset updated
Feb 20, 2025
Authors
Ayush Nautiyal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains symptom and disease information. It covers a total of 1325 symptoms across 391 diseases. The dataset is referenced from the MedlinePlus website. It includes training and testing sets and can be used to train disease prediction algorithms. It was created independently for a disease prediction project and does not involve any funding or promotional terms.
Project
kaggle.com
zip
Updated Mar 5, 2022
Cite
AgrasenShah (2022). Project [Dataset]. https://www.kaggle.com/agrasenshah/project
Explore at:
Available download formats: zip (1240318687 bytes)
Dataset updated
Mar 5, 2022
Authors
AgrasenShah
Description
Dataset

This dataset was created by Omega

Contents
meme project raw
kaggle.com
zip
Updated Apr 25, 2021
Cite
Zacchaeus (2021). meme project raw [Dataset]. https://www.kaggle.com/zacchaeus/meme-project-raw
Explore at:
Available download formats: zip (99797452 bytes)
Dataset updated
Apr 25, 2021
Authors
Zacchaeus
Description
Dataset

This dataset was created by Zacchaeus

Contents

It contains the following files:
