100+ datasets found
  1. Machine Learning Tutorials - Example Projects - AI

    • kaggle.com
    zip
    Updated Oct 20, 2022
    Cite
    EMİRHAN BULUT (2022). Machine Learning Tutorials - Example Projects - AI [Dataset]. https://www.kaggle.com/datasets/emirhanai/machine-learning-tutorials-example-projects-ai
    Explore at:
    Available download formats: zip (1587192509 bytes)
    Dataset updated
    Oct 20, 2022
    Authors
    EMİRHAN BULUT
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Machine Learning Tutorials - Example Projects - AI

    I am sharing 28 of my Machine Learning and Deep Learning (Artificial Intelligence, AI) projects on Kaggle as open source, with their data, software, and outputs, for educational purposes. They are aimed at all levels: people who want to work in this field, people with no Machine Learning knowledge, people with intermediate knowledge, and specialists. The deep learning projects are advanced, so I recommend starting with the Machine Learning section. You can compare your own outputs with the outputs included here. I am happy to share these 28 educational projects with the whole world through Kaggle. Knowledge is free and better when shared!

    Algorithms used in it:

    1) Nearest Neighbor
    2) Naive Bayes
    3) Decision Trees
    4) Linear Regression
    5) Support Vector Machines (SVM)
    6) Neural Networks
    7) K-means clustering
    

    Kind regards, Emirhan BULUT

    You can use the links below for communication. If you have any questions or comments, feel free to let me know!

    LinkedIn: https://www.linkedin.com/in/artificialintelligencebulut/
    Email: emirhan@novosteer.com

    Emirhan BULUT. (2022). Machine Learning Tutorials - Example Projects - AI [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/4361310

  2. Eye Disease Deep Learning Dataset

    • kaggle.com
    zip
    Updated Jan 15, 2022
    Cite
    Bongsang Kim (2022). Eye Disease Deep Learning Dataset [Dataset]. https://www.kaggle.com/datasets/bongsang/eye-disease-deep-learning-dataset
    Explore at:
    Available download formats: zip (248117276 bytes)
    Dataset updated
    Jan 15, 2022
    Authors
    Bongsang Kim
    Description

    Context

    I think there is a severe lack of open datasets and algorithms for accelerating medical AI. So I am doing research to build a global baseline classifier for hospitals that do not have enough data or AI capability. I hope you are interested, too.

    Content

    The labels of this dataset consist of 3 categories, 5 types, and 5 grades, which gives up to 75 (3 × 5 × 5) combined labels.

    Category

    • point-like corneal ulcers
    • point-flaky mixed corneal ulcers
    • flaky corneal ulcers

    Types

    • type 0 : No ulcer of the corneal epithelium
    • type 1 : Micro punctate
    • type 2 : Macro punctate
    • type 3 : Coalescent macro punctate
    • type 4 : Patch (>=1 mm)

    Grade

    • grade 0 : No ulcer of the corneal epithelium
    • grade 1 : Corneal ulcers involve only one surrounding quadrant
    • grade 2 : Corneal ulcers involve two surrounding quadrants
    • grade 3 : Corneal ulcers involve three or four surrounding quadrants
    • grade 4 : Corneal ulcers involve the central optical zone of the cornea
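
    The count of 75 combined labels follows from multiplying the three label axes listed above. A minimal sketch that enumerates them is shown here; the short category names and the tuple encoding are assumptions made for illustration, not the dataset's actual label format.

```python
from itertools import product

# Label values as listed above; the short category names are illustrative.
CATEGORIES = ["point-like", "point-flaky mixed", "flaky"]
TYPES = range(5)   # type 0 .. type 4
GRADES = range(5)  # grade 0 .. grade 4

def all_label_combinations():
    """Return every (category, type, grade) triple."""
    return list(product(CATEGORIES, TYPES, GRADES))

combos = all_label_combinations()
print(len(combos))  # 3 * 5 * 5 = 75
```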

    Acknowledgements

    Deng, L., Lyu, J., Huang, H. et al. The SUSTech-SYSU dataset for automatically segmenting and classifying corneal ulcers. Sci Data 7, 23 (2020). https://doi.org/10.1038/s41597-020-0360-7

  3. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Cite
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    • Dataset Name: Name of the dataset.
    • Created By: Creator(s) of the dataset.
    • Last Updated in number of days: Time elapsed since last update.
    • Usability Score: Score indicating the ease of use.
    • Number of File: Quantity of files included.
    • Type of file: Format of files (e.g., CSV, JSON).
    • Size: Size of the dataset.
    • Total Votes: Number of votes received.
    • Category: Categorization of the dataset's subject matter.
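
    As a rough illustration of the "Research Analysis" use case, the sketch below averages the usability score per category. The rows are hypothetical, and the exact header spelling of the real file is an assumption based on the column definitions above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration only; the real file's header spelling
# and values may differ from the column definitions above.
SAMPLE_CSV = """Dataset Name,Category,Usability Score,Total Votes
Example A,Health,10.0,120
Example B,Health,8.0,40
Example C,Finance,9.0,75
"""

def mean_usability_by_category(csv_text):
    """Average the 'Usability Score' column within each 'Category'."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["Category"]]
        entry[0] += float(row["Usability Score"])
        entry[1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}

print(mean_usability_by_category(SAMPLE_CSV))
```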

  4. IMAGE PROCESSING USING DEEP LEARNING

    • kaggle.com
    zip
    Updated Apr 17, 2023
    Cite
    ANGAMUTHU T VELS UNIVERSITY (2023). IMAGE PROCESSING USING DEEP LEARNING [Dataset]. https://www.kaggle.com/datasets/mindtechsolution/image-processing-using-deep-learning
    Explore at:
    Available download formats: zip (12291 bytes)
    Dataset updated
    Apr 17, 2023
    Authors
    ANGAMUTHU T VELS UNIVERSITY
    Description

    Images have always played a vital role in human life because vision is the most crucial human sense. As a result, image processing has a wide range of applications. Photographs are more common than ever, and anyone can easily take a large number of them with a smartphone. Given the complexities of vision, machine learning has become a critical component of intelligent computer vision programs when adaptability is required. Deep learning is a subfield of artificial intelligence that combines statistical, probabilistic, and optimisation techniques to enable computers to "learn" from previous examples and find difficult-to-detect patterns in big, noisy, or complex data sets. This capability is particularly well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. A novel application of deep learning in image processing is very likely to benefit the field and lead to a better understanding of complex images.

    A country's economy depends on agricultural productivity. Identifying plant diseases is critical for reducing production losses and improving the quality of agricultural products. Traditional methods are dependable, but they require a person to visually inspect leaf patterns and identify disease, which takes more time and labour. Early identification of plant disease using automated procedures can reduce productivity loss in large farm fields. In this research we propose vision-based automatic detection of plant disease using image processing techniques. Image processing algorithms detect plant disease by recognising the colour features of the leaf region. The K-means algorithm is used for colour segmentation, and the GLCM (gray-level co-occurrence matrix) algorithm is used for disease classification. Vision-based detection of plant infection yielded efficient results and promising performance.
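
    The description names K-means for colour segmentation but does not show how it is applied. The following is a minimal sketch of plain k-means on RGB pixels, not the author's implementation; the pixel values are hypothetical, and initialising the centres from the first k pixels is a simplification chosen to keep the example deterministic.

```python
def kmeans_colors(pixels, k, iterations=20):
    """Cluster RGB pixels into k colour groups with plain k-means."""
    # Simplified deterministic initialisation: the first k pixels.
    centers = [tuple(float(ch) for ch in px) for px in pixels[:k]]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(px, centers[c])))
            for px in pixels
        ]
        # Update step: move each centre to the mean of its pixels.
        for c in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers

# Hypothetical pixels: three greenish (healthy leaf) and three brownish (lesion).
pixels = [(30, 160, 40), (35, 170, 50), (25, 150, 45),
          (120, 80, 30), (130, 90, 40), (125, 85, 35)]
labels, centers = kmeans_colors(pixels, k=2)
print(labels)
```

    In a real pipeline the pixels would come from a leaf image, and the resulting segments would then be passed on to feature extraction and classification.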

  5. Crack Dataset

    • kaggle.com
    zip
    Updated Jun 3, 2023
    Cite
    Yatata (2023). Crack Dataset [Dataset]. https://www.kaggle.com/datasets/yatata1/crack-dataset
    Explore at:
    Available download formats: zip (13671421501 bytes)
    Dataset updated
    Jun 3, 2023
    Authors
    Yatata
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    A custom dataset of cracks, curated from multiple sources to provide a comprehensive range of crack scenarios for machine learning models. It consists of color images with positive and negative crack scenarios. The corresponding mask images and bounding box identifications, which open up image segmentation and image detection tasks, further enhance the usability of this dataset. (TODO)

    This dataset was created to address the issue of unification in the field of crack detection. Although numerous datasets are already available for this problem, they are often fragmented across multiple sources and may, in some samples, lack consistency due to automatic mask generation methods. Furthermore, some of the datasets contained isolated cases of incorrect image-mask pairs or low-precision masks, which can significantly affect the quality and accuracy of a model. This makes it difficult to create reliable and accurate machine learning models. To ensure that the custom dataset of cracks is of the highest quality, these issues had to be eliminated or handled with extra care by manually correcting them when necessary. (Still under development)

    Most images are 448 × 448 pixels, because the majority of them come from the Kaggle dataset "Crack Segmentation Dataset".

    Due to the various contributors involved, this dataset provides a solid opportunity to train machine learning models under various lighting, material, and quality conditions. By creating a unified and reliable dataset, I hope to contribute to the advancement of crack detection in the field of machine learning. The custom dataset of cracks will enable researchers and practitioners to create more accurate and effective models, which will ultimately lead to improved outcomes in the detection and prevention of cracks in various materials.
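
    The description mentions both mask images and bounding boxes. One plausible way to relate the two is to derive a box from a binary mask, sketched below. This is an illustration under assumptions, not the dataset author's method; the tiny mask is hypothetical, and real masks would mostly be 448 × 448.

```python
def mask_to_bbox(mask):
    """Return (row_min, col_min, row_max, col_max) of nonzero pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # negative sample: no crack pixels
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

# Tiny hypothetical 5x6 mask (1 = crack pixel).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(mask_to_bbox(mask))
```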

    Citations

    @inproceedings{liu2019deep, title={Deep Learning Based Automatic Crack Detection and Segmentation for Unmanned Aerial Vehicle Inspections}, author={Liu, Kangcheng and Han, Xiaodong and Chen, Ben M}, booktitle={2019 IEEE International Conference on Robotics and Biomimetics (ROBIO)}, number={https://ieeexplore.ieee.org/document/896}, pages={381--387}, year={2019}, organization={IEEE} }

    @article{liu2022industrial, title={Industrial uav-based unsupervised domain adaptive crack recognitions: From system setups to real-site infrastructural inspections}, author={Liu, Kangcheng and Chen, Ben M}, journal={IEEE Transactions on Industrial Electronics}, year={2022}, publisher={IEEE} }

  6. Weather Prediction

    • kaggle.com
    • zenodo.org
    zip
    Updated Mar 10, 2024
    Cite
    The Devastator (2024). Weather Prediction [Dataset]. https://www.kaggle.com/datasets/thedevastator/weather-prediction
    Explore at:
    Available download formats: zip (958204 bytes)
    Dataset updated
    Mar 10, 2024
    Authors
    The Devastator
    Description

    Credit to the original author: The dataset was originally published here

    Weather prediction dataset

    A dataset for teaching machine learning and deep learning

    Hands-on teaching of modern machine learning and deep learning techniques relies heavily on well-suited datasets. The "weather prediction dataset" is a novel tabular dataset that was specifically created for teaching machine learning and deep learning to an academic audience. The dataset contains intuitively accessible weather observations from 18 locations in Europe. It was designed to be suitable for a large variety of different training goals, many of which do not easily give way to unrealistically high prediction accuracy. Teachers or instructors can thus choose the difficulty of the training goals and match it to the respective learner audience or lesson objective. The compact size and complexity of the dataset make it possible to quickly train common machine learning and deep learning models on a standard laptop so that they can be used in live hands-on sessions.

    The dataset can be found in the `\dataset` folder and be downloaded from zenodo: https://doi.org/10.5281/zenodo.4980359

    References

    If you make use of this dataset, in particular if this is in form of an academic contribution, then please cite the following two references:

    • Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. of Climatol., 22, 1441-1453. Data and metadata available at http://www.ecad.eu
    • Florian Huber, Dafne van Kuppevelt, Peter Steinbach, Colin Sauze, Yang Liu, Berend Weel, "Will the sun shine? – An accessible dataset for teaching machine learning and deep learning", DOI TO BE ADDED!

    Map of the locations of the 18 weather stations from which data was collected


  7. Date Fruit Datasets

    • kaggle.com
    zip
    Updated Apr 3, 2022
    Cite
    Murat KOKLU (2022). Date Fruit Datasets [Dataset]. https://www.kaggle.com/datasets/muratkokludataset/date-fruit-datasets
    Explore at:
    Available download formats: zip (418144 bytes)
    Dataset updated
    Apr 3, 2022
    Authors
    Murat KOKLU
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DATASET: https://www.muratkoklu.com/datasets/

    Citation Request : KOKLU, M., KURSUN, R., TASPINAR, Y. S., and CINAR, I. (2021). Classification of Date Fruits into Genetic Varieties Using Image Analysis. Mathematical Problems in Engineering, Vol.2021, Article ID: 4793293, DOI:10.1155/2021/4793293 https://www.hindawi.com/journals/mpe/2021/4793293/

    Abstract: A great number of fruits are grown around the world, each of which has various types. The factors that determine the type of fruit are the external appearance features such as color, length, diameter, and shape. The external appearance of the fruits is a major determinant of the fruit type. Determining the variety of fruits by looking at their external appearance may necessitate expertise, which is time-consuming and requires great effort. The aim of this study is to classify the types of date fruit, that are, Barhee, Deglet Nour, Sukkary, Rotab Mozafati, Ruthana, Safawi, and Sagai by using three different machine learning methods. In accordance with this purpose, 898 images of seven different date fruit types were obtained via the computer vision system (CVS). Through image processing techniques, a total of 34 features, including morphological features, shape, and color, were extracted from these images. First, models were developed by using the logistic regression (LR) and artificial neural network (ANN) methods, which are among the machine learning methods. Performance results achieved with these methods are 91.0% and 92.2%, respectively. Then, with the stacking model created by combining these models, the performance result was increased to 92.8%. It has been concluded that machine learning methods can be applied successfully for the classification of date fruit types.

  8. PVC-Infrared dataset for deep learning

    • kaggle.com
    zip
    Updated Dec 25, 2023
    Cite
    Ziang Wei (2023). PVC-Infrared dataset for deep learning [Dataset]. https://www.kaggle.com/datasets/ziangwei/irtpvc
    Explore at:
    Available download formats: zip (3085055593 bytes)
    Dataset updated
    Dec 25, 2023
    Authors
    Ziang Wei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for 19 PVC specimens with planted defects. The dataset is available for academic and research use only. Papers using this dataset are kindly requested to refer to the paper at https://www.preprints.org/manuscript/202301.0483/v1

  9. 2018 Kaggle Machine Learning Challenge dataset

    • kaggle.com
    zip
    Updated Nov 28, 2021
    Cite
    Sreenanda Sai Dasari (2021). 2018 Kaggle Machine Learning Challenge dataset [Dataset]. https://www.kaggle.com/datasets/sreenandasaidasari/2021-kaggle-machine-learning-challenge
    Explore at:
    Available download formats: zip (4127154 bytes)
    Dataset updated
    Nov 28, 2021
    Authors
    Sreenanda Sai Dasari
    Description

    Dataset

    This dataset was created by Sreenanda Sai Dasari


  10. DL Course Data

    • kaggle.com
    zip
    Updated Sep 11, 2020
    Cite
    Ryan Holbrook (2020). DL Course Data [Dataset]. https://www.kaggle.com/ryanholbrook/dl-course-data
    Explore at:
    Available download formats: zip (242157006 bytes)
    Dataset updated
    Sep 11, 2020
    Authors
    Ryan Holbrook
    Description

    Dataset

    This dataset was created by Ryan Holbrook

    Released under Other (specified in description)


  11. Single Layer Perceptron Dataset(Small)

    • kaggle.com
    zip
    Updated Apr 19, 2023
    Cite
    ABIR HASAN 1703100 (2023). Single Layer Perceptron Dataset(Small) [Dataset]. https://www.kaggle.com/datasets/abirhasan1703100/single-layer-perceptron-datasetsmall
    Explore at:
    Available download formats: zip (349 bytes)
    Dataset updated
    Apr 19, 2023
    Authors
    ABIR HASAN 1703100
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    We have chosen a simple numpy array to implement the single layer perceptron algorithm. We have considered a total of 13 samples with three features and one class label. The class label is binary, 0 or 1. The training dataset contains eight data samples, while the validation dataset contains five.

    Fig 1.1: Train Data (https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9905947%2F7dc95405d7b0696adeb1c90f1cf8682b%2Ftraining%20data.jpg?generation=1681929479850322&alt=media)

    Fig 1.2: Test Data (https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9905947%2Fe83b9677df9780414f25471c72ead9ca%2Ftest%20data.jpg?generation=1681929512768929&alt=media)

    Here the first value of every sample is set to 1, because the algorithm requires the value of x0 to always be 1. Even without this convention, our code will give the correct output.
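
    The setup described above (a leading x0 = 1 acting as the bias input, binary labels, eight training and five validation samples) can be sketched as follows. This is not the author's code or data: the sample values are hypothetical, plain Python lists stand in for the numpy array, and treating the leading 1 as one of the three values per sample is an assumption.

```python
def predict(weights, x):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_perceptron(samples, learning_rate=1.0, max_epochs=1000):
    """Perceptron learning rule; stops early once an epoch has no errors."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, x)
            if delta != 0:
                errors += 1
                weights = [w + learning_rate * delta * xi
                           for w, xi in zip(weights, x)]
        if errors == 0:
            break
    return weights

def accuracy(weights, samples):
    """Fraction of samples whose predicted label matches the target."""
    return sum(predict(weights, x) == t for x, t in samples) / len(samples)

# Hypothetical, linearly separable data: each sample is ((x0=1, x1, x2), label).
train = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 0.5, 0.5), 0),
         ((1, 1, 1), 1), ((1, 2, 0), 1), ((1, 0, 2), 1), ((1, 2, 1), 1)]
validation = [((1, 1.5, 1), 1), ((1, 1, 1.5), 1), ((1, 2, 0.5), 1),
              ((1, 0, 0.5), 0), ((1, 0.5, 0), 0)]

weights = train_perceptron(train)
print(accuracy(weights, train), accuracy(weights, validation))
```

    Because x0 is fixed at 1, the first weight plays the role of the bias term, so no separate bias variable is needed.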

  12. Deep Learning dataset

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sudheer (2023). Deep Learning dataset [Dataset]. https://www.kaggle.com/datasets/sudheere/deep-learning-dataset
    Explore at:
    Available download formats: zip (133366603 bytes)
    Dataset updated
    Dec 10, 2023
    Authors
    sudheer
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by sudheer

    Released under Apache 2.0


  13. Sentiment Analysis Deep Learning

    • kaggle.com
    zip
    Updated Jun 8, 2024
    Cite
    Nitesh sureja (2024). Sentiment Analysis Deep Learning [Dataset]. https://www.kaggle.com/datasets/niteshsureja/sentiment-analysis-deep-learning
    Explore at:
    Available download formats: zip (5723155 bytes)
    Dataset updated
    Jun 8, 2024
    Authors
    Nitesh sureja
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Nitesh sureja

    Released under Apache 2.0


  14. Machine Learning Basics for Beginners🤖🧠

    • kaggle.com
    zip
    Updated Jun 22, 2023
    Cite
    Bhanupratap Biswas (2023). Machine Learning Basics for Beginners🤖🧠 [Dataset]. https://www.kaggle.com/datasets/bhanupratapbiswas/machine-learning-basics-for-beginners
    Explore at:
    Available download formats: zip (492015 bytes)
    Dataset updated
    Jun 22, 2023
    Authors
    Bhanupratap Biswas
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    This is an introduction to machine learning basics for beginners. Machine learning is a subfield of artificial intelligence (AI) that focuses on enabling computers to learn and make predictions or decisions without being explicitly programmed. Here are some key concepts and terms to help you get started:

    1. Supervised Learning: In supervised learning, the machine learning algorithm learns from labeled training data. The training data consists of input examples and their corresponding correct output or target values. The algorithm learns to generalize from this data and make predictions or classify new, unseen examples.

    2. Unsupervised Learning: Unsupervised learning involves learning patterns and relationships from unlabeled data. Unlike supervised learning, there are no target values provided. Instead, the algorithm aims to discover inherent structures or clusters in the data.

    3. Training Data and Test Data: Machine learning models require a dataset to learn from. The dataset is typically split into two parts: the training data and the test data. The model learns from the training data, and the test data is used to evaluate its performance and generalization ability.

    4. Features and Labels: In supervised learning, the input examples are often represented by features or attributes. For example, in a spam email classification task, features might include the presence of certain keywords or the length of the email. The corresponding output or target values are called labels, indicating the class or category to which the example belongs (e.g., spam or not spam).

    5. Model Evaluation Metrics: To assess the performance of a machine learning model, various evaluation metrics are used. Common metrics include accuracy (the proportion of correctly predicted examples), precision (the proportion of true positives among all positive predictions), recall (the proportion of actual positives that are correctly predicted), and F1 score (the harmonic mean of precision and recall).

    6. Overfitting and Underfitting: Overfitting occurs when a model becomes too complex and learns to memorize the training data instead of generalizing well to unseen examples. On the other hand, underfitting happens when a model is too simple and fails to capture the underlying patterns in the data. Balancing the complexity of the model is crucial to achieve good generalization.

    7. Feature Engineering: Feature engineering involves selecting or creating relevant features that can help improve the performance of a machine learning model. It often requires domain knowledge and creativity to transform raw data into a suitable representation that captures the important information.

    8. Bias and Variance Trade-off: The bias-variance trade-off is a fundamental concept in machine learning. Bias refers to the errors introduced by the model's assumptions and simplifications, while variance refers to the model's sensitivity to small fluctuations in the training data. Reducing bias may increase variance and vice versa. Finding the right balance is important for building a well-performing model.

    9. Supervised Learning Algorithms: There are various supervised learning algorithms, including linear regression, logistic regression, decision trees, random forests, support vector machines (SVM), and neural networks. Each algorithm has its own strengths, weaknesses, and specific use cases.

    10. Unsupervised Learning Algorithms: Unsupervised learning algorithms include clustering algorithms like k-means clustering and hierarchical clustering, dimensionality reduction techniques like principal component analysis (PCA) and t-SNE, and anomaly detection algorithms, among others.
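
    The evaluation metrics in item 5 can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 = positive, 0 = negative) and uses hypothetical predictions.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels and predictions for eight examples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```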

    These concepts provide a starting point for understanding the basics of machine learning. As you delve deeper, you can explore more advanced topics such as deep learning, reinforcement learning, and natural language processing. Remember to practice hands-on with real-world datasets to gain practical experience and further refine your skills.

  15. Colour-Greyscale Dataset

    • kaggle.com
    zip
    Updated Jul 8, 2024
    Cite
    Rohan Rao Eravelli (2024). Colour-Greyscale Dataset [Dataset]. https://www.kaggle.com/datasets/rohanraoeravelli/colour-greyscale-dataset
    Explore at:
    Available download formats: zip (194671456 bytes)
    Dataset updated
    Jul 8, 2024
    Authors
    Rohan Rao Eravelli
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a valuable resource for exploring the fundamentals of color grading models in deep learning. Here's an expanded description highlighting its key features and potential applications:

    Structure:

    • Two main folders: Cars and Flowers
    • Each folder contains 400 images (200 color + 200 grayscale)
    • This balanced representation facilitates model training and evaluation on both color and grayscale data.

    Applications:

    • Learning Color Grading Concepts: The dataset's simplicity allows beginners to grasp the core principles of color grading models. By training models to transform grayscale images to their colored counterparts (for Cars and Flowers) and vice versa, users can understand how these models learn the relationships between color and grayscale representations.
    • Experimentation with Model Architectures: The dataset's size is suitable for testing and comparing different deep learning architectures for color grading tasks. This exploration can help identify efficient models that achieve good results on a manageable dataset.
    • Fine-tuning Pre-trained Models: This dataset can be used for fine-tuning pre-trained models like convolutional neural networks (CNNs) that have already learned general image processing features. Fine-tuning leverages these pre-trained weights and focuses on color-specific relationships within the Cars and Flowers domain.
    • Benchmarking Performance: The dataset can serve as a benchmark for evaluating the performance of new color grading models. By comparing the accuracy of different models in converting grayscale images to their color counterparts, researchers can track progress in the field.
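
    The color-to-grayscale direction has a simple closed form, unlike the reverse direction that a model has to learn. The description does not say how the grayscale images in this dataset were produced; the sketch below assumes the common ITU-R BT.601 luminance weights purely for illustration.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel to a grey level using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def image_to_gray(image):
    """Apply rgb_to_gray to every pixel of a row-major image (list of rows)."""
    return [[rgb_to_gray(px) for px in row] for row in image]

# Hypothetical 1x3 image: white, black, pure red.
print(image_to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))
```

    Because three channels collapse into one value, many different colors map to the same grey level, which is why recovering color from grayscale requires a learned model rather than a formula.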

  16. Kaggle Top Datasets🚀📊

    • kaggle.com
    zip
    Updated Apr 10, 2024
    Cite
    Aaron Frias (2024). Kaggle Top Datasets🚀📊 [Dataset]. https://www.kaggle.com/datasets/aaronfriasr/kaggle-top-datasets
    Explore at:
    Available download formats: zip (1572305 bytes)
    Dataset updated
    Apr 10, 2024
    Authors
    Aaron Frias
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or what datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning

    Column description

    • Dataset_name - Name of the dataset
    • Author_name - Name of the author
    • Author_id - Kaggle id of the author
    • No_of_files - Number of files the author has uploaded
    • size - Size of all the files
    • Type_of_file - Type of the files such as csv, json etc.
    • Upvotes - Total upvotes of the dataset
    • Medals - Medal of the dataset
    • Usability - Usability of the dataset
    • Date - Date on which the dataset was uploaded
    • Day - Day on which the dataset was uploaded
    • Time - Time at which the dataset was uploaded
    • Dataset_link - Kaggle link of the dataset

    Acknowledgements

    The data has been scraped from the official Kaggle website and is available under the Creative Commons License.

    Enjoy & Keep Learning !!!

  17. CIFAKE: Real and AI-Generated Synthetic Images

    • kaggle.com
    Updated Mar 28, 2023
    Cite
    Jordan J. Bird (2023). CIFAKE: Real and AI-Generated Synthetic Images [Dataset]. https://www.kaggle.com/datasets/birdy654/cifake-real-and-ai-generated-synthetic-images
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jordan J. Bird
    Description

    CIFAKE: Real and AI-Generated Synthetic Images

    The quality of AI-generated images has rapidly increased, leading to concerns of authenticity and trustworthiness.

    CIFAKE is a dataset that contains 60,000 synthetically-generated images and 60,000 real images (collected from CIFAR-10). Can computer vision techniques be used to detect when an image is real or has been generated by AI?

    Further information on this dataset can be found here: Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Dataset details

    The dataset contains two classes - REAL and FAKE.

    For REAL, we collected the images from Krizhevsky & Hinton's CIFAR-10 dataset.

    For the FAKE images, we generated the equivalent of CIFAR-10 with Stable Diffusion version 1.4.

    There are 100,000 images for training (50k per class) and 20,000 for testing (10k per class).
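    The split sizes stated above can be checked against a local copy. The following is a minimal sketch, assuming the images are laid out as `<split>/<class>/<file>.jpg` with splits named `train` and `test` and classes named `REAL` and `FAKE`; that folder layout and the function names are assumptions, not something stated in the description.

```python
from collections import Counter
from pathlib import Path

# Counts stated in the description: 50k per class for training,
# 10k per class for testing.
EXPECTED = {
    ("train", "REAL"): 50_000,
    ("train", "FAKE"): 50_000,
    ("test", "REAL"): 10_000,
    ("test", "FAKE"): 10_000,
}


def count_images(root, suffix=".jpg"):
    """Count files per (split, class), assuming root/<split>/<class>/<file>."""
    counts = Counter()
    for path in Path(root).glob(f"*/*/*{suffix}"):
        split = path.parent.parent.name
        label = path.parent.name
        counts[(split, label)] += 1
    return counts


def matches_described_split(counts):
    """Return True if the counts equal the sizes given in the description."""
    return dict(counts) == EXPECTED


if __name__ == "__main__":
    # "cifake" is a hypothetical local folder name.
    counts = count_images("cifake")
    for key, value in sorted(counts.items()):
        print(key, value)
    print("matches description:", matches_described_split(counts))
```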

    Papers with Code

    The dataset and all studies using it are linked using Papers with Code https://paperswithcode.com/dataset/cifake-real-and-ai-generated-synthetic-images

    References

    If you use this dataset, you must cite the following sources

    Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

    Bird, J.J. and Lotfi, A., 2024. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access.

    Real images are from Krizhevsky & Hinton (2009), fake images are from Bird & Lotfi (2024). The Bird & Lotfi study is available here.

    Notes

    The update to the dataset on the 28th of March 2023 did not change its contents; files with the ".jpeg" extension were renamed to ".jpg" and the root folder was uploaded to meet Kaggle's usability requirements.

    License

    This dataset is published under the same MIT license as CIFAR-10:

    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  18. 2018 Kaggle Machine Learning & Data Science Survey

    • kaggle.com
    zip
    Updated Apr 1, 2020
    Solyoh21 (2020). 2018 Kaggle Machine Learning & Data Science Survey [Dataset]. https://www.kaggle.com/datasets/solyoh21/2018kaggle-machine-learning-data-science-survey
    Explore at:
    Available download formats: zip (4405270 bytes)
    Dataset updated
    Apr 1, 2020
    Authors
    Solyoh21
    License

    https://ec.europa.eu/info/legal-notice_en

    Description

    Dataset

    This dataset was created by Solyoh21

    Released under EU ODP Legal Notice

    Contents

  19. Flickr-Face-HQ and GenAI Dataset (FF-GenAI)

    • kaggle.com
    Updated May 29, 2025
    A_rgonaut (2025). Flickr-Face-HQ and GenAI Dataset (FF-GenAI) [Dataset]. https://www.kaggle.com/datasets/argonautex/flickr-face-hq-and-genai-dataset-ff-genai
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 29, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    A_rgonaut
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    The dataset consists of 100k JPG images (50k real and 50k fake) at 224x224 resolution, pre-processed and merged from the following links:

    This dataset is designed to support research at the intersection of computer vision and generative models. By combining high-quality real face images from the Flickr-Faces-HQ (FFHQ) dataset with AI-generated counterparts, this dataset provides a robust foundation for multiple advanced applications:

    GAN Training. With its high resolution and rich visual diversity, the dataset is ideal for training Generative Adversarial Networks (GANs), enabling models to learn realistic facial features across a wide range of demographics and conditions.

    Synthetic Content Detection. The inclusion of both real and generated images makes the dataset particularly suitable for developing and benchmarking algorithms aimed at detecting AI-generated content, a critical task in the age of deepfakes.

    Model Generalization Testing. The variety and complexity of the data offer a reliable benchmark for evaluating how well machine learning models generalize to unseen examples, contributing to the development of more robust and adaptable systems.

  20. Medical Image Segmentation-DeepLearning Project

    • kaggle.com
    zip
    Updated Apr 17, 2024
    SRIKANTH REDDY MARRI (2024). Medical Image Segmentation-DeepLearning Project [Dataset]. https://www.kaggle.com/datasets/srikanthreddymarri/medical-image-segmentation-deeplearning-project
    Explore at:
    Available download formats: zip (137405452 bytes)
    Dataset updated
    Apr 17, 2024
    Authors
    SRIKANTH REDDY MARRI
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by SRIKANTH REDDY MARRI

    Released under Apache 2.0

    Contents
