CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
This project aims to develop a personalized course recommendation engine, integrated with a Django web application, that uses machine learning techniques. Using a Udemy dataset of course information, the system analyzes user preferences and behaviors to provide tailored recommendations. The recommendation engine uses machine learning algorithms to predict which courses match the user's interests, based on the input the user provides. This project demonstrates the significance of recommendation engines in enhancing user experience, increasing engagement, and driving revenue growth in the competitive digital landscape.
Dataset:
The dataset contains information on 3,678 courses available on Udemy, spanning various subjects and levels of difficulty. Here's a description of the columns:
* course_id: Unique identifier for each course.
* course_title: Title of the course.
* url: URL of the course.
* is_paid: Boolean indicating whether the course is paid or not.
* price: Price of the course.
* num_subscribers: Number of subscribers enrolled in the course.
* num_reviews: Number of reviews for the course.
* num_lectures: Number of lectures in the course.
* level: Difficulty level of the course (e.g., Beginner, Intermediate, Advanced).
* content_duration: Duration of the course content.
* published_timestamp: Timestamp indicating when the course was published.
* subject: Subject category of the course.

This dataset provides comprehensive information about Udemy courses, including their popularity (measured by the number of subscribers and reviews), pricing, content duration, and level of difficulty. It covers a wide range of subjects, making it suitable for building a recommendation engine that suggests courses based on user preferences and interests.
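The description does not say which algorithm the engine uses. The sketch below shows one possibility: a content-based recommender. It assumes each course is a dict keyed by the column names above (for example, rows read with `csv.DictReader`). It ranks `course_title` values by TF-IDF cosine similarity to a free-text query and can optionally filter by `level`. The function names and the sample courses are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    # Smoothed IDF, log((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(courses, query, top_n=3, level=None):
    """Rank course titles by similarity to the query; hypothetical helper."""
    candidates = [c for c in courses if level is None or c["level"] == level]
    # Simplification: the query is included in the corpus used for IDF.
    vecs = tfidf_vectors([query] + [c["course_title"] for c in candidates])
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i]["course_title"] for i in ranked[:top_n] if scores[i] > 0]

# Hypothetical rows, not taken from the dataset.
courses = [
    {"course_title": "Complete Python Bootcamp", "level": "Beginner"},
    {"course_title": "Advanced Python Decorators", "level": "Advanced"},
    {"course_title": "Guitar Chords for Beginners", "level": "Beginner"},
    {"course_title": "Financial Modeling in Excel", "level": "Intermediate"},
]
print(recommend(courses, "python bootcamp"))
```

A Django view could call a function like `recommend` with the user's search text and chosen level. The ranking could also use `subject` or popularity columns such as `num_subscribers`, which this sketch leaves out.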
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Exercise: Machine Learning Competitions
When you click Run All, the notebook gives the error "Files doesn't exist". This dataset fixes that; it is the same data as DanB's. Please upvote!
Enjoy!
No license specified https://academictorrents.com/nolicensespecified
A BitTorrent file to download data with the title 'Udemy - Machine Learning A-Z Become Kaggle Master'
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Deep Learning A-Z - ANN dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/filippoo/deep-learning-az-ann on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This is the dataset used in the section "ANN (Artificial Neural Networks)" of the Udemy course Deep Learning A-Z™: Hands-On Artificial Neural Networks, by Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist). The dataset is very useful for machine learning beginners. It is also a simple playground in which to compare several techniques and skills.
It can be freely downloaded here: https://www.superdatascience.com/deep-learning/
The story: a bank is investigating a very high rate of customers leaving the bank. This 10,000-record dataset can be used to investigate and predict which customers are most likely to leave the bank soon.
The story behind the story: I'd like to compare several techniques to improve my basic knowledge of machine learning, preferably not alone but drawing on the experience of several Kaggle users.
I will write more later, but the column names are self-explanatory.
Thanks to Udemy instructors Kirill Eremenko (Data Scientist & Forex Systems Expert) and Hadelin de Ponteves (Data Scientist) for their efforts to provide this dataset to their students.
Which methods score best on this dataset? Which are fastest (or at least run in reasonable time)? What are the basic steps for such a simple dataset, which is very useful to beginners?
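As a starting point for the "basic steps" question, here is a minimal sketch of one baseline. The real column names are not listed here, so synthetic numeric data stands in for the bank table; this assumes the features have already been converted to a numeric matrix. The steps are:

1. Split the data into train and test sets.
2. Standardize using training statistics only.
3. Fit a logistic-regression model (a single sigmoid neuron) by batch gradient descent.
4. Compare test accuracy with the majority-class baseline.

This is a common first step, not the course's method. All names are illustrative.

```python
import numpy as np

def standardize(train, test):
    """Scale features using mean/std from the training split only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (train - mu) / sigma, (test - mu) / sigma

def add_bias(X):
    return np.hstack([np.ones((len(X), 1)), X])

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean cross-entropy loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))  # sigmoid output
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(X, w):
    return (add_bias(X) @ w >= 0).astype(int)

# Synthetic stand-in for the real table: 1,000 rows, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] - 0.5 * X[:, 1] > 0.8).astype(int)  # made-up "left the bank" label
X_train, X_test = standardize(X[:800], X[800:])
y_train, y_test = y[:800], y[800:]

w = fit_logistic(X_train, y_train)
accuracy = (predict(X_test, w) == y_test).mean()
baseline = max(y_test.mean(), 1 - y_test.mean())  # always predict the majority class
print(f"accuracy={accuracy:.3f} baseline={baseline:.3f}")
```

The majority-class comparison matters when only a minority of customers leave. In that case, a model that always predicts "stays" can look accurate while being useless.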
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
Working with MNIST with RBM and contamination.
This dataset was created by Asma Abeyat
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description: This dataset contains images of 192 different scene categories, with both AI-generated and real-world images for each class. It is designed for research and benchmarking in computer vision, deep learning, and AI-generated image detection.
Key Features:
📸 192 Scene Classes: Includes diverse environments such as forests, cities, beaches, and deserts.
🤖 AI-Generated vs. Real Images: Each class contains both images generated by AI models and real-world photographs.
🖼️ High-Quality Images: The dataset includes a variety of resolutions and sources to improve model generalization.
🏆 Perfect for Research: Ideal for training models for AI-generated image detection, scene classification, and image authenticity verification.

Potential Use Cases:
🔍 AI-generated vs. real image classification
🏙️ Scene recognition and segmentation
🖥️ Training deep learning models for synthetic image detection
📊 Analyzing AI image generation trends
Dataset Card for "NNDL_HW5_S2025"
This dataset was created for the Neural Networks and Deep Learning course at the University of Tehran. The original data can be accessed at https://www.kaggle.com/datasets/emmarex/plantdisease/data. More information needed.
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
This dataset poses a multiclass classification problem, also called RGB classification. It contains about 900 training images per color class and 100 test images per class. According to the author, models reach high accuracy on it because it covers every feature a model should learn. It also contains a validation set of 100 images per class, which helps detect overfitting.
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
We used a simple NumPy array to implement the single-layer perceptron algorithm. We considered a total of 13 samples, each with three features and one class label. The class label is binary (0 or 1). The training set contains eight samples and the validation set contains five.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9905947%2F7dc95405d7b0696adeb1c90f1cf8682b%2Ftraining%20data.jpg?generation=1681929479850322&alt=media
Fig 1.1: Train Data
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9905947%2Fe83b9677df9780414f25471c72ead9ca%2Ftest%20data.jpg?generation=1681929512768929&alt=media
Fig 1.2: Test Data
Here the first value of every sample is set to 1, because the algorithm requires the bias input x0 to always be 1. Even without this property, our code gives the correct output on this data.
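Below is a minimal NumPy sketch of the perceptron learning rule. The constant bias input x0 = 1 sits in the first column, as described above. The 13 samples from the figures are not reproduced here, so a hypothetical logical-AND table stands in for them. `train_perceptron` and `predict` are illustrative names, not the authors' code.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Perceptron rule; X[:, 0] must be the constant bias input x0 = 1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w >= 0 else 0  # step activation
            if pred != target:
                w += lr * (target - pred) * xi  # move the boundary toward the missed sample
                mistakes += 1
        if mistakes == 0:  # converged: every training sample is classified correctly
            break
    return w

def predict(X, w):
    return (X @ w >= 0).astype(int)

# Hypothetical stand-in data: logical AND, with x0 = 1 prepended to each sample.
X_train = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y_train = np.array([0, 0, 0, 1])

w = train_perceptron(X_train, y_train)
print(w, predict(X_train, w))
```

The weight on x0 acts as the bias term. The perceptron convergence theorem guarantees the loop stops only when the data is linearly separable, which is why the epoch cap is kept.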
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Job Classification Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/HRAnalyticRepository/job-classification-dataset on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains fictional job class specification information. Job class specs typically include information that characterizes the job class (its features) and a label, in this case a pay grade, which the features can be used to predict.
The data is a static snapshot. The columns are:
* ID: a sequential number
* Job Family ID
* Job Family Description
* Job Class ID
* Job Class Description
* PayGrade: numeric
* Education Level
* Experience
* Organizational Impact
* Problem Solving
* Supervision
* Contact Level
* Financial Budget
* PG: alpha label for PayGrade
This data is purely fictional
The intent is to use machine learning classification algorithms to predict PG from the columns Education Level through Financial Budget.
In HR, job classification is typically a time-consuming and cumbersome manual activity. The intent is to show how machine learning and People Analytics can be brought to bear on this task.
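The source does not name a specific classifier. As a hypothetical illustration, here is a k-nearest-neighbours sketch that predicts PG from the Education Level through Financial Budget columns. The exact CSV header spellings and the sample rows are assumptions, not taken from the data.

```python
import math
from collections import Counter

# Predictor columns, Education Level through Financial Budget (header spellings assumed).
FEATURES = ["Education Level", "Experience", "Organizational Impact",
            "Problem Solving", "Supervision", "Contact Level", "Financial Budget"]

def to_vector(row):
    """Convert one row (e.g. from csv.DictReader, where values are strings) to floats."""
    return [float(row[col]) for col in FEATURES]

def knn_predict(train_rows, query_row, k=3):
    """Majority vote of the PG labels of the k nearest rows by Euclidean distance."""
    q = to_vector(query_row)
    nearest = sorted(train_rows, key=lambda r: math.dist(q, to_vector(r)))[:k]
    return Counter(r["PG"] for r in nearest).most_common(1)[0][0]

def make_row(values, pg):
    """Build a hypothetical row; not real data."""
    return {**dict(zip(FEATURES, values)), "PG": pg}

train = [
    make_row([1, 0, 1, 1, 1, 1, 1], "PG01"),
    make_row([1, 1, 1, 1, 1, 1, 1], "PG01"),
    make_row([2, 1, 1, 1, 1, 1, 1], "PG01"),
    make_row([5, 6, 4, 4, 4, 3, 4], "PG05"),
    make_row([5, 8, 4, 4, 4, 4, 4], "PG05"),
    make_row([6, 7, 5, 5, 4, 4, 5], "PG05"),
]
print(knn_predict(train, make_row([1, 0, 1, 1, 1, 2, 1], "?")))
```

Distance-based methods assume the ordinal scores are roughly comparable in scale. If they are not, scale each column first.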
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is used in the Coursera Machine Learning Foundations course, which is part of the Machine Learning Specialization. Among other columns, it contains write-ups about people. Perform hierarchical clustering on the data and try to group the people into broad categories.
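Below is a minimal sketch of one form of hierarchical clustering: agglomerative clustering with average linkage over cosine distances between word-count vectors. It assumes the text about each person is available as a list of strings. The helper names and the four sample sentences are hypothetical. On the full data, a library routine such as `scipy.cluster.hierarchy.linkage` would be the practical choice.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words counts for one document."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def agglomerative(texts, n_clusters):
    """Repeatedly merge the two clusters with the smallest average pairwise distance."""
    vecs = [bow(t) for t in texts]
    dist = [[cosine_distance(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a][b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)  # average linkage
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters

texts = [
    "footballer who played as a striker for the club",
    "striker and footballer for the national club team",
    "politician elected to parliament as a senator",
    "senator and politician in the parliament",
]
clusters = agglomerative(texts, n_clusters=2)
print(clusters)
```

Stopping at a chosen number of clusters is one way to cut the merge tree. Recording each merge and its distance instead would give the full dendrogram.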
Description:
The Animal Species Classification Dataset is meticulously designed to support the development and training of machine learning models for multi-class image recognition tasks. The dataset encompasses a wide variety of animal species, making it an essential resource for projects focused on biodiversity, wildlife conservation, and zoological studies. Regular updates keep the dataset comprehensive, providing a diverse and evolving collection of animal images for accurate species classification.
Dataset Composition:
The dataset is structured into six key directories, each serving a specific purpose within the machine learning pipeline:
Interesting Data:
• This directory contains 5 unique and challenging images per species class. These "interesting" images are selected to test the model's ability to make accurate predictions in complex scenarios. Evaluating model performance on these images offers insights into its understanding and classification capabilities.
Testing Data:
• A randomly populated directory with images from each species class, specifically curated for model testing. This dataset is essential for evaluating the performance and generalization of the model after it has been trained.
TFRecords Data:
• This directory includes the dataset formatted as TensorFlow records. All images have been preprocessed, resized to 256x256 pixels, and normalized. These ready-to-use files facilitate seamless integration into TensorFlow-based machine learning workflows.
Train Augmented:
• To enhance model training, this directory contains augmented versions of the original training images. For each original image, 5 augmented variations are generated, resulting in a total of 10,000 images per species class. This augmentation strategy is crucial for increasing dataset size and diversity, which in turn helps the model learn more robust features.
Training Images:
• This directory is dedicated to the core training data, with each species class containing 2,000 images. The images have been uniformly resized to 256x256 pixels and normalized to a pixel range of 0 to 1. These images form the backbone of the dataset, enabling the model to learn distinguishing features for each species.
Validation Images:
• The validation directory contains 100 to 200 images per species class. These images are used during training to monitor the model's performance and adjust hyperparameters accordingly. By providing a separate validation set, the dataset ensures that the model's accuracy and reliability are rigorously evaluated.
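The card describes two preprocessing steps: scaling pixels to the 0 to 1 range, and producing 5 augmented variants per training image. It does not say which augmentations were applied. The sketch below shows both steps, assuming NumPy arrays of shape (256, 256, 3). The choice of 90-degree rotations, horizontal flips, and brightness jitter is an assumption, and the function names are hypothetical.

```python
import numpy as np

def normalize(img_uint8):
    """Scale 8-bit pixel values (0-255) to float32 values in [0, 1]."""
    return img_uint8.astype(np.float32) / 255.0

def augment(img, n_variants=5, seed=0):
    """Return n_variants randomly transformed copies of a normalized image."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        out = np.rot90(img, k=rng.integers(0, 4))  # random multiple of 90 degrees
        if rng.random() < 0.5:
            out = np.fliplr(out)  # horizontal flip
        # Brightness jitter, clipped back into [0, 1].
        out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)
        variants.append(out)
    return variants

# Usage with a random stand-in image (not a dataset file).
img = np.random.default_rng(1).integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
x = normalize(img)
variants = augment(x)
print(len(variants), variants[0].shape)
```

Rotations by multiples of 90 degrees keep a square 256x256 image the same shape, so every variant can go into the same batch tensor without resizing.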
Species Classes:
The dataset includes images from the following 15 animal species:
Beetle, Butterfly, Cat, Cow, Dog, Elephant, Gorilla, Hippo, Lizard, Monkey, Mouse, Panda, Spider, Tiger, Zebra
Each class is carefully curated to provide a balanced and comprehensive representation of the species, making this dataset suitable for various image classification and recognition tasks.
Application:
This dataset is ideal for building machine learning models aimed at classifying animal species. It serves as a valuable resource for academic research, conservation efforts, and applications in wildlife monitoring. Because the images are diverse and augmented, models trained on this dataset can achieve high accuracy and robustness in real-world classification tasks.
This dataset is sourced from Kaggle.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Zoo Animal Classification’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/uciml/zoo-animal-classification on 28 January 2022.
--- Dataset description provided by original source is as follows ---
This dataset consists of 101 animals from a zoo. There are 16 variables with various traits to describe the animals. The 7 Class Types are: Mammal, Bird, Reptile, Fish, Amphibian, Bug and Invertebrate
The purpose of this dataset is to predict the classification of the animals based on the variables. It is the perfect dataset for those who are new to machine learning.
Attribute Information: (name of attribute and type of value domain)
This csv describes the dataset
UCI Machine Learning: https://archive.ics.uci.edu/ml/datasets/Zoo
Source Information:
-- Creator: Richard Forsyth
-- Donor: Richard S. Forsyth, 8 Grosvenor Avenue, Mapperley Park, Nottingham NG3 5DX, 0602-621676
-- Date: 5/15/1990
What are the best machine learning ensembles/methods for classifying these animals based upon the variables given?
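One candidate method for the question above is a Bernoulli naive Bayes classifier with Laplace smoothing over the boolean traits, sketched below. The trait names (hair, feathers, milk, fins) follow the UCI Zoo attribute list but are assumed here. The numeric `legs` attribute is left out because this model handles only 0/1 values, and the training rows are made up.

```python
import math
from collections import Counter, defaultdict

def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """rows: list of dicts mapping trait name -> 0/1; alpha: Laplace smoothing."""
    class_counts = Counter(labels)
    ones = defaultdict(Counter)  # ones[cls][trait] = number of rows of cls with trait == 1
    for row, label in zip(rows, labels):
        for trait, value in row.items():
            ones[label][trait] += value
    model = {}
    for cls, n in class_counts.items():
        log_prior = math.log(n / len(labels))
        p_one = {t: (ones[cls][t] + alpha) / (n + 2 * alpha) for t in rows[0]}
        model[cls] = (log_prior, p_one)
    return model

def predict_nb(model, row):
    """Return the class with the highest log-posterior for one row."""
    def score(cls):
        log_prior, p_one = model[cls]
        return log_prior + sum(math.log(p_one[t] if v else 1 - p_one[t]) for t, v in row.items())
    return max(model, key=score)

# Made-up training rows, not taken from the zoo data.
rows = [
    {"hair": 1, "feathers": 0, "milk": 1, "fins": 0},
    {"hair": 1, "feathers": 0, "milk": 1, "fins": 0},
    {"hair": 0, "feathers": 1, "milk": 0, "fins": 0},
    {"hair": 0, "feathers": 1, "milk": 0, "fins": 0},
    {"hair": 0, "feathers": 0, "milk": 0, "fins": 1},
]
labels = ["Mammal", "Mammal", "Bird", "Bird", "Fish"]
model = fit_bernoulli_nb(rows, labels)
print(predict_nb(model, {"hair": 0, "feathers": 1, "milk": 0, "fins": 0}))
```

With only 101 animals, a simple, well-smoothed model like this is a reasonable baseline. Ensembles such as random forests are natural comparisons.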
--- Original source retains full ownership of the source dataset ---
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Tea Leaf Disease Detection
Image format: JPG
Number of classes: 7
Number of images: 5276
Number of images per class:
- Tea algal leaf spot: 418
- Brown Blight: 508
- Gray Blight: 1013
- Helopolis: 607
- Red spider: 515
- Green mirid bug: 1282
- Healthy Leaf: 935
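The seven classes range from 418 to 1,282 images, so the set is imbalanced. The sketch below computes inverse-frequency class weights from the listed counts using n_samples / (n_classes × count), the same formula as scikit-learn's `class_weight="balanced"`. Note that the per-class counts listed sum to 5,278, two more than the stated total of 5,276.

```python
def inverse_frequency_weights(counts):
    """weight = total / (n_classes * count): rare classes get weights above 1."""
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}

counts = {
    "Tea algal leaf spot": 418,
    "Brown Blight": 508,
    "Gray Blight": 1013,
    "Helopolis": 607,
    "Red spider": 515,
    "Green mirid bug": 1282,
    "Healthy Leaf": 935,
}
weights = inverse_frequency_weights(counts)
print(sum(counts.values()))
for cls, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{cls:20s} {w:.3f}")
```

Passing such weights to the loss function makes each class contribute equally in aggregate. The weighted counts always sum back to the total number of images.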
Mendeley Data: https://data.mendeley.com/datasets/744vznw5k2/4
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is prepared for multi-class classification problems, with labels converted into binary (0/1) values to improve model performance. It is suited to machine learning and deep learning tasks and provides structured, cleaned data.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains 2062 images of teeth, divided into two classes: Cavity and Non-Cavity. Each class has 1031 images, providing a balanced dataset for training and testing machine learning models.
The images in this dataset are intended for the development and evaluation of algorithms for detecting dental cavities. The dataset can be used for various applications, such as:
Dataset Details:
Data Distribution:
The dataset is divided into two classes:
Data Preprocessing:
The images in this dataset have not been preprocessed or augmented. However, users may choose to apply preprocessing techniques, such as normalization, resizing, or data augmentation, to improve the performance of their models.
Usage:
This dataset is intended for non-commercial use only. Users are free to download, share, and modify the dataset, provided that they acknowledge the source and do not use it for commercial purposes.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The "Sen-2 LULC Dataset" is a collection of more than 213,750 pre-processed 10 m resolution images representing 7 distinct Land Use Land Cover classes: Water, Dense forest, Sparse forest, Barren land, Built up, Agriculture land, and Fallow land. A single image can contain multiple classes. The Sentinel-2 images of Central India were taken from the Copernicus Open Access Hub (https://scihub.copernicus.eu/) with cloud cover ranging from 0 to 0.5%. The images combine bands B4, B3, and B2, which are the red, green, and blue bands, each with a spatial resolution of 10 m. The images were acquired in February and March 2021 and belong to the Sentinel-2 Level-2A product (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a#:~:text=The%20Level%2D2A%20product%20provides,(UTM%2FWGS84%20projection).). The dataset contains an equal number of mask images. It is organized into 6 folders: train, test, and validation images, and train, test, and validation masks. The dataset can be used to build deep learning models for Land Use Land Cover (LULC) classification of the Indian region and is beneficial for LULC classification research.

[The related article is available at: Sen-2 LULC: Land use land cover dataset for deep learning approaches. Cite the article as: Sawant, S., Garg, R. D., Meshram, V., & Mistry, S. (2023). Sen-2 LULC: Land use land cover dataset for deep learning approaches. Data in Brief, 51, 109724, https://doi.org/10.1016/j.dib.2023.109724.]
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
The MCND dataset combines MRI data for three neurological disorders, drawn from datasets released on Kaggle: Alzheimer’s Disease (AD) [1], Brain Tumor (BT) [2], and Multiple Sclerosis (MS) [3]. It contains 16,400 human brain MRI images classified into 8 classes: AD-MildDemented, AD-ModerateDemented, AD-VeryMildDemented, BT-glioma, BT-meningioma, BT-pituitary, MS, and Normal (healthy).
CC0 1.0 Universal (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
While studying neural networks in machine learning, I found an ingenious 2-D scatter pattern in the CS231n course by Andrej Karpathy. The decision boundaries between its three classes of points cannot be straight lines. He uses it to compare the behavior of a linear classifier and a neural network classifier. We are often content with the accuracy of a prediction or classification algorithm, but a visualization tool helps our intuition about the trends and behavior of the classifier. It was definitely useful for catching something strange in the last example of this description. How does your preferred method or algorithm do with this input?
Spiral input pattern: https://github.com/pliptor/PicoNN/raw/master/extras/input.png
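A pattern like this can be generated as sketched below, modelled on the spiral data generator in the CS231n course notes. There are three classes, and along each arm the radius grows linearly while the angle is jittered with Gaussian noise. The parameter defaults are assumptions and the output is random, so it will not reproduce the pictured points exactly.

```python
import numpy as np

def make_spiral(n_per_class=100, n_classes=3, noise=0.2, seed=0):
    """Return (X, y): 2-D points on interleaved spiral arms and their class labels."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_per_class * n_classes, 2))
    y = np.zeros(n_per_class * n_classes, dtype=np.uint8)
    for j in range(n_classes):
        ix = slice(n_per_class * j, n_per_class * (j + 1))
        r = np.linspace(0.0, 1.0, n_per_class)  # radius grows along the arm
        t = np.linspace(j * 4, (j + 1) * 4, n_per_class) + rng.normal(0.0, noise, n_per_class)  # angle
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[ix] = j
    return X, y

X, y = make_spiral()
print(X.shape, np.bincount(y))
```

Because each arm wraps around the origin, no straight line separates one class from the others. A linear classifier therefore plateaus well below perfect accuracy on this data, while a small network with one hidden layer can bend its boundaries around the arms.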