License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Dataset Title: Exploring Mars: A Comprehensive Dataset of Rover Photos and Metadata
Description
This dataset provides an extensive collection of Mars rover images paired with in-depth metadata. Sourced from various Mars missions, this dataset is a treasure trove for anyone interested in space exploration, planetary science, or computer vision.
Components:
Dataset Origin
The dataset was compiled from various Mars missions conducted over the years. Special care has been taken to include a diverse set of images to enable a wide range of analyses and applications.
Objective
As a learner delving into the field of Computer Vision, my objectives for this project are multi-fold:
Research Questions
Tools and Technologies
I plan to utilize Python for this project, particularly libraries like OpenCV for image processing, Pandas for data manipulation, and Matplotlib/Seaborn for data visualization. For machine learning tasks, I will likely use scikit-learn or TensorFlow.
Learning and Development
This project serves as both a learning exercise and a stepping stone toward more complex computer vision projects. I aim to document my learning journey, challenges, and milestones in a series of Kaggle notebooks.
Collaboration and Feedback
I warmly invite the Kaggle community to offer suggestions, critiques, or even collaborate on this venture. Your insights could be invaluable in enhancing the depth and breadth of this project.
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
This dataset provides grayscale pixel values for brain tumor MRI images, stored in a CSV format for simplified access and ease of use. The goal is to create a "MNIST-like" dataset for brain tumors, where each row in the CSV file represents the pixel values of a single image in its original resolution. This format makes it convenient for researchers and developers to quickly load and analyze MRI data for brain tumor detection, classification, and segmentation tasks without needing to handle large image files directly.
Brain tumor classification and segmentation are critical tasks in medical imaging, and datasets like these are valuable for developing and testing machine learning and deep learning models. While there are several publicly available brain tumor image datasets, they often consist of large image files that can be challenging to process. This CSV-based dataset addresses that by providing a compact and accessible format. Potential use cases include:
- Tumor Classification: Identifying different types of brain tumors, such as glioma, meningioma, and pituitary tumors, or distinguishing between tumor and non-tumor images.
- Tumor Segmentation: Applying pixel-level classification and segmentation techniques for tumor boundary detection.
- Educational and Rapid Prototyping: Ideal for educational purposes or quick experimentation without requiring large image processing capabilities.
This dataset is structured as a single CSV file where each row represents an image, and each column represents a grayscale pixel value. The pixel values are stored as integers ranging from 0 (black) to 255 (white).
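As a rough illustration of the row-per-image layout described above, the sketch below turns one CSV row back into a 2D grid of pixel values. The leading `label` field, the toy 3x2 image size, and the helper name `row_to_image` are assumptions made for this example only; because the real file stores each image at its original resolution, the width and height of each image would have to be known from elsewhere.

```python
import csv
import io

# Toy stand-in for the real CSV: an assumed label followed by pixel values (0-255).
SAMPLE_CSV = "glioma,0,128,255,64,32,16\n"


def row_to_image(row, width, height):
    """Split one CSV row into (label, 2D list of ints).

    The "label first, pixels after" layout is an assumption for illustration.
    """
    label = row[0]
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    if any(p < 0 or p > 255 for p in pixels):
        raise ValueError("pixel values must lie in the range 0..255")
    # Slice the flat pixel list into rows of `width` pixels each.
    image = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return label, image


reader = csv.reader(io.StringIO(SAMPLE_CSV))
label, image = row_to_image(next(reader), width=3, height=2)
print(label, image)
```

The explicit length check matters here: if rows hold images of different resolutions, a wrong width or height fails loudly instead of producing a silently skewed image.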
This dataset is intended for research and educational purposes only. Users are encouraged to cite and credit the original data sources if using this dataset in any publications or projects. This is a derived CSV version aimed to simplify access and usability for machine learning and data science applications.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Data and code for reproducing figures in published work.
High Power Laser Science and Engineering
https://doi.org/10.1017/hpl.2022.47
The code uses various Python packages, including TensorFlow.
The Conda environment was created on 6 January 2022 with:
conda create --name tf tensorflow notebook tensorflow-probability pandas tqdm scikit-learn matplotlib seaborn protobuf opencv scipy scikit-image scikit-optimize Pillow PyAbel libclang flatbuffers gast --channel conda-forge
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
PROGRAM SUMMARY
No. of lines in distributed program, including test data, etc.: 481
No. of bytes in distributed program, including test data, etc.: 14540.8
Distribution format: .py, .csv
Programming language: Python
Computer: Any workstation or laptop computer running TensorFlow, Google Colab, Anaconda, Jupyter, pandas, NumPy, Microsoft Azure and Alteryx.
Operating system: Windows, Mac OS, and Linux.
Nature of problem: The Navier-Stokes equations are solved numerically in ANSYS Fluent using the Reynolds stress model for turbulence. The simulated values of the friction factor are validated against theoretical and experimental data obtained from the literature. Artificial neural networks are then used for a prediction-based augmentation of the friction factor. The capabilities of the neural networks are discussed with regard to computational cost and domain limitations.
Solution method: The simulation data is obtained through Reynolds stress modelling of fluid flow through a pipe. This data is augmented using an artificial neural network model that predicts both within and outside the data domain.
Restrictions: The code used in this research is limited to smooth pipe bends, in which the friction factor is analysed for steady-state incompressible fluid flow.
Runtime: The artificial neural network produces results within a span of 20 seconds for three-dimensional geometry, using the allocated free computational resources of Google Colaboratory cloud-based computing system.
I needed a simple image dataset that I could use when trying different object detection algorithms for the first time. It had to be something that could be quickly understood and easily loaded. I didn't want to spend a lot of time doing EDA or trying to remember how the data is structured. Moreover, I wanted to be able to clearly see when a model's prediction was correct or when it had made a mistake. When working with chest x-ray images, for example, it takes an expert to know if a model's predictions are correct.
I found the Balloons dataset and simplified it. The original data is split into train and test sets and it has two json files that need to be parsed. In this new version, I copied all images into a single folder and replaced the json files with one csv file that can be easily loaded with Pandas.
The dataset consists of 74 jpg images and one csv file. Each image contains one or more balloons.
The csv file has five columns:
fname - The image file name.
height - The image height.
width - The image width.
num_balloons - The number of balloons on the image.
bbox - The coordinates of each bounding box on the image.
The coordinates of each bbox are stored in a dictionary. The format is as follows:
{"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}
Where xmin and ymin are the coordinates of the top left corner, and xmax and ymax are the coordinates of the bottom right corner.
Each entry in the bbox column is a list of dictionaries. For example, if an image has two balloons and hence two bounding boxes, the entry will be as follows:
[{"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}, {"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}]
When loaded into a Pandas dataframe, all items in the bbox column are of type string. The strings can be converted to Python lists like this:
import ast
# convert each item in the bbox column from type str to type list
df['bbox'] = df['bbox'].apply(ast.literal_eval)
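Once a bbox entry has been parsed, the corner coordinates can be used directly. The sketch below is a minimal example that works on a single raw string rather than a dataframe; the helper name `bbox_sizes` and the second box's coordinates are made up for illustration.

```python
import ast

# One bbox entry as it looks after loading the csv file: a string, not a list.
# The second box's coordinates are invented for this example.
raw_bbox = (
    '[{"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}, '
    '{"xmin": 10, "ymin": 20, "xmax": 50, "ymax": 80}]'
)


def bbox_sizes(raw):
    """Parse a bbox string and return a (width, height) tuple per box, in pixels."""
    boxes = ast.literal_eval(raw)  # str -> list of dicts
    return [(b["xmax"] - b["xmin"], b["ymax"] - b["ymin"]) for b in boxes]


sizes = bbox_sizes(raw_bbox)
print(sizes)  # [(200, 200), (40, 60)]
```

Because xmin/ymin mark the top left corner and xmax/ymax the bottom right, each width and height is a simple difference, and the length of the returned list should equal the num_balloons value for that image.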
Many thanks to Waleed Abdulla who created this dataset.
The original dataset can be downloaded and unzipped using this code:
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip balloon_dataset.zip > /dev/null
Can you create an app that can look at an image and tell you:
- how many balloons are on the image, and
- what the colours of those balloons are?
This is something that could help blind people. To help you get started, here's an example of a similar project.
In this blog post the dataset's creator mentions that the images were sourced from Flickr. All images have a "Commercial use & mods allowed" license.
Header image by andremsantana on Pixabay.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
| Label | Species Name | Image Count |
|---|---|---|
| 1 | American Goldfinch | 143 |
| 2 | Emperor Penguin | 139 |
| 3 | Downy Woodpecker | 137 |
| 4 | Flamingo | 132 |
| 5 | Carmine Bee-eater | 131 |
| 6 | Barn Owl | 129 |
📂 Dataset Highlights:
* Total Images: 811
* Classes: 6 unique bird species
* Balanced Labels: Nearly equal distribution across classes
* Use Cases: Image classification, model benchmarking, transfer learning, educational projects, biodiversity analysis

🧠 Potential Applications:
* Training deep learning models like CNNs for bird species recognition
* Fine-tuning pre-trained models using a small and balanced dataset
* Educational projects in ornithology and computer vision
* Biodiversity and wildlife conservation tech solutions

🛠️ Suggested Tools:
* Python (Pandas, NumPy, Matplotlib)
* TensorFlow / PyTorch for model development
* OpenCV for image preprocessing
* Streamlit for creating interactive demos
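The "nearly equal distribution" of labels can be checked with a little arithmetic on the image counts from the species table. The sketch below is only an illustration; the variable names are made up, and the counts are copied from the table.

```python
# Image counts per species, copied from the table in this entry.
counts = {
    "American Goldfinch": 143,
    "Emperor Penguin": 139,
    "Downy Woodpecker": 137,
    "Flamingo": 132,
    "Carmine Bee-eater": 131,
    "Barn Owl": 129,
}

total = sum(counts.values())
# Share of the dataset held by each class (a perfectly balanced set would be 1/6 each).
shares = {name: n / total for name, n in counts.items()}
# Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
imbalance = max(counts.values()) / min(counts.values())

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"largest/smallest ratio: {imbalance:.3f}")
```

The counts sum to the stated 811 images, and the largest class is only about 11% bigger than the smallest, so plain accuracy is a reasonable first metric without class reweighting.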
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
This repository contains three folders, which hold either the data or the source code for the three main chapters (Chapters 3, 4, and 5) of the thesis. The folders are:
1) Dataset (Chapter 3): This folder contains the phonocardiogram signals (/PhysioNet2016) used in Chapters 3 and 4 as the upstream pretraining data. This is a public dataset. /SourceCode includes all the statistical analysis and visualization scripts for Chapter 3. Yaseen_dataset and PASCAL contain phonocardiogram signals with pathological features; Yaseen_dataset serves as the downstream fine-tuning dataset in Chapter 3, while the PASCAL dataset serves as the secondary testing dataset in Chapter 3.
2) Dataset (Chapter 4): /SourceCode includes all the statistical analysis and visualization scripts for Chapter 4.
3) Dataset (Chapter 5): PAD-UFES-20_processed contains dermatology images processed from the PAD-UFES-20 dataset, which is a public dataset. The dataset is used in Chapter 5. /SourceCode includes all the statistical analysis and visualization scripts for Chapter 5.
Several packages are required to run the source code, including: Python > 3.6 (3.11 preferred), TensorFlow > 2.16, Keras > 3.3, NumPy > 1.26, Pandas > 2.2, SciPy > 1.13
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains the code to reproduce the results of "Time resolved micro-XRCT dataset of Enzymatically Induced Calcite Precipitation (EICP) in sintered glass bead columns", cf. https://doi.org/10.18419/darus-2227. The code takes "low-dose" images as input; these images contain many artifacts and much noise as a trade-off for fast data acquisition (6 min per dataset, versus 3 hours per dataset for "high-dose" images in the normal configuration). These low-quality images can be improved with the help of a pre-trained model. The pre-trained model provided here was trained with pairs of "high-dose" and "low-dose" data from the above-mentioned EICP application. Examples of the training, input, and output data can also be found in this dataset. Although only a limited set of examples is shown here, we would like to emphasize that the workflow and code can be extended to general image enhancement applications. The code requires a Python version above 3.7.7 with packages such as tensorflow, keras, pandas, scipy, scikit, numpy and patchify. For further details of operation, please refer to the readme.txt file.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This fMRI dataset was collected for the study "Informative neural representations of unseen contents during higher-order processing in human brains and deep artificial networks".
Code corresponding to the dataset: https://github.com/nmningmei/unconfeats
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset provides daily stock prices for all companies listed on the National Stock Exchange (NSE) of India. The data spans several years and includes essential trading information that can be used for various financial analyses, stock market research, and machine learning applications.
The dataset includes the following columns:
The data has been sourced using the Yahoo Finance API, providing a reliable and comprehensive view of stock performance over time.
This dataset is ideal for:
The dataset is available in CSV format, making it easy to load into data analysis and machine learning libraries such as pandas, scikit-learn, and TensorFlow.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
V1
I have created artificial intelligence software that predicts emotion from text you have written, using the Semi Supervised Learning method and the RC algorithm. I used very simple code, and the software focuses on solving the problem. I aim to create the 2nd version of the software using an RNN (Recurrent Neural Network). I hope I was able to create an example for you to use in your thesis and projects.
V2
I decided to apply a technique that I had developed to the emotion dataset on which I had previously used Semi-Supervised learning. This technique is produced according to Quantum5 laws. I developed smart artificial intelligence software that can predict emotion with Quantum5 neuronal networks. I share this software with all humanity as open source on Kaggle. It is my first open source project in an NLP system with Quantum technology. Developing the NLP system with Quantum technology is very exciting!
Happy learning!
Emirhan BULUT
Head of AI and AI Inventor
Emirhan BULUT. (2022). Emotion Prediction with Quantum5 Neural Network AI [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/2129637
Python 3.9.8
Keras
Tensorflow
NumPy
Pandas
Scikit-learn (SKLEARN)
Image: Emotion Prediction with Quantum5 Neural Network on AI - Emirhan BULUT (https://raw.githubusercontent.com/emirhanai/Emotion-Prediction-with-Semi-Supervised-Learning-of-Machine-Learning-Software-with-RC-Algorithm---By/main/Quantum%205.png)
Image: Emotion Prediction with Semi Supervised Learning of Machine Learning Software with RC Algorithm - Emirhan BULUT (https://raw.githubusercontent.com/emirhanai/Emotion-Prediction-with-Semi-Supervised-Learning-of-Machine-Learning-Software-with-RC-Algorithm---By/main/Emotion%20Prediction%20with%20Semi%20Supervised%20Learning%20of%20Machine%20Learning%20Software%20with%20RC%20Algorithm%20-%20By%20Emirhan%20BULUT.png)
Name-Surname: Emirhan BULUT
Contact (Email) : emirhan@isap.solutions
LinkedIn : https://www.linkedin.com/in/artificialintelligencebulut/
Kaggle: https://www.kaggle.com/emirhanai
Official Website: https://www.emirhanbulut.com.tr
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This notebook focuses on predicting Air Quality Index (AQI) values by estimating Carbon Monoxide (CO) concentration using a Neural Network Regression Model trained on environmental pollutant data.
The model follows the EPA (Environmental Protection Agency) standard formula for converting CO concentration (in ppm) to AQI levels.
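The notebook's own conversion code is not shown in this description, so the following is only a sketch of how the EPA piecewise-linear AQI formula is commonly applied to CO: AQI = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, where C is the concentration truncated to one decimal place. The function name `co_to_aqi` is hypothetical, and the breakpoint table is an assumption based on the EPA's published 8-hour CO ranges; it should be verified against current EPA guidance before use.

```python
# (C_lo, C_hi, I_lo, I_hi) for 8-hour CO in ppm.
# Assumed breakpoint values; verify against the current EPA tables.
CO_BREAKPOINTS = [
    (0.0, 4.4, 0, 50),
    (4.5, 9.4, 51, 100),
    (9.5, 12.4, 101, 150),
    (12.5, 15.4, 151, 200),
    (15.5, 30.4, 201, 300),
    (30.5, 40.4, 301, 400),
    (40.5, 50.4, 401, 500),
]


def co_to_aqi(co_ppm):
    """Convert a CO concentration (ppm) to an AQI value by linear interpolation."""
    if co_ppm < 0:
        raise ValueError("concentration must be non-negative")
    # Truncate to one decimal place; the small epsilon guards against
    # floating-point products landing just below a whole number.
    c = int(co_ppm * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in CO_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError("concentration is beyond the top of the breakpoint table")


print(co_to_aqi(2.0))  # 23
print(co_to_aqi(9.4))  # 100
```

Truncating before the lookup means a predicted value such as 4.47 ppm falls into the 0.0 to 4.4 band rather than into the gap between two bands.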
Data Preprocessing (MinMaxScaler)
Model Building (Neural Network)
Prediction Phase
AQI Calculation (EPA Standard)
Visualization
Air pollution is one of the most pressing global issues today.
By combining machine learning with environmental science, this notebook helps predict pollution levels and interpret air quality using AI-driven insights.
✅ Accurate CO prediction using neural network regression
✅ Dynamic AQI computation based on EPA standards
✅ Clear and intuitive visualizations
🚀 "AI can’t clean the air — but it can help us understand how bad it really is."
License: Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
Image: GitHub (https://github.githubassets.com/images/modules/site/home/footer-illustration.svg)
Image credits: https://github.com
This dataset contains all commit messages and their related metadata from 32 popular GitHub repositories. These repositories are:
Image credits: Unsplash - yancymin
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This project aims to develop a model for identifying five different flower species (rose, tulip, sunflower, dandelion, daisy) using Convolutional Neural Networks (CNNs).
The dataset consists of 5,000 images (1,000 images per class) collected from various online sources. The model achieved an accuracy of 98.58% on the test set.
Usage
TensorFlow: For building neural networks.
numpy: For numerical computing and array operations.
pandas: For data manipulation and analysis.
matplotlib: For creating visualizations such as line plots, bar plots, and histograms.
seaborn: For advanced data visualization and creating statistically-informed graphics.
scikit-learn: For machine learning algorithms and model training.
To run the project:
Install the required libraries.
Run the Jupyter Notebook: jupyter notebook flower_classification.ipynb
Additional Information
Link to code: https://github.com/Harshjaglan01/flower-classification-cnn
License: MIT License