14 datasets found
  1. NASA Mars Rover

    • kaggle.com
    zip
    Updated Oct 8, 2023
    Cite
    Kush Tripathi (2023). NASA Mars Rover [Dataset]. https://www.kaggle.com/datasets/kushtripathi/nasa-mars-rover-captured-images-and-its-details
    Explore at:
    zip (101585155 bytes)
    Dataset updated
    Oct 8, 2023
    Authors
    Kush Tripathi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Title: Exploring Mars: A Comprehensive Dataset of Rover Photos and Metadata

    Description

    This dataset provides an extensive collection of Mars rover images paired with in-depth metadata. Sourced from various Mars missions, this dataset is a treasure trove for anyone interested in space exploration, planetary science, or computer vision.

    Components:

    • Photos: A curated set of high-definition images taken by different cameras onboard Mars rovers. These images capture a variety of terrains, weather conditions, and other Martian phenomena.
    • Details: A detailed CSV file accompanies these images, containing rich metadata like the type of camera used, the corresponding Martian sol, Earth date, and the rover responsible for each image.

    Dataset Origin

    The dataset was compiled from various Mars missions conducted over the years. Special care has been taken to include a diverse set of images to enable a wide range of analyses and applications.

    Objective

    As a learner delving into the field of Computer Vision, my objectives for this project are multi-fold:

    • Data Analysis: To perform exploratory data analysis (EDA) to understand the distribution of images based on attributes like camera type, date, and rover.
    • Color Analysis: To identify and visualize dominant colors across different sets of images. This could provide insights into Martian geology.
    • Texture and Pattern Recognition: To classify Martian terrains using texture and pattern recognition techniques.
    • Machine Learning: To potentially develop a predictive model that could classify images into predefined categories based on their features.

    Research Questions

    1. Which camera types have contributed the most to the dataset?
    2. What can the dominant colors in the images tell us about Mars?
    3. Can we classify Martian terrains into categories like rocky, sandy, and icy?
    4. Is there a correlation between the type of terrain and other variables like camera type or date?

    Tools and Technologies

    I plan to utilize Python for this project, particularly libraries like OpenCV for image processing, Pandas for data manipulation, and Matplotlib/Seaborn for data visualization. For machine learning tasks, I will likely use scikit-learn or TensorFlow.
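
    The first research question (which camera types contributed most) comes down to a count over the metadata CSV. The sketch below uses only the standard library so it runs without extra packages; Pandas would work equally well. The column names (`camera`, `sol`, `earth_date`, `rover`) and all sample values are assumptions for illustration, since the actual CSV header is not shown here.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the metadata CSV; the real column
# names and values may differ.
SAMPLE_CSV = """camera,sol,earth_date,rover
CAM_A,100,2020-01-01,rover_1
CAM_B,100,2020-01-01,rover_1
CAM_A,101,2020-01-02,rover_1
CAM_C,50,2019-06-01,rover_2
"""

def count_by_column(csv_text, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

camera_counts = count_by_column(SAMPLE_CSV, "camera")
for camera, n in camera_counts.most_common():
    print(f"{camera}: {n}")
```

    The same helper can be pointed at the rover or date column to explore the other distributions mentioned in the objectives.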

    Learning and Development

    This project serves as both a learning exercise and a stepping stone toward more complex computer vision projects. I aim to document my learning journey, challenges, and milestones in a series of Kaggle notebooks.

    Collaboration and Feedback

    I warmly invite the Kaggle community to offer suggestions, critiques, or even collaborate on this venture. Your insights could be invaluable in enhancing the depth and breadth of this project.

  2. Brain Tumor CSV

    • kaggle.com
    zip
    Updated Oct 30, 2024
    Cite
    Akash Nath (2024). Brain Tumor CSV [Dataset]. https://www.kaggle.com/datasets/akashnath29/brain-tumor-csv/code
    Explore at:
    zip (538175483 bytes)
    Dataset updated
    Oct 30, 2024
    Authors
    Akash Nath
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset provides grayscale pixel values for brain tumor MRI images, stored in a CSV format for simplified access and ease of use. The goal is to create a "MNIST-like" dataset for brain tumors, where each row in the CSV file represents the pixel values of a single image in its original resolution. This format makes it convenient for researchers and developers to quickly load and analyze MRI data for brain tumor detection, classification, and segmentation tasks without needing to handle large image files directly.

    Motivation and Use Cases

    Brain tumor classification and segmentation are critical tasks in medical imaging, and datasets like these are valuable for developing and testing machine learning and deep learning models. While there are several publicly available brain tumor image datasets, they often consist of large image files that can be challenging to process. This CSV-based dataset addresses that by providing a compact and accessible format. Potential use cases include:

    • Tumor Classification: Identifying different types of brain tumors, such as glioma, meningioma, and pituitary tumors, or distinguishing between tumor and non-tumor images.
    • Tumor Segmentation: Applying pixel-level classification and segmentation techniques for tumor boundary detection.
    • Educational and Rapid Prototyping: Ideal for educational purposes or quick experimentation without requiring large image processing capabilities.

    Data Structure

    This dataset is structured as a single CSV file where each row represents an image, and each column represents a grayscale pixel value. The pixel values are stored as integers ranging from 0 (black) to 255 (white).

    CSV File Contents

    • Pixel Values: Each row contains the pixel values of a single grayscale image, flattened into a 1-dimensional array. The original image dimensions vary, and rows in the CSV will correspondingly vary in length.
    • Simplified Access: By using a CSV format, this dataset avoids the need for specialized image processing libraries and can be easily loaded into data analysis and machine learning frameworks like Pandas, Scikit-Learn, and TensorFlow.

    How to Use This Dataset

    1. Loading the Data: The CSV can be loaded using standard data analysis libraries, making it compatible with Python, R, and other platforms.
    2. Data Preprocessing: Users may normalize pixel values (e.g., between 0 and 1) for deep learning applications.
    3. Splitting Data: While this dataset does not predefine training and testing splits, users can separate rows into training, validation, and test sets.
    4. Reshaping for Models: If needed, each row can be reshaped to the original dimensions (retrieved from the subfolder structure) to view or process as an image.
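
    The steps above can be sketched roughly as follows. Because rows vary in length, the sketch reads the file row by row rather than as a rectangular table. It assumes there is no header row and no label column, and the image dimensions are made-up values; in practice they would have to be looked up separately, as the description notes.

```python
import csv
import io

# Tiny stand-in for the CSV: two "images" of different sizes, flattened
# row by row. Real rows would be far longer.
SAMPLE_CSV = "0,255,128,64\n10,20,30,40,50,60\n"

def load_rows(csv_text):
    """Read each CSV row as a list of integer pixel values (rows may differ in length)."""
    return [[int(v) for v in row] for row in csv.reader(io.StringIO(csv_text)) if row]

def normalize(pixels):
    """Scale 0-255 integer pixel values to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]

def reshape(pixels, height, width):
    """Rebuild a 2-D image (list of rows) from a flattened pixel list."""
    if len(pixels) != height * width:
        raise ValueError("height * width does not match the row length")
    return [pixels[r * width:(r + 1) * width] for r in range(height)]

rows = load_rows(SAMPLE_CSV)
# Assumed (height, width) pairs for the two sample rows.
shapes = [(2, 2), (2, 3)]
images = [reshape(normalize(row), h, w) for row, (h, w) in zip(rows, shapes)]
print(images[0])
```

    The length check in `reshape` guards against pairing a row with the wrong dimensions, which is easy to do when every image has a different size.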

    Technical Details

    • Image Format: Grayscale MRI images, with pixel values ranging from 0 to 255.
    • Resolution: Original resolution, no resizing applied.
    • Size: Each row’s length varies according to the original dimensions of each MRI image.
    • Data Type: CSV file with integer pixel values.

    Acknowledgments

    This dataset is intended for research and educational purposes only. Users are encouraged to cite and credit the original data sources if using this dataset in any publications or projects. This is a derived CSV version aimed to simplify access and usability for machine learning and data science applications.

  3. Code and dataset for publication "Laser Wakefield Accelerator modelling with...

    • zenodo.org
    zip
    Updated Jan 8, 2023
    Cite
    M. J. V. Streeter; M. J. V. Streeter (2023). Code and dataset for publication "Laser Wakefield Accelerator modelling with Variational Neural Networks" [Dataset]. http://doi.org/10.5281/zenodo.7510352
    Explore at:
    zip
    Dataset updated
    Jan 8, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    M. J. V. Streeter; M. J. V. Streeter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code for reproducing figures in published work.

    High Power Laser Science and Engineering

    https://doi.org/10.1017/hpl.2022.47

    The code uses various Python packages, including TensorFlow.

    Conda environment was created with (on 6th Jan 2022)
    conda create --name tf tensorflow notebook tensorflow-probability pandas tqdm scikit-learn matplotlib seaborn protobuf opencv scipy scikit-image scikit-optimize Pillow PyAbel libclang flatbuffers gast --channel conda-forge

  4. Neural Networks in Friction Factor Analysis of Smooth Pipe Bends

    • data.mendeley.com
    Updated Dec 19, 2022
    Cite
    Adarsh Vasa (2022). Neural Networks in Friction Factor Analysis of Smooth Pipe Bends [Dataset]. http://doi.org/10.17632/sjvbwh5ckg.1
    Explore at:
    Dataset updated
    Dec 19, 2022
    Authors
    Adarsh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROGRAM SUMMARY

    • No. of lines in distributed program, including test data, etc.: 481
    • No. of bytes in distributed program, including test data, etc.: 14540.8
    • Distribution format: .py, .csv
    • Programming language: Python
    • Computer: Any workstation or laptop computer running TensorFlow, Google Colab, Anaconda, Jupyter, pandas, NumPy, Microsoft Azure and Alteryx.
    • Operating system: Windows, Mac OS, and Linux.

    Nature of problem: Navier-Stokes equations are solved numerically in ANSYS Fluent using the Reynolds stress model for turbulence. The simulated values of friction factor are validated with theoretical and experimental data obtained from literature. Artificial neural networks are then used for a prediction-based augmentation of friction factor. The capabilities of the neural networks are discussed with regard to computational cost and domain limitations.

    Solution method: The simulation data is obtained through Reynolds stress modelling of fluid flow through a pipe. This data is augmented using an artificial neural network model that predicts both inside and outside the data domain.

    Restrictions: The code used in this research is limited to smooth pipe bends, in which friction factor is analysed using a steady state incompressible fluid flow.

    Runtime: The artificial neural network produces results within a span of 20 seconds for three-dimensional geometry, using the allocated free computational resources of Google Colaboratory cloud-based computing system.

  5. V2 Balloon Detection Dataset

    • kaggle.com
    zip
    Updated Jul 7, 2022
    Cite
    vbookshelf (2022). V2 Balloon Detection Dataset [Dataset]. https://www.kaggle.com/vbookshelf/v2-balloon-detection-dataset
    Explore at:
    zip (49788043 bytes)
    Dataset updated
    Jul 7, 2022
    Authors
    vbookshelf
    Description

    Context

    I needed a simple image dataset that I could use when trying different object detection algorithms for the first time. It had to be something that could be quickly understood and easily loaded. I didn't want to spend a lot of time doing EDA or trying to remember how the data is structured. Moreover, I wanted to be able to clearly see when a model's prediction was correct or when it had made a mistake. When working with chest x-ray images, for example, it takes an expert to know if a model's predictions are correct.

    I found the Balloons dataset and simplified it. The original data is split into train and test sets and it has two json files that need to be parsed. In this new version, I copied all images into a single folder and replaced the json files with one csv file that can be easily loaded with Pandas.

    Content

    The dataset consists of 74 jpg images and one csv file. Each image contains one or more balloons.

    The csv file has five columns:

    fname - The image file name.
    height - The image height.
    width - The image width.
    num_balloons - The number of balloons on the image.
    bbox - The coordinates of each bounding box on the image.
    

    The coordinates of each bbox are stored in a dictionary. The format is as follows:

    {"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}
    
    Where xmin and ymin are the coordinates of the top left corner, and xmax and ymax are the coordinates of the bottom right corner.
    

    Each entry in the bbox column is a list of dictionaries. For example, if an image has two balloons and hence two bounding boxes, the entry will be as follows:

    [{"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}, {"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}]

    When loaded into a Pandas dataframe, all items in the bbox column are of type string. The strings can be converted to Python lists like this:

    import ast
    
    # convert each item in the bbox column from type str to type list
    df['bbox'] = df['bbox'].apply(ast.literal_eval)
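
    Once parsed, the corner coordinates are often converted to an (x, y, width, height) form, and the box count can be cross-checked against num_balloons. The following is a minimal sketch of that idea on a single bbox string; the helper name `parse_boxes` and the second box's coordinates are illustrative assumptions, not part of the dataset.

```python
import ast

# Example string in the format described above, as it would appear in
# the bbox column after loading the CSV. The second box is made up.
raw_bbox = (
    '[{"xmin": 100, "ymin": 100, "xmax": 300, "ymax": 300}, '
    '{"xmin": 50, "ymin": 20, "xmax": 90, "ymax": 80}]'
)

def parse_boxes(text):
    """Convert a bbox string into a list of (xmin, ymin, width, height) tuples."""
    boxes = ast.literal_eval(text)
    return [
        (b["xmin"], b["ymin"], b["xmax"] - b["xmin"], b["ymax"] - b["ymin"])
        for b in boxes
    ]

boxes = parse_boxes(raw_bbox)
# len(boxes) should match the num_balloons value of the same row.
print(len(boxes), boxes)
```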
    
    

    Acknowledgements

    Many thanks to Waleed Abdulla who created this dataset.

    The original dataset can be downloaded and unzipped using this code:

    !wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
    !unzip balloon_dataset.zip > /dev/null
    

    Inspiration

    Can you create an app that can look at an image and tell you:

    • how many balloons are on the image, and
    • what are the colours of those balloons.

    This is something that could help blind people. To help you get started, here's an example of a similar project.

    License

    In this blog post the dataset's creator mentions that the images were sourced from Flickr. All images have a "Commercial use & mods allowed" license.



    Header image by andremsantana on Pixabay.

  6. Bird Species Image Classification Dataset

    • kaggle.com
    Updated Jun 11, 2025
    Cite
    Evil Spirit05 (2025). Bird Species Image Classification Dataset [Dataset]. https://www.kaggle.com/datasets/evilspirit05/birds-species-prediction
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Evil Spirit05
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains high-quality images of six distinct bird species, curated for use in image classification, computer vision, and biodiversity research tasks. Each bird species included in this dataset is well-represented, making it ideal for training and evaluating deep learning models.

    Label  Species Name        Image Count
    1      American Goldfinch  143
    2      Emperor Penguin     139
    3      Downy Woodpecker    137
    4      Flamingo            132
    5      Carmine Bee-eater   131
    6      Barn Owl            129

    📂 Dataset Highlights:

    • Total Images: 811
    • Classes: 6 unique bird species
    • Balanced Labels: Nearly equal distribution across classes
    • Use Cases: Image classification, model benchmarking, transfer learning, educational projects, biodiversity analysis
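
    The "nearly equal distribution" claim can be checked directly from the per-class counts listed above. The short sketch below sums the counts, computes each class's share, and reports the ratio between the largest and smallest class; the variable names are arbitrary.

```python
# Image counts per species, copied from the table above.
counts = {
    "American Goldfinch": 143,
    "Emperor Penguin": 139,
    "Downy Woodpecker": 137,
    "Flamingo": 132,
    "Carmine Bee-eater": 131,
    "Barn Owl": 129,
}

total = sum(counts.values())
# Fraction of the dataset that each class represents.
shares = {name: n / total for name, n in counts.items()}
# Largest class size divided by smallest; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"max/min ratio: {imbalance_ratio:.3f}")
```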

    🧠 Potential Applications:

    • Training deep learning models like CNNs for bird species recognition
    • Fine-tuning pre-trained models using a small and balanced dataset
    • Educational projects in ornithology and computer vision
    • Biodiversity and wildlife conservation tech solutions

    🛠️ Suggested Tools:

    • Python (Pandas, NumPy, Matplotlib)
    • TensorFlow / PyTorch for model development
    • OpenCV for image preprocessing
    • Streamlit for creating interactive demos

  7. Supporting data for “Deep learning methods and applications to digital...

    • datahub.hku.hk
    Updated Oct 3, 2024
    Cite
    Shichao Ma (2024). Supporting data for “Deep learning methods and applications to digital health” [Dataset]. http://doi.org/10.25442/hku.27060427.v1
    Explore at:
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    HKU Data Repository
    Authors
    Shichao Ma
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This repository contains three folders, which hold either the data or the source code for the three main chapters (Chapters 3, 4, and 5) of the thesis. The folders are:

    1. Dataset (Chapter 3): /PhysioNet2016 contains the phonocardiogram signals used in Chapters 3 and 4 as the upstream pretraining data. This is a public dataset. /SourceCode includes all the statistical analysis and visualization scripts for Chapter 3. Yaseen_dataset and PASCAL contain phonocardiogram signals with pathological features; Yaseen_dataset serves as the downstream finetuning dataset in Chapter 3, while the PASCAL dataset serves as the secondary testing dataset in Chapter 3.
    2. Dataset (Chapter 4): /SourceCode includes all the statistical analysis and visualization scripts for Chapter 4.
    3. Dataset (Chapter 5): PAD-UFES-20_processed contains dermatology images processed from the PAD-UFES-20 dataset, which is a public dataset. The dataset is used in Chapter 5. /SourceCode includes all the statistical analysis and visualization scripts for Chapter 5.

    Several packages are mandatory to run the source code, including: Python > 3.6 (3.11 preferred), TensorFlow > 2.16, Keras > 3.3, NumPy > 1.26, Pandas > 2.2, SciPy > 1.13

  8. Image enhancement code: time-resolved tomograms of EICP application using 3D...

    • darus.uni-stuttgart.de
    Updated Feb 7, 2023
    Cite
    Dongwon Lee; Holger Steeb (2023). Image enhancement code: time-resolved tomograms of EICP application using 3D U-net [Dataset]. http://doi.org/10.18419/DARUS-2991
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 7, 2023
    Dataset provided by
    DaRUS
    Authors
    Dongwon Lee; Holger Steeb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    This dataset contains the code to reproduce the results of "Time resolved micro-XRCT dataset of Enzymatically Induced Calcite Precipitation (EICP) in sintered glass bead columns", cf. https://doi.org/10.18419/darus-2227. The code takes "low-dose" images as input; these images contain many artifacts and noise as a trade-off for fast data acquisition (6 min per dataset, compared with 3 hours per dataset ("high-dose") in the normal configuration). These low-quality images can be improved with the help of a pre-trained model. The pre-trained model provided here was trained with pairs of "high-dose" and "low-dose" data from the above-mentioned EICP application. Examples of the training, input, and output data can also be found in this dataset. Although only limited examples are shown here, we would like to emphasize that the workflow and code can be extended to general image enhancement applications. The code requires a Python version above 3.7.7 with packages such as tensorflow, keras, pandas, scipy, scikit, numpy and patchify. For further details of operation, please refer to the readme.txt file.

  9. Data from: Informative neural representations of unseen contents during...

    • openneuro.org
    Updated Dec 10, 2021
    Cite
    Ning Mei; Roberto Santana; David Soto (2021). Informative neural representations of unseen contents during higher-order processing in human brains and deep artificial networks [Dataset]. http://doi.org/10.18112/openneuro.ds003927.v1.0.1
    Explore at:
    Dataset updated
    Dec 10, 2021
    Dataset provided by
    OpenNeuro: https://openneuro.org/
    Authors
    Ning Mei; Roberto Santana; David Soto
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This fMRI dataset was collected for the study "Informative neural representations of unseen contents during higher-order processing in human brains and deep artificial networks".

    Code corresponding to the dataset: https://github.com/nmningmei/unconfeats

    System Information

    • Platform: Linux-3.10.0-514.el7.x86_64-x86_64-with-centos-7.3.1611-Core
    • CPU: x86_64: 16 cores

    Python environment

    • Python: 3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42) [GCC 7.2.0]
    • Numpy: 1.19.1
    • Scipy: 1.3.1
    • Matplotlib: 3.1.3
    • Scikit-learn: 0.24.2
    • Seaborn: 0.11.1
    • Pandas: 1.0.1
    • Tensorflow: 2.0.0
    • Pytorch: 1.7.1
    • Nilearn: 0.7.1
    • Nipype: 1.4.2
    • LegrandNico/metadPy

    R environment

    • R base
    • R: 4.0.3 (for 3-way repeated measure ANOVAs)

    Brain image processing backends

    • mricrogl
    • mricron: 10.2014
    • FSL: 6.0.0
    • Freesurfer: 6.0.0
  10. Historical Data of Stocks Listed on NSE

    • kaggle.com
    zip
    Updated Dec 23, 2024
    Cite
    Sampath Gudibettumane (2024). Historical Data of Stocks Listed on NSE [Dataset]. https://www.kaggle.com/datasets/paramamithra/historical-data-of-stocks-listed-on-nse
    Explore at:
    zip (22 bytes)
    Dataset updated
    Dec 23, 2024
    Authors
    Sampath Gudibettumane
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This dataset provides daily stock prices for all companies listed on the National Stock Exchange (NSE) of India. The data spans several years and includes essential trading information that can be used for various financial analyses, stock market research, and machine learning applications.

    Content

    The dataset includes the following columns:

    • Date: The date of the trading day in YYYY-MM-DD format.
    • Open: The opening price of the stock on the given date.
    • High: The highest price of the stock on the given date.
    • Low: The lowest price of the stock on the given date.
    • Close: The closing price of the stock on the given date.
    • Adj Close: The adjusted closing price of the stock on the given date, which accounts for dividends, stock splits, and other corporate actions.
    • Volume: The number of shares traded on the given date.
    • Symbol: The unique ticker symbol of the stock.

    Data Source

    The data has been sourced using the Yahoo Finance API, providing a reliable and comprehensive view of stock performance over time.

    Usage

    This dataset is ideal for:

    • Time series analysis and forecasting of stock prices.
    • Developing and testing trading algorithms.
    • Financial market research and trend analysis.
    • Machine learning projects related to finance and economics.

    File Format

    The dataset is available in CSV format, making it easy to load into data analysis and machine learning libraries such as pandas, scikit-learn, and TensorFlow.
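
    As a rough illustration of the time series use case, the sketch below groups rows by Symbol and computes day-over-day returns from Adj Close. It relies only on the standard library; pandas would be the more usual choice for the full file. The column names follow the list above, while the column order, ticker symbols, and prices in the sample are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the documented column names; the symbols and
# prices are invented and the column order is an assumption.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume,Symbol
2024-01-01,100,105,99,104,104,1000,AAA
2024-01-02,104,110,103,109.2,109.2,1200,AAA
2024-01-01,50,51,49,50,50,500,BBB
2024-01-02,50,50,45,47.5,47.5,800,BBB
"""

def daily_returns(csv_text):
    """Return {symbol: [(date, fractional_change), ...]} based on Adj Close."""
    by_symbol = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_symbol[row["Symbol"]].append((row["Date"], float(row["Adj Close"])))
    returns = {}
    for symbol, series in by_symbol.items():
        series.sort()  # YYYY-MM-DD strings sort chronologically
        returns[symbol] = [
            (date, (price - prev) / prev)
            for (_, prev), (date, price) in zip(series, series[1:])
        ]
    return returns

returns = daily_returns(SAMPLE_CSV)
for symbol, values in returns.items():
    print(symbol, values)
```

    Using Adj Close rather than Close keeps the returns comparable across dividends and splits, which matches the column's stated purpose.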

  11. Emotion Prediction with Quantum5 Neural Network AI

    • kaggle.com
    zip
    Updated Oct 19, 2025
    Cite
    EMİRHAN BULUT (2025). Emotion Prediction with Quantum5 Neural Network AI [Dataset]. https://www.kaggle.com/datasets/emirhanai/emotion-prediction-with-semi-supervised-learning
    Explore at:
    zip (2332683 bytes)
    Dataset updated
    Oct 19, 2025
    Authors
    EMİRHAN BULUT
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Emotion Prediction with Quantum5 Neural Network AI Machine Learning - By Emirhan BULUT

    V1

    I have created artificial intelligence software that predicts emotion from the text you have written, using the Semi-Supervised Learning method and the RC algorithm. I used very simple code, and the software focuses on solving the problem. I aim to create the second version of the software using an RNN (Recurrent Neural Network). I hope I was able to create an example for you to use in your thesis and projects.

    V2

    I decided to apply a technique I had developed to the emotion dataset on which I had previously used semi-supervised learning. This technique is produced according to Quantum5 laws. I developed smart artificial intelligence software that can predict emotion with Quantum5 neural networks. I share this software with all humanity as open source on Kaggle. It is my first open-source NLP project with Quantum technology. Developing the NLP system with Quantum technology is very exciting!

    Happy learning!

    Emirhan BULUT

    Head of AI and AI Inventor

    Emirhan BULUT. (2022). Emotion Prediction with Quantum5 Neural Network AI [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/2129637

    The coding language used:

    Python 3.9.8

    Libraries Used:

    Keras

    Tensorflow

    NumPy

    Pandas

    Scikit-learn (SKLEARN)

    Image: Emotion Prediction with Quantum5 Neural Network on AI - Emirhan BULUT (https://raw.githubusercontent.com/emirhanai/Emotion-Prediction-with-Semi-Supervised-Learning-of-Machine-Learning-Software-with-RC-Algorithm---By/main/Quantum%205.png)

    Image: Emotion Prediction with Semi Supervised Learning of Machine Learning Software with RC Algorithm - Emirhan BULUT (https://raw.githubusercontent.com/emirhanai/Emotion-Prediction-with-Semi-Supervised-Learning-of-Machine-Learning-Software-with-RC-Algorithm---By/main/Emotion%20Prediction%20with%20Semi%20Supervised%20Learning%20of%20Machine%20Learning%20Software%20with%20RC%20Algorithm%20-%20By%20Emirhan%20BULUT.png)

    Developer Information:

    Name-Surname: Emirhan BULUT

    Contact (Email) : emirhan@isap.solutions

    LinkedIn : https://www.linkedin.com/in/artificialintelligencebulut/

    Kaggle: https://www.kaggle.com/emirhanai

    Official Website: https://www.emirhanbulut.com.tr

  12. Air Quality Index Prediction using Neural Networks

    • kaggle.com
    zip
    Updated Oct 27, 2025
    Cite
    Moiz Azad (2025). Air Quality Index Prediction using Neural Networks [Dataset]. https://www.kaggle.com/datasets/moizkhan00/air-quality-index-prediction-using-neural-networks
    Explore at:
    zip (1290288 bytes)
    Dataset updated
    Oct 27, 2025
    Authors
    Moiz Azad
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🌍 Air Quality Index (AQI) Prediction using Neural Networks

    This notebook focuses on predicting Air Quality Index (AQI) values by estimating Carbon Monoxide (CO) concentration using a Neural Network Regression Model trained on environmental pollutant data.

    The model follows the EPA (Environmental Protection Agency) standard formula for converting CO concentration (in ppm) to AQI levels.

    ⚙️ Workflow Overview

    1. Data Preprocessing

      • Cleaned and normalized the dataset
      • Removed date/time and irrelevant columns
      • Scaled input and output features using MinMaxScaler
    2. Model Building (Neural Network)

      • Built a deep regression model using TensorFlow/Keras
      • Activation: ReLU
      • Optimizer: Adam
      • Loss: Mean Squared Error (MSE)
    3. Prediction Phase

      • Model predicts CO concentration based on given input features
      • Predictions are inverse-transformed to get real-world ppm values
    4. AQI Calculation (EPA Standard)

      • AQI computed using the official EPA breakpoint formula
      • Converts CO ppm into an AQI score ranging from 0–500
    5. Visualization

      • Distribution of pollutants
      • Correlation heatmap
      • Comparison of Predicted CO vs AQI Levels
      • AQI Category visualization
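
    The AQI calculation step is a piecewise linear interpolation between concentration breakpoints. The sketch below shows one way it might look for CO. The breakpoint table holds the commonly cited EPA values for 8-hour CO in ppm and should be treated as an assumption to verify against current EPA documentation; the function name `co_to_aqi` and the choice to round the input to one decimal are likewise illustrative, not taken from the notebook.

```python
# (C_low, C_high, I_low, I_high): CO concentration range in ppm mapped to
# an AQI range. Commonly cited EPA values; verify before relying on them.
CO_BREAKPOINTS = [
    (0.0, 4.4, 0, 50),
    (4.5, 9.4, 51, 100),
    (9.5, 12.4, 101, 150),
    (12.5, 15.4, 151, 200),
    (15.5, 30.4, 201, 300),
    (30.5, 40.4, 301, 400),
    (40.5, 50.4, 401, 500),
]

def co_to_aqi(co_ppm):
    """Convert a CO concentration (ppm) to an AQI value by linear interpolation."""
    # Breakpoints are defined at 0.1 ppm resolution, so the input is
    # rounded to one decimal first (a simplification made in this sketch).
    c = round(co_ppm, 1)
    for c_lo, c_hi, i_lo, i_hi in CO_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError(f"CO concentration {co_ppm} ppm is outside the breakpoint table")

for ppm in (0.0, 2.2, 4.4, 9.4, 50.4):
    print(ppm, co_to_aqi(ppm))
```

    Values outside the table raise an error instead of being clamped, so that implausible model predictions (for example, negative ppm after the inverse transform) are noticed rather than silently mapped to an AQI score.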

    🧠 Why This Project?

    Air pollution is one of the most pressing global issues today.
    By combining machine learning with environmental science, this notebook helps predict pollution levels and interpret air quality using AI-driven insights.

    📊 Tech Stack

    • Python
    • TensorFlow / Keras
    • NumPy, Pandas, Matplotlib, Seaborn
    • Scikit-learn

    🏁 Results

    ✅ Accurate CO prediction using neural network regression
    ✅ Dynamic AQI computation based on EPA standards
    ✅ Clear and intuitive visualizations

    🚀 "AI can’t clean the air — but it can help us understand how bad it really is."

  13. GitHub Commit Messages Dataset

    • kaggle.com
    zip
    Updated Mar 2, 2021
    Cite
    Dhruvil Dave (2021). GitHub Commit Messages Dataset [Dataset]. https://www.kaggle.com/dsv/1988456
    Explore at:
    zip (561489165 bytes)
    Dataset updated
    Mar 2, 2021
    Authors
    Dhruvil Dave
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Image: GitHub (https://github.githubassets.com/images/modules/site/home/footer-illustration.svg)

    Image credits: https://github.com

    Introduction

    This dataset contains all commit messages and their related metadata from 32 popular GitHub repositories. These repositories are:

    • tensorflow/tensorflow
    • pytorch/pytorch
    • torvalds/linux
    • python/cpython
    • rust-lang/rust
    • microsoft/TypeScript
    • microsoft/vscode
    • golang/go
    • numpy/numpy
    • scikit-learn/scikit-learn
    • openbsd/src
    • freebsd/freebsd-src
    • pandas-dev/pandas
    • scipy/scipy
    • tidyverse/ggplot2
    • kubernetes/kubernetes
    • postgres/postgres
    • nodejs/node
    • facebook/react
    • angular/angular
    • matplotlib/matplotlib
    • apache/httpd
    • nginx/nginx
    • opencv/opencv
    • ipython/ipython
    • rstudio/rstudio
    • jupyterlab/jupyterlab
    • gcc-mirror/gcc
    • apple/swift
    • denoland/deno
    • apache/spark
    • llvm/llvm-project

    Credits

    Image credits: Unsplash - yancymin

  14. Image Classification by CNN

    • kaggle.com
    zip
    Updated Mar 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Harsh Jaglan (2024). Image Classification by CNN [Dataset]. https://www.kaggle.com/datasets/harshjaglan01/image-classification-by-cnn/code
    Explore at:
    zip (311627190 bytes)
    Dataset updated
    Mar 4, 2024
    Authors
    Harsh Jaglan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Automated Flower Identification Using Convolutional Neural Networks

    This project aims to develop a model for identifying five different flower species (rose, tulip, sunflower, dandelion, daisy) using Convolutional Neural Networks (CNNs).

    Description

    The dataset consists of 5,000 images (1,000 images per class) collected from various online sources. The model achieved an accuracy of 98.58% on the test set.

    Usage

    This project requires Python 3.x and the following libraries:

    • TensorFlow: For making neural networks.
    • numpy: For numerical computing and array operations.
    • pandas: For data manipulation and analysis.
    • matplotlib: For creating visualizations such as line plots, bar plots, and histograms.
    • seaborn: For advanced data visualization and creating statistically-informed graphics.
    • scikit-learn: For machine learning algorithms and model training.

    To run the project:

    1. Clone this repository.
    2. Install the required libraries.
    3. Run the Jupyter Notebook: jupyter notebook flower_classification.ipynb

    Additional Information

    • Link to code: https://github.com/Harshjaglan01/flower-classification-cnn
    • License: MIT License

Cite
Kush Tripathi (2023). NASA Mars Rover [Dataset]. https://www.kaggle.com/datasets/kushtripathi/nasa-mars-rover-captured-images-and-its-details
NASA Mars Rover

A Comprehensive Dataset of Rover Photos and Metadata Description

Explore at:
zip(101585155 bytes)Available download formats
Dataset updated
Oct 8, 2023
Authors
Kush Tripathi
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Dataset Title: Exploring Mars: A Comprehensive Dataset of Rover Photos and Metadata

Description

This dataset provides an extensive collection of Mars rover images paired with in-depth metadata. Sourced from various Mars missions, this dataset is a treasure trove for anyone interested in space exploration, planetary science, or computer vision.

Components:

  • Photos: A curated set of high-definition images taken by different cameras onboard Mars rovers. These images capture a variety of terrains, weather conditions, and other Martian phenomena.
  • Details: A detailed CSV file accompanies these images, containing rich metadata like the type of camera used, the corresponding Martian sol, Earth date, and the rover responsible for each image.

Dataset Origin

The dataset was compiled from various Mars missions conducted over the years. Special care has been taken to include a diverse set of images to enable a wide range of analyses and applications.

Objective

As a learner delving into the field of Computer Vision, my objectives for this project are multi-fold:

  • Data Analysis: To perform exploratory data analysis (EDA) to understand the distribution of images based on attributes like camera type, date, and rover.
  • Color Analysis: To identify and visualize dominant colors across different sets of images. This could provide insights into Martian geology.
  • Texture and Pattern Recognition: To classify Martian terrains using texture and pattern recognition techniques.
  • Machine Learning: To potentially develop a predictive model that could classify images into predefined categories based on their features.
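
As one possible starting point for the Color Analysis objective, the sketch below finds dominant colors by coarse quantization: each RGB channel is split into a few bins and the most frequent bins are reported. It assumes the pixels have already been loaded as (r, g, b) tuples (for example with OpenCV, not shown here). The function name `dominant_colors`, the bin count, and the sample pixels are all illustrative assumptions, not part of the dataset or its notebooks.

```python
from collections import Counter


def dominant_colors(pixels, levels=4, top=3):
    """Return the `top` most common quantized RGB colors with their counts.

    pixels: iterable of (r, g, b) tuples, each channel in 0..255.
    levels: bins per channel; assumed to divide 256 evenly (illustrative).
    """
    step = 256 // levels
    # Count how many pixels fall into each coarse (r, g, b) bin.
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    # Report each bin by the centre of its range so it can be displayed.
    return [
        (tuple(c * step + step // 2 for c in bin_), n)
        for bin_, n in counts.most_common(top)
    ]


# Made-up pixels standing in for a real image (hypothetical values).
sample = [(200, 120, 80)] * 5 + [(190, 110, 70)] * 3 + [(30, 30, 40)] * 2
result = dominant_colors(sample)
for color, n in result:
    print(color, n)
```

Quantization is only one option; clustering methods such as k-means are a common alternative when finer color groupings are needed.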

Research Questions

  1. Which camera types have contributed the most to the dataset?
  2. What can the dominant colors in the images tell us about Mars?
  3. Can we classify Martian terrains into categories like rocky, sandy, and icy?
  4. Is there a correlation between the type of terrain and other variables like camera type or date?
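
The first research question can be explored with a simple frequency count over the metadata CSV. The sketch below uses only the Python standard library and an in-memory CSV; the column names (`camera`, `rover`, `sol`, `earth_date`) and every row value are placeholders assumed for illustration, so they should be replaced with the actual headers of the dataset's CSV file.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the real file's column names and values may differ.
SAMPLE_CSV = """image_id,camera,rover,sol,earth_date
1,CAM_A,ROVER_X,10,2000-01-01
2,CAM_B,ROVER_X,10,2000-01-01
3,CAM_A,ROVER_X,11,2000-01-02
4,CAM_C,ROVER_Y,20,2000-01-03
"""


def images_per_camera(csv_file, column="camera"):
    """Count rows per distinct value in `column` of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = images_per_camera(io.StringIO(SAMPLE_CSV))
top_camera, top_count = counts.most_common(1)[0]
print(top_camera, top_count)
```

For the real dataset, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` object, or the count could be done with Pandas as mentioned under Tools and Technologies.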

Tools and Technologies

I plan to utilize Python for this project, particularly libraries like OpenCV for image processing, Pandas for data manipulation, and Matplotlib/Seaborn for data visualization. For machine learning tasks, I will likely use scikit-learn or TensorFlow.

Learning and Development

This project serves as both a learning exercise and a stepping stone toward more complex computer vision projects. I aim to document my learning journey, challenges, and milestones in a series of Kaggle notebooks.

Collaboration and Feedback

I warmly invite the Kaggle community to offer suggestions, critiques, or even collaborate on this venture. Your insights could be invaluable in enhancing the depth and breadth of this project.
