100+ datasets found
  1. Social Media and Mental Health

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Cite
    SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
    Explore at:
    Available download formats: zip (10944 bytes)
    Dataset updated
    Jul 18, 2023
    Authors
    SouvikAhmed071
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

    The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

    This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

    The following is the Google Colab link to the project, done on Jupyter Notebook -

    https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

    The following is the GitHub Repository of the project -

    https://github.com/daerkns/social-media-and-mental-health

    Libraries used for the Project -

    pandas
    NumPy
    Matplotlib
    Seaborn
    scikit-learn
    
  2. COCO2017 Image Caption Train

    • kaggle.com
    zip
    Updated May 30, 2024
    Cite
    Seungjun Lee (2024). COCO2017 Image Caption Train [Dataset]. https://www.kaggle.com/datasets/seungjunleeofficial/coco2017-image-caption-train
    Explore at:
    Available download formats: zip (19236355851 bytes)
    Dataset updated
    May 30, 2024
    Authors
    Seungjun Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains only the COCO 2017 train images (118K images) and a caption annotation JSON file, designed to fit within Google Colab's available disk space of approximately 50GB when connected to a GPU runtime.

    If you're using PyTorch on Google Colab, you can easily utilize this dataset as follows:

    Manually downloading and uploading the file to Colab can be time-consuming. Therefore, it's more efficient to download this data directly into Google Colab. Please ensure you have first added your Kaggle key to Google Colab. You can find more details on this process here

    from google.colab import userdata  # provides userdata.get() for secrets stored in Colab
    import os
    import torch
    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    
    os.environ["KAGGLE_KEY"] = userdata.get('KAGGLE_KEY')
    os.environ["KAGGLE_USERNAME"] = userdata.get('KAGGLE_USERNAME')
    
    # Download the Dataset and unzip it
    !kaggle datasets download -d seungjunleeofficial/coco2017-image-caption-train
    !mkdir "/content/Dataset"
    !unzip "coco2017-image-caption-train" -d "/content/Dataset"
    
    
    # load the dataset
    cap = dset.CocoCaptions(root = '/content/Dataset/COCO2017 Image Captioning Train/train2017',
                annFile = '/content/Dataset/COCO2017 Image Captioning Train/captions_train2017.json',
                transform=transforms.PILToTensor())
    

    You can then use the dataset in the following way:

    print(f"Number of samples: {len(cap)}")
    img, target = cap[3]
    print(img.shape)
    print(target)
    # Output example: torch.Size([3, 425, 640])
    # ['A zebra grazing on lush green grass in a field.', 'Zebra reaching its head down to ground where grass is.', 
    # 'The zebra is eating grass in the sun.', 'A lone zebra grazing in some green grass.', 
    # 'A Zebra grazing on grass in a green open field.']
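For orientation, the several captions returned for one image come from the caption annotation JSON, where each annotation ties a caption to an image through an `image_id`. The following is a minimal sketch of that grouping in plain Python. The tiny inline JSON and the helper name `captions_by_image` are illustrative assumptions, not part of the dataset or of torchvision.

```python
import json
from collections import defaultdict

# Tiny stand-in for captions_train2017.json (illustrative values only).
SAMPLE = json.loads("""
{
  "images": [{"id": 1, "file_name": "000000000001.jpg"},
             {"id": 2, "file_name": "000000000002.jpg"}],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A zebra grazing in a field."},
    {"id": 11, "image_id": 2, "caption": "A dog on a beach."},
    {"id": 12, "image_id": 1, "caption": "A lone zebra eating grass."}
  ]
}
""")


def captions_by_image(coco):
    """Map each image file name to the list of its captions."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped = defaultdict(list)
    for ann in coco["annotations"]:
        grouped[names[ann["image_id"]]].append(ann["caption"])
    return dict(grouped)


grouped = captions_by_image(SAMPLE)
print(grouped["000000000001.jpg"])
```

Grouping by file name keeps the sketch independent of any image-loading library; a real pipeline would likely leave this to `CocoCaptions`.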
    
  3. hagrid-sample-120k-384p

    • huggingface.co
    Updated Jul 3, 2023
    + more versions
    Cite
    Christian Mills (2023). hagrid-sample-120k-384p [Dataset]. https://huggingface.co/datasets/cj-mills/hagrid-sample-120k-384p
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 3, 2023
    Authors
    Christian Mills
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains 127,331 images from HaGRID (HAnd Gesture Recognition Image Dataset) downscaled to 384p. The original dataset is 716GB and contains 552,992 1080p images. I created this sample for a tutorial so readers can use the dataset in the free tiers of Google Colab and Kaggle Notebooks.

    Original Authors:

    • Alexander Kapitanov
    • Andrey Makhlyarchuk
    • Karina Kvanchiani

    Original Dataset Links

    • GitHub
    • Kaggle Datasets Page

    Object Classes

    ['call'… See the full description on the dataset page: https://huggingface.co/datasets/cj-mills/hagrid-sample-120k-384p.

  4. Brain Tumor Classification

    • kaggle.com
    zip
    Updated Nov 26, 2022
    Cite
    Taneem UR Rehman (2022). Brain Tumor Classification [Dataset]. https://www.kaggle.com/datasets/taneemurrehman/brain-tumor-classification
    Explore at:
    Available download formats: zip (91002358 bytes)
    Dataset updated
    Nov 26, 2022
    Authors
    Taneem UR Rehman
    Description

    Please follow the steps below to download and use Kaggle data within Google Colab:

    1) Upload your Kaggle API token. Choose the kaggle.json file that you downloaded:

       from google.colab import files
       files.upload()

    2) Make a directory named .kaggle:

       ! mkdir ~/.kaggle

    3) Copy the kaggle.json file there:

       ! cp kaggle.json ~/.kaggle/

    4) Change the permissions of the file:

       ! chmod 600 ~/.kaggle/kaggle.json

    5) That's all! You can check that everything is okay by running this command:

       ! kaggle datasets list

    Use the unzip command to unzip the training data into a train directory:

       ! unzip train.zip -d train
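As a rough Python equivalent of the shell steps above (create `~/.kaggle`, copy `kaggle.json` into it, restrict its permissions), the sketch below uses only the standard library. The function name `install_kaggle_token` and its `home` parameter are hypothetical, introduced here so the steps can be tried against a scratch directory; it assumes a POSIX-style file system such as the one Colab provides.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)   # mkdir ~/.kaggle
    target = target_dir / "kaggle.json"
    shutil.copyfile(token_path, target)             # cp kaggle.json ~/.kaggle/
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)   # chmod 600
    return target
```

Calling `install_kaggle_token("kaggle.json")` after the upload step would place the token in the current user's home directory.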

  5. part1_dataSorted_Diversevul_llama2_dataset

    • huggingface.co
    Updated Mar 19, 2024
    Cite
    Atharva Prashant Pawar (2024). part1_dataSorted_Diversevul_llama2_dataset [Dataset]. https://huggingface.co/datasets/atharvapawar/part1_dataSorted_Diversevul_llama2_dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 19, 2024
    Authors
    Atharva Prashant Pawar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset: part1_dataSorted_Diversevul_llama2_dataset

    Dataset lines: 2768

    Kaggle Notebook (for dataset splitting): https://www.kaggle.com/code/mrappplg/securix-diversevul-dataset

    Google Colab Notebook: https://colab.research.google.com/drive/1z6fLQrcMSe1-AVMHp0dp6uDr4RtVIOzF?usp=sharing
    
  6. Google_Colab

    • kaggle.com
    zip
    Updated Jul 26, 2020
    Cite
    Kiran Kolte (2020). Google_Colab [Dataset]. https://www.kaggle.com/datasets/kskolte2020/google-colab
    Explore at:
    Available download formats: zip (8535 bytes)
    Dataset updated
    Jul 26, 2020
    Authors
    Kiran Kolte
    Description

    Dataset

    This dataset was created by Kiran Kolte

    Contents

  7. hagrid-classification-512p-no-gesture-150k-zip

    • huggingface.co
    Updated May 25, 2023
    + more versions
    Cite
    Christian Mills (2023). hagrid-classification-512p-no-gesture-150k-zip [Dataset]. https://huggingface.co/datasets/cj-mills/hagrid-classification-512p-no-gesture-150k-zip
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 25, 2023
    Authors
    Christian Mills
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains 153,735 training images from HaGRID (HAnd Gesture Recognition Image Dataset) modified for image classification instead of object detection. The original dataset is 716GB. I created this sample for a tutorial so readers can use the dataset in the free tiers of Google Colab and Kaggle Notebooks.

    Original Authors:

    • Alexander Kapitanov
    • Andrey Makhlyarchuk
    • Karina Kvanchiani

    Original Dataset Links

    • GitHub
    • Kaggle Datasets Page

  8. finalproject_spotify

    • huggingface.co
    Updated Nov 19, 2025
    Cite
    ulee berber (2025). finalproject_spotify [Dataset]. https://huggingface.co/datasets/uleeberber/finalproject_spotify
    Explore at:
    Dataset updated
    Nov 19, 2025
    Authors
    ulee berber
    Description

    Readme

    Link to video presentation: https://youtu.be/Ybz20H5reBI
    Link to Colab: https://colab.research.google.com/drive/1zDY3D8hn8id8kgqX2QR5tmk22LnOccfc?usp=sharing
    Link to Kaggle dataset: https://www.kaggle.com/datasets/joebeachcapital/30000-spotify-songs/data?select=spotify_songs.csv

    Dataset Description: This dataset comes from Kaggle: "30000 Spotify Songs". The dataset contains both numeric and categorical variables describing songs available on Spotify. It includes musical… See the full description on the dataset page: https://huggingface.co/datasets/uleeberber/finalproject_spotify.

  9. Accident Detection Model Dataset

    • universe.roboflow.com
    zip
    Updated Apr 8, 2024
    Cite
    Accident detection model (2024). Accident Detection Model Dataset [Dataset]. https://universe.roboflow.com/accident-detection-model/accident-detection-model/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 8, 2024
    Dataset authored and provided by
    Accident detection model
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Accident Bounding Boxes
    Description

    Accident-Detection-Model

    The Accident Detection Model is built using YOLOv8, Google Colab, Python, Roboflow, deep learning, OpenCV, machine learning, and artificial intelligence. It can detect an accident from a live camera feed, an image, or a video. The model is trained on a dataset of 3200+ images, which were annotated on Roboflow.

    Problem Statement

    • Road accidents are a major problem in India, with thousands of people losing their lives and many more suffering serious injuries every year.
    • According to the Ministry of Road Transport and Highways, India witnessed around 4.5 lakh road accidents in 2019, which resulted in the deaths of more than 1.5 lakh people.
    • The age range that is most severely hit by road accidents is 18 to 45 years old, which accounts for almost 67 percent of all accidental deaths.

    Accidents survey

    https://user-images.githubusercontent.com/78155393/233774342-287492bb-26c1-4acf-bc2c-9462e97a03ca.png

    Literature Survey

    • Sreyan Ghosh (Mar 2019): the goal is to develop a system using a deep learning convolutional neural network that has been trained to classify video frames as accident or non-accident.
    • Deeksha Gour (Sep 2019): uses computer vision technology, neural networks, deep learning, and various approaches and algorithms to detect objects.

    Research Gap

    • Lack of real-world data - We trained the model on more than 3200 images.
    • Large interpretability time and space needed - We use Google Colab to reduce the interpretability time and space required.
    • Outdated versions in previous works - We are using the latest version, YOLOv8.

    Proposed methodology

    • We are using YOLOv8 to train on our custom dataset of 3200+ images, collected from different platforms.
    • After training for 25 iterations, the model is ready to detect an accident with a significant probability.

    Model Set-up

    Preparing Custom dataset

    • We collected 1200+ images from different sources such as YouTube, Google Images, Kaggle.com, etc.
    • Then we annotated all of them individually using a tool called Roboflow.
    • During annotation we marked the images with no accident as NULL, and on the images with an accident we drew a box around the site of the accident.
    • Then we divided the dataset into train, val, and test sets in the ratio 8:1:1.
    • As the final step, we downloaded the dataset in YOLOv8 format.
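The description does not say how the 8:1:1 split was produced (Roboflow can do it during export). As a minimal sketch of the idea, assuming a plain list of image file names, the helper below shuffles the list and cuts it into three parts. The name `split_8_1_1`, the seed, and the generated file names are hypothetical.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test in an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10   # integer arithmetic avoids float rounding
    n_val = len(items) // 10
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],   # remainder goes to the test set
    }


# Hypothetical file names standing in for the annotated images.
files = [f"img_{i:04d}.jpg" for i in range(3200)]
splits = split_8_1_1(files)
print({name: len(part) for name, part in splits.items()})
```

A fixed seed keeps the split reproducible, which matters when validation results are compared across training runs.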

    Using Google Colab

    • We are using Google Colaboratory to code this model because Google Colab provides a GPU, which is faster than our local environments.
    • Google Colab lets you write and run Python code in Jupyter notebooks, which blend code, text, and visualisations in a single document.
    • Users can run individual code cells in Jupyter notebooks and quickly view the results, which is helpful for experimenting and debugging. Notebooks also support visualisations built with well-known frameworks like Matplotlib, Seaborn, and Plotly.
    • In Google Colab, we first changed the runtime from TPU to GPU.
    • We cross-checked it by running the command ‘!nvidia-smi’.

    Coding

    • First, we installed YOLOv8 with the command ‘!pip install ultralytics==8.0.20’.
    • Next, we checked the YOLOv8 installation with the imports ‘from ultralytics import YOLO’ and ‘from IPython.display import display, Image’.
    • Then we connected and mounted our Google Drive account with the code ‘from google.colab import drive’ followed by ‘drive.mount('/content/drive')’.
    • Then we ran our main commands to start the training process: ‘%cd /content/drive/MyDrive/Accident Detection model’ followed by ‘!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=1 imgsz=640 plots=True’.
    • After training, we ran commands to validate and test our model: ‘!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml’ and ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt conf=0.25 source=data/test/images’.
    • To get a result for any video or image, we ran this command: ‘!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source="/content/drive/MyDrive/Accident-Detection-model/data/testing1.jpg/mp4"’
    • The results are stored in the runs/detect/predict folder.

    Hence our model is trained, validated, and tested, and can detect accidents in any video or image.

    Challenges I ran into

    I mainly ran into 3 problems while making this model:

    • I had difficulty saving the results in a folder, because YOLOv8 is the latest version and is still under development. After reading some blogs and referring to Stack Overflow, I learned that the new v8 needs an extra argument, ''save=true''. This let me save my results in a folder.
    • I was facing a problem on the CVAT website because I was not sure what
  10. Cats&Dogs (Pickle)

    • kaggle.com
    zip
    Updated Feb 27, 2020
    Cite
    FLuzmano (2020). Cats&Dogs (Pickle) [Dataset]. https://www.kaggle.com/fariziluzman/catsdogs-pickle
    Explore at:
    Available download formats: zip (226313720 bytes)
    Dataset updated
    Feb 27, 2020
    Authors
    FLuzmano
    License

    CC0: Public Domain: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by FLuzmano

    Released under CC0: Public Domain

    Contents

    CNN

    For Google colab practice

  11. Robust Shelf Monitoring Dataset

    • universe.roboflow.com
    zip
    Updated Dec 14, 2022
    Cite
    Shelf Monitoring (2022). Robust Shelf Monitoring Dataset [Dataset]. https://universe.roboflow.com/shelf-monitoring/robust-shelf-monitoring/model/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 14, 2022
    Dataset authored and provided by
    Shelf Monitoring
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Stock Of Products In Shelf Bounding Boxes
    Description

    Robust Shelf Monitoring

    We aim to build a Robust Shelf Monitoring system to help store keepers maintain accurate inventory details, re-stock items efficiently and on time, and tackle the problem of misplaced items, where an item is accidentally placed at a different location. Our product aims to serve as a store manager that alerts the owner about items that need re-stocking and about misplaced items.

    Training the model:

    • Unzip the labelled dataset from Kaggle and store it in your Google Drive.
    • Follow the tutorial and update the training parameters in the custom-yolov4-detector.cfg file in the /darknet/cfg/ directory.
    • filters = (number of classes + 5) * 3 for each yolo layer.
    • max_batches = (number of classes) * 2000
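The two cfg rules above are simple arithmetic on the number of classes. A minimal sketch follows. The function name `yolo_cfg_values` is hypothetical, and the class count of 3 is only an example, since the listing does not state how many classes the dataset has.

```python
def yolo_cfg_values(num_classes):
    """Return the cfg values derived from the class count, per the rules above."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return {
        "classes": num_classes,
        "filters": (num_classes + 5) * 3,    # filters = (number of classes + 5) * 3
        "max_batches": num_classes * 2000,   # max_batches = (number of classes) * 2000
    }


print(yolo_cfg_values(3))
```

With 3 classes this gives filters = 24 and max_batches = 6000. The values still have to be written into the cfg file by hand or by a separate script.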

    Steps to run the prediction colab notebook:

    1. Install the required dependencies: pymongo, dnspython.
    2. Clone the darknet repository and the required Python scripts.
    3. Mount the Google Drive containing the weight file.
    4. Copy the pre-trained weight file to the yolo content directory.
    5. Run the detect.py script to perform the prediction.

    Presenting the predicted result

    The detect.py script has an option to send SMS notifications to the shopkeepers. We have built a front-end for building the phone book that collects the details of the shopkeepers. It also displays the latest prediction result and model accuracy.
  12. Sample Posts from the ADHD dataset.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Feb 6, 2025
    + more versions
    Cite
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam (2025). Sample Posts from the ADHD dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0315829.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ahmed Akib Jawad Karim; Kazi Hafiz Md. Asad; Md. Golam Rabiul Alam
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This work focuses on the efficiency of the knowledge distillation approach in generating a lightweight yet powerful BERT-based model for natural language processing (NLP) applications. After the model creation, we applied the resulting model, LastBERT, to a real-world task: classifying severity levels of Attention Deficit Hyperactivity Disorder (ADHD)-related concerns from social media text data. With LastBERT, a customized student BERT model, we reduced the parameter count from the 110 million of BERT base to 29 million, resulting in a model approximately 73.64% smaller. On the General Language Understanding Evaluation (GLUE) benchmark, comprising paraphrase identification, sentiment analysis, and text classification, the student model maintained strong performance across many tasks despite this reduction. The model was also used on a real-world ADHD dataset with an accuracy of 85%, F1 score of 85%, precision of 85%, and recall of 85%. When compared to DistilBERT (66 million parameters) and ClinicalBERT (110 million parameters), LastBERT demonstrated comparable performance, with DistilBERT slightly outperforming it at 87%, and ClinicalBERT achieving 86% across the same metrics. These findings highlight the LastBERT model’s capacity to classify degrees of ADHD severity properly, so it offers a useful tool for mental health professionals to assess and comprehend material produced by users on social networking platforms. The study emphasizes the possibilities of knowledge distillation to produce effective models fit for use in resource-limited conditions, hence advancing NLP and mental health diagnosis. Furthermore, the considerable decrease in model size without appreciable performance loss lowers the computational resources needed for training and deployment, facilitating greater applicability, especially with readily available computational tools like Google Colab and Kaggle Notebooks. This study shows the accessibility and usefulness of advanced NLP methods in practical real-world applications.
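The 73.64% figure follows directly from the two parameter counts quoted in the abstract. A short check, where the helper name `size_reduction_percent` is purely illustrative:

```python
def size_reduction_percent(teacher_params, student_params):
    """Percentage by which the student is smaller than the teacher."""
    return 100.0 * (teacher_params - student_params) / teacher_params


# Parameter counts as stated in the abstract: BERT base vs. LastBERT.
reduction = size_reduction_percent(110e6, 29e6)
print(f"{reduction:.2f}%")
```

That is, (110 − 29) / 110 = 81 / 110 ≈ 0.7364.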

  13. DubaiRealEstateSalesInsights

    • huggingface.co
    Updated Apr 25, 2024
    Cite
    PELEG ELRAZ (2024). DubaiRealEstateSalesInsights [Dataset]. https://huggingface.co/datasets/pelegelraz/DubaiRealEstateSalesInsights
    Explore at:
    Dataset updated
    Apr 25, 2024
    Authors
    PELEG ELRAZ
    Description

    Dubai Real Estate – Exploratory Data Analysis (EDA)

      Overview
    

    This project presents an Exploratory Data Analysis (EDA) of residential real-estate listings in Dubai. The goal is to identify key factors influencing property prices using statistical exploration, data cleaning, and visual insights. The full analysis was performed in Google Colab. The dataset (dubai_real_estate.csv) is hosted on HuggingFace.

      Dataset
    

    Source: Kaggle – Dubai Real Estate Listings
    File:… See the full description on the dataset page: https://huggingface.co/datasets/pelegelraz/DubaiRealEstateSalesInsights.

  14. HaGRID Sample 500k 384p

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    Innominate817 (2022). HaGRID Sample 500k 384p [Dataset]. https://www.kaggle.com/datasets/innominate817/hagrid-sample-500k-384p
    Explore at:
    Available download formats: zip (13099488276 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    Innominate817
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains the 509,323 training images from HaGRID (HAnd Gesture Recognition Image Dataset) downscaled to 384p. The original dataset is 716GB and contains 552,992 1080p images. I created this sample for a tutorial so readers can use the dataset in the free tiers of Google Colab and Kaggle Notebooks.

    Original Authors:

    Original Dataset Links

    Kaggle Notebooks

    • Training:
      • The free GPU tier for Kaggle Notebooks takes around 15 minutes per epoch.
      • A screen recording walking through the setup steps is available on YouTube (link). Timestamps are in the video description.
    • Inference

    Initial IceVision YOLOX Tiny Results:

    • 30K 384p: 76.2509% COCOMetric after 20 epochs
    • 60K 384p: 78.4270% COCOMetric after 20 epochs
    • 120K 384p: 81.3696% COCOMetric after 20 epochs
    • 500K 384p: 81.5283% COCOMetric after 10 epochs

    Object Classes

    ['call',
     'no_gesture',
     'dislike',
     'fist',
     'four',
     'like',
     'mute',
     'ok',
     'one',
     'palm',
     'peace',
     'peace_inverted',
     'rock',
     'stop',
     'stop_inverted',
     'three',
     'three2',
     'two_up',
     'two_up_inverted']
    

    Annotations

    • bboxes: [top-left-X-position, top-left-Y-position, width, height]
    • Multiply top-left-X-position and width values by the image width and multiply top-left-Y-position and height values by the image height.
      00005c9c-3548-4a8f-9d0b-2dd4aff37fc9
      bboxes: [[0.23925175, 0.28595301, 0.25055143, 0.20777627]]
      labels: [call]
      leading_hand: right
      leading_conf: 1
      user_id: 5a389ffe1bed6660a59f4586c7d8fe2770785e5bf79b09334aa951f6f119c024
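The scaling rule for the normalized boxes can be sketched as follows. The function name `to_pixel_bbox` is hypothetical, and the 512x384 image size is an assumption for illustration only, since the actual size of each image has to be read from the file.

```python
def to_pixel_bbox(bbox, image_width, image_height):
    """Convert a normalized [x, y, width, height] box to pixel units."""
    x, y, w, h = bbox
    return [
        x * image_width,    # top-left X scales with the image width
        y * image_height,   # top-left Y scales with the image height
        w * image_width,    # box width scales with the image width
        h * image_height,   # box height scales with the image height
    ]


# Box from the sample annotation above; the image size is assumed.
sample_bbox = [0.23925175, 0.28595301, 0.25055143, 0.20777627]
print(to_pixel_bbox(sample_bbox, image_width=512, image_height=384))
```

Adding the pixel width and height to the top-left corner gives the bottom-right corner if a corner-based format is needed.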
  15. Mobiles & laptop Sales Data

    • kaggle.com
    zip
    Updated Mar 24, 2025
    Cite
    VINOTH KANNA S (2025). Mobiles & laptop Sales Data [Dataset]. https://www.kaggle.com/datasets/vinothkannaece/mobiles-and-laptop-sales-data
    Explore at:
    Available download formats: zip (3242055 bytes)
    Dataset updated
    Mar 24, 2025
    Authors
    VINOTH KANNA S
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset simulates sales transactions for mobile phones and laptops, including product specifications, customer details, and sales information. It contains 50,000 rows of randomly generated data to help analyze product sales trends, customer purchasing behavior, and regional distribution of sales.

    Dataset Overview

    • Dataset Type: Structured tabular data
    • Number of Rows: 50,000
    • Number of Columns: 16

    Purpose of the Dataset
    This dataset can be used for:
    ✅ Sales Analysis – Understanding product demand and pricing trends.
    ✅ Customer Behavior Analysis – Identifying buying patterns across locations.
    ✅ Inventory Management – Monitoring inward and dispatched product movements.
    ✅ Machine Learning & AI – Predicting sales trends, customer preferences, and stock management.

    Key Features in the Dataset

    1. Product Information

      • Product: Type of product (Mobile Phone / Laptop).
      • Brand: Various brands like Apple, Samsung, Dell, Lenovo, OnePlus, etc.
      • Product Code: Unique identifier for each product.
      • Product Specification: Brief description of the product features.
    2. Sales & Pricing Details

      • Price: Cost of the product (randomly generated).
      • Inward Date: Date when the product was received in stock.
      • Dispatch Date: Date when the product was sold/dispatched.
      • Quantity Sold: Number of units sold per transaction.
    3. Customer & Location Details

      • Customer Name: Randomly generated customer names.
      • Customer Location: City of the customer.
      • Region: Sales region (North, South, East, West, Central).
    4. Technical Specifications

      • Core Specification (For Laptops): Includes processor models like i3, i5, i7, i9, Ryzen 3-9.
      • Processor Specification (For Mobiles): Includes processors like Snapdragon, Exynos, Apple A-Series, and MediaTek Dimensity.
      • RAM: Randomly assigned memory sizes (4GB to 32GB).
      • ROM: Storage capacity (64GB to 1TB).
      • SSD (For Laptops): Additional storage (256GB to 2TB), "N/A" for mobile phones.
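To make the feature list concrete, here is a minimal standard-library sketch that computes revenue per region and the average number of days between inward and dispatch dates. The sample rows are invented for illustration, and the exact column headers and the ISO date format are assumptions based on the feature names above, so check them against the actual CSV before reuse.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical rows; column headers are assumed from the feature list.
SAMPLE_CSV = """Product,Region,Price,Quantity Sold,Inward Date,Dispatch Date
Laptop,North,1000,2,2024-01-01,2024-01-11
Mobile Phone,South,500,3,2024-02-01,2024-02-04
Laptop,North,800,1,2024-03-01,2024-03-02
"""


def summarize(csv_text):
    """Return revenue per region and the mean days between inward and dispatch."""
    revenue = defaultdict(float)
    days_in_stock = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        revenue[row["Region"]] += float(row["Price"]) * int(row["Quantity Sold"])
        inward = date.fromisoformat(row["Inward Date"])
        dispatch = date.fromisoformat(row["Dispatch Date"])
        days_in_stock.append((dispatch - inward).days)
    return dict(revenue), sum(days_in_stock) / len(days_in_stock)


revenue, mean_days = summarize(SAMPLE_CSV)
print(revenue, mean_days)
```

For the full 50,000-row file a dataframe library would be more convenient; the standard-library version only shows the shape of the calculation.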

    Potential Use Cases:
    • Business Intelligence Dashboards
    • Market Trend Analysis
    • Supply Chain Optimization
    • Customer Segmentation
    • Machine Learning Model Training (Sales Prediction, Price Optimization, etc.)

  16. COREVQA

    • huggingface.co
    • kaggle.com
    Updated May 28, 2025
    Cite
    COREVQA (2025). COREVQA [Dataset]. https://huggingface.co/datasets/COREVQA2025/COREVQA
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    COREVQA
    Description

    COREVQA: A Crowd Observation and Reasoning Entailment Visual Question Answering Benchmark

    Paper: https://www.arxiv.org/abs/2507.13405
    Repository: https://github.com/corevqa/COREVQA
    Demo: https://colab.research.google.com/drive/1SpuTta5tSzktiCo9xN4CtE9P1pmYV0ax
    CrowdHuman Dataset Homepage: https://www.crowdhuman.org/

      Abstract
    

    Recently, many benchmarks and datasets have been developed to evaluate Vision-Language Models (VLMs) using visual question answering (VQA)… See the full description on the dataset page: https://huggingface.co/datasets/COREVQA2025/COREVQA.

  17. Batik Nusantara (Batik Indonesia) Dataset

    • kaggle.com
    zip
    Updated Feb 17, 2024
    Cite
    HendryHB (2024). Batik Nusantara (Batik Indonesia) Dataset [Dataset]. https://www.kaggle.com/datasets/hendryhb/batik-nusantara-batik-indonesia-dataset
    Explore at:
    Available download formats: zip (105554919 bytes)
    Dataset updated
    Feb 17, 2024
    Authors
    HendryHB
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Batik Background

    Indonesian textile craftsmanship has evolved over millennia, transitioning from basic utilitarian weaving techniques around 2500 BC to more intricate patterns carrying the religious symbolism and the social and cultural meanings of their time, with production hubs across regions like Sumatra, Borneo, Java, Celebes, Nusa Tenggara, and Bali. These textiles evolved from utilitarian items to carriers of sacred meanings, divided into secular and sacred cloths, both renowned for their aesthetic beauty. They played a pivotal role in individuals' cultural journeys, symbolizing life stages like maternity, matrimony, and mortality, with designs reflecting religious beliefs and the era's influence. The Batik technique, a hallmark of Indonesian textile artistry, involves creating intricate patterns using a resist wax method. Traditionally, artisans used a tool called a canting to draw patterns on fabric, a process known as batik tulis (drawn batik). Following the drawing phase, the cloth was dyed using natural dyes, and then subjected to the "lorot" process, involving boiling the wax out of the fabric.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F19051508%2Fe543b4e91ad5dffe2b54e7f4300cc7b2%2F2024-02-16%2015.09.06%20copy%202.jpg?generation=1708074019154098&alt=media

    Batik making is revered for its complexity and demands high craftsmanship, requiring precise hand gestures and mastery of the canting tool. It stands as one of the most challenging pattern-making techniques in textile artistry. [1]

    Dataset Collection

    The primary objective of this dataset is to serve as a resource for research or academic or educational purposes rather than commercial endeavors. The dataset was meticulously compiled to include high-quality images representative of various types of Batik, encompassing the rich diversity of Batik Nusantara or Indonesian Batik from the Aceh to Papua regions.

    Andrew has mentioned that the cornerstone of effective machine learning lies in the quality of the data. Meticulously curated datasets hold the power to unlock valuable insights and drive meaningful results. In other words, data is more important than models. In contrast, datasets lacking in quality may hinder the learning process and lead to suboptimal outcomes. Therefore, prioritizing data quality is paramount, as it lays the foundation for successful machine learning initiatives [2]. Also Sebastian added that the effectiveness of a machine learning algorithm greatly depends on the quality of the data and the richness of the information it encapsulates [3].

    Acknowledgments

    This dataset was carefully collected with the assistance of Ultralytics. All images in this dataset belong to their respective owners, to whom we extend our gratitude for contributing these visually captivating images.

    To cite from Kaggle:

    [Dataset creator's name]. ([Year & Month of dataset creation]). [Name of the dataset], [Version of the dataset]. Retrieved [Date Retrieved] from [URL of the dataset].

    Dataset

    Comprising 40 raw images per class with image dimension of 224 x 224, this dataset encompasses a wide array of Batik designs, each representing a distinct category. The classes include 'Aceh PintuAceh', 'Bali Barong', 'Bali Merak', 'DKI OndelOndel', 'JawaBarat Megamendung', 'JawaTimur Pring', 'Kalimantan Dayak', 'Lampung Gajah', 'Madura Mataketeran', 'Maluku Pala', 'NTB Lumbung', 'Papua Asmat', 'Papua Cendrawasih', 'Papua Tifa', 'Solo Parang', 'SulawesiSelatan Lontara', 'SumateraBarat Rumah Minang', 'SumateraUtara Boraspati', 'Yogyakarta Kawung', and 'Yogyakarta Parang' [2][3][4][5][6][7]. These classes collectively portray the rich heritage of Batik Nusantara or Batik Indonesia, spanning from the Aceh to Papua regions. Feel free to explore image augmentation techniques to further enhance the dataset.

    Sample code is available on git and assumes a Colab environment. For reference, the following pre-trained architectures have been added: VGG16, ResNet50, Xception, and MobileNetV2, along with Content-Based Image Retrieval (CBIR), Random Forest, a CNN architecture, and an MLP. The code is also available on Kaggle Dataset Notebooks (Code).

    Instructions for Dataset Usage

    Below are the steps for using the dataset in either Google Colab or Jupyter Notebook:
    1. Begin by downloading the dataset.
    2. Upon extraction, you'll find separate folders for training and testing data. Should you require validation data, either manually split off a portion (around 20%) of the training set and store it separately, or split on the fly in code.
    3. If splitting validation data manually, remember to re-zip the dataset after the separation process.
    4....
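
    The on-the-fly split mentioned in step 2 could look roughly like the sketch below. It is not the dataset author's code: the function names, the fixed seed, and the assumption that the training folder holds one sub-folder per class are illustrative choices, and only the Python standard library is used.

```python
import random
from pathlib import Path


def split_train_validation(paths, val_fraction=0.2, seed=42):
    """Shuffle paths reproducibly and hold out val_fraction of them."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    # Sort first so the result does not depend on directory listing order.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    # Returns (training paths, validation paths).
    return shuffled[n_val:], shuffled[:n_val]


def split_per_class(train_dir, val_fraction=0.2, seed=42):
    """Split each class sub-folder separately so every class keeps the ratio.

    Assumes train_dir contains one sub-folder per class (an assumption,
    not something the dataset description states).
    """
    train, val = [], []
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = [p for p in class_dir.iterdir() if p.is_file()]
        class_train, class_val = split_train_validation(images, val_fraction, seed)
        train.extend(class_train)
        val.extend(class_val)
    return train, val
```

    Splitting per class rather than over the pooled file list keeps small classes from ending up with no validation images, which matters when each class has only a few dozen images.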

  18. NYC Jobs Dataset (Filtered Columns)

    • kaggle.com
    zip
    Updated Oct 5, 2022
    Cite
    Jeffery Mandrake (2022). NYC Jobs Dataset (Filtered Columns) [Dataset]. https://www.kaggle.com/datasets/jefferymandrake/nyc-jobs-filtered-cols
    Explore at:
    zip(93408 bytes)Available download formats
    Dataset updated
    Oct 5, 2022
    Authors
    Jeffery Mandrake
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Use this dataset with Misra's Pandas tutorial: How to use the Pandas GroupBy function | Pandas tutorial

    The original dataset came from this site: https://data.cityofnewyork.us/City-Government/NYC-Jobs/kpav-sd4t/data

    I used Google Colab to filter the columns with the following Pandas commands. Here's a Colab Notebook you can use with the commands listed below: https://colab.research.google.com/drive/17Jpgeytc075CpqDnbQvVMfh9j-f4jM5l?usp=sharing

    Once the csv file is uploaded to Google Colab, use these commands to process the file.

    import pandas as pd

    # load the file and create a pandas dataframe
    df = pd.read_csv('/content/NYC_Jobs.csv')

    # keep only these columns
    df = df[['Job ID', 'Civil Service Title', 'Agency', 'Posting Type',
             'Job Category', 'Salary Range From', 'Salary Range To']]

    # save the csv file without the index column
    df.to_csv('/content/NYC_Jobs_filtered_cols.csv', index=False)
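
    Since the dataset is meant for practising GroupBy, here is a minimal sketch of one such aggregation over the filtered columns. The helper name and the three sample rows are made up for illustration; only the column names come from the commands above.

```python
import pandas as pd


def salary_summary_by_agency(df):
    """Average lower and upper salary bounds for each agency."""
    return df.groupby("Agency")[["Salary Range From", "Salary Range To"]].mean()


# Made-up rows that mimic the filtered columns; not real NYC Jobs data.
sample = pd.DataFrame({
    "Agency": ["Agency A", "Agency A", "Agency B"],
    "Salary Range From": [50000, 60000, 70000],
    "Salary Range To": [70000, 80000, 90000],
})

summary = salary_summary_by_agency(sample)
print(summary)
```

    With the real file, the same function can be applied to the dataframe loaded from NYC_Jobs_filtered_cols.csv.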

  19. MyAnimeList Anime Dataset (till June 2025)

    • kaggle.com
    zip
    Updated Jun 19, 2025
    Cite
    Trushank Vashikar (2025). MyAnimeList Anime Dataset (till June 2025) [Dataset]. https://www.kaggle.com/datasets/trushankvashikar/myanimelist-anime-dataset-till-june-2025
    Explore at:
    zip(71480343 bytes)Available download formats
    Dataset updated
    Jun 19, 2025
    Authors
    Trushank Vashikar
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    MyAnimeList.net – Complete Anime Dataset (2025 Edition)

    This dataset contains a comprehensive collection of anime entries from MyAnimeList.net, updated to reflect the latest titles as of 2025. It is ideal for performing Exploratory Data Analysis (EDA) and building robust anime recommendation systems, including collaborative filtering, content-based methods, and hybrid approaches.

    Files Included:

    1. myanimelist_recommender_ready.csv
    Contains core metadata for each anime, such as:
    - mal_id
    - title
    - score
    - members
    - genres
    - type
    - episodes
    - synopsis, etc.

    2. anime_reviews.json (not perfect; future update planned)
    A separate JSON file containing the top 1–10 user reviews for each anime (based on availability), scraped using the Jikan API and stored through Google Firebase.

    Review Scraping Note:
    I attempted to scrape and save user reviews for each anime using a custom Python script that:
    - Used Google Colab for execution
    - Stored data directly into Firebase Firestore
    - Collected up to 10 top reviews per anime using the Jikan API

    However, after reaching around 9,000 entries, the Colab runtime disconnected. Although I implemented a resume feature to continue scraping from a specific ID, a logic bug introduced incorrect mapping of reviews to anime IDs in Firebase, resulting in misplaced review records.

    This is being fixed, and a properly cleaned version of the reviews will be uploaded in a future update.
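
    One way to make a resume step robust against this kind of mis-mapping is to key every stored record by the anime ID itself and to skip IDs that are already stored, rather than resuming from a position. The sketch below illustrates that idea only; it is not the author's script. It writes to a local JSON file instead of Firestore, and fetch_reviews is a hypothetical callable standing in for whatever request is made to the Jikan API.

```python
import json
from pathlib import Path


def scrape_with_resume(anime_ids, fetch_reviews, store_path, max_reviews=10):
    """Fetch reviews per anime ID, skipping IDs already saved in store_path.

    fetch_reviews is a hypothetical callable: anime_id -> list of reviews.
    """
    store_path = Path(store_path)
    # Load earlier progress; JSON object keys are strings, so IDs are stored as str.
    store = json.loads(store_path.read_text()) if store_path.exists() else {}
    for anime_id in anime_ids:
        key = str(anime_id)
        if key in store:
            # Already scraped in a previous run, so nothing is re-fetched or re-mapped.
            continue
        store[key] = fetch_reviews(anime_id)[:max_reviews]
        # Persist after every ID so a disconnect loses at most one request.
        store_path.write_text(json.dumps(store))
    return store
```

    Because each record is written under its own ID, a restart cannot shift reviews onto the wrong anime; the cost is rewriting the file after every ID, which a real run might batch.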

    Use Cases:
    This dataset is great for:
    - Anime recommendation systems (content-based, collaborative, hybrid)
    - Natural Language Processing (NLP) on anime reviews
    - Clustering anime by genres, type, or user ratings
    - Sentiment analysis on review text
    - Visualization of anime trends and metadata

    Credits:
    - Data Source: MyAnimeList.net
    - Scraped via: Jikan REST API
    - Backend: Firebase Firestore
    - Runtime: Google Colab

  20. HaGRID Classification 512p no_gesture 150k

    • kaggle.com
    zip
    Updated Mar 9, 2023
    + more versions
    Cite
    Innominate817 (2023). HaGRID Classification 512p no_gesture 150k [Dataset]. https://www.kaggle.com/datasets/innominate817/hagrid-classification-512p-no-gesture-150k
    Explore at:
    zip(3808072768 bytes)Available download formats
    Dataset updated
    Mar 9, 2023
    Authors
    Innominate817
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset contains 125,912 training images from HaGRID (HAnd Gesture Recognition Image Dataset) modified for image classification instead of object detection. This version contains a separate folder with 27,823 sample images containing no gestures, for a total of 153,787 training samples. The original dataset is 716GB. I created this sample for a tutorial so readers can use the dataset in the free tiers of Google Colab and Kaggle Notebooks.

    Training Notebooks:

    Original Authors:

    Original Dataset Links

Cite
SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health

Social Media and Mental Health

Correlation between Social Media use and General Mental Well-being

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
zip(10944 bytes)Available download formats
Dataset updated
Jul 18, 2023
Authors
SouvikAhmed071
License

Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically

Description

This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

The following is the Google Colab link to the project, done on Jupyter Notebook -

https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

The following is the GitHub Repository of the project -

https://github.com/daerkns/social-media-and-mental-health

Libraries used for the Project -

pandas
NumPy
Matplotlib
Seaborn
scikit-learn