26 datasets found
  1. SVM Classification

    • kaggle.com
    Updated Jun 28, 2019
    Cite
    chinthakindi vinod (2019). SVM Classification [Dataset]. https://www.kaggle.com/vinod00725/svm-classification/activity
    Dataset updated
    Jun 28, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    chinthakindi vinod
    Description

    Dataset

    This dataset was created by chinthakindi vinod

    Contents

  2. Predict the classification group

    • kaggle.com
    zip
    Updated Jul 4, 2021
    Cite
    Jahanvee Narang (2021). Predict the classification group [Dataset]. https://www.kaggle.com/jahnveenarang/predict-the-classification-group
    Available download formats: zip (91,443 bytes)
    Dataset updated
    Jul 4, 2021
    Authors
    Jahanvee Narang
    Description

    Dataset

    This dataset was created by Jahanvee Narang

    Contents

  3. Ad Click Prediction - Classification Problem

    • kaggle.com
    Updated Jul 4, 2021
    Cite
    Jahanvee Narang (2021). Ad Click Prediction - Classification Problem [Dataset]. https://www.kaggle.com/datasets/jahnveenarang/cvdcvd-vd/discussion
    Dataset updated
    Jul 4, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jahanvee Narang
    Description

    New to machine learning and data science? No question is too basic or too simple. Use this place to post any first-timer clarifying questions about the classification algorithm or the dataset. This file contains customer demographics and whether each customer clicked the ad. Use it to train a classification algorithm that predicts ad clicks from the customer demographics as independent variables.

    This data set contains the following features:

    1. 'User ID': unique identifier for the consumer
    2. 'Age': customer age in years
    3. 'Estimated Salary': average income of the consumer
    4. 'Gender': whether the consumer is male or female
    5. 'Purchased': 0 or 1, indicating whether the consumer clicked the ad

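    A minimal sketch of preparing these columns for a classifier, using only the Python standard library. It assumes the CSV headers are spelled exactly as listed above and that 'Gender' holds the strings "Male"/"Female" (both are assumptions; check the real file). 'User ID' is dropped because an identifier carries no predictive signal, and 'Age' and 'Estimated Salary' are standardized because their scales differ by orders of magnitude, which matters for margin- and distance-based models such as SVM. The sample rows are invented.

```python
import csv
import io
from statistics import mean, pstdev

# Invented sample rows; the real file's header spelling may differ.
SAMPLE = """User ID,Gender,Age,Estimated Salary,Purchased
1001,Male,19,19000,0
1002,Female,35,20000,0
1003,Female,26,43000,1
1004,Male,47,25000,1
"""

def load_rows(text):
    """Parse CSV text into feature rows and labels, dropping 'User ID'."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        gender = 1.0 if row["Gender"] == "Male" else 0.0  # assumed encoding
        X.append([float(row["Age"]), float(row["Estimated Salary"]), gender])
        y.append(int(row["Purchased"]))
    return X, y

def standardize(X, cols):
    """Z-score the given columns in place; return the (mean, std) used per column."""
    stats = {}
    for c in cols:
        values = [r[c] for r in X]
        mu, sigma = mean(values), pstdev(values) or 1.0  # avoid dividing by 0
        for r in X:
            r[c] = (r[c] - mu) / sigma
        stats[c] = (mu, sigma)
    return stats

X, y = load_rows(SAMPLE)
stats = standardize(X, cols=[0, 1])  # Age and Estimated Salary only
print(y)  # [0, 0, 1, 1]
```

    In a real split, the mean and standard deviation should be computed on the training rows only and then reused for the test rows.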
  4. text classifier svm

    • kaggle.com
    Updated Sep 26, 2021
    Cite
    Kushal Dev (2021). text classifier svm [Dataset]. https://www.kaggle.com/datasets/kushaldev75/text-classifier-svm
    Dataset updated
    Sep 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kushal Dev
    Description

    Dataset

    This dataset was created by Kushal Dev

    Contents

  5. Fake News Detection

    • opendatabay.com
    • kaggle.com
    .csv
    Updated Jun 8, 2025
    Cite
    Datasimple (2025). Fake News Detection [Dataset]. https://www.opendatabay.com/data/dataset/5a25f611-a90e-42d1-b4d8-d2ca35bd8d19
    Available download formats: .csv
    Dataset updated
    Jun 8, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Knowledge Bundles
    Description

    🇬🇧 English:

    This synthetic dataset is designed for practicing fake news detection using natural language processing (NLP) techniques. It contains 1000 news samples labeled as "real" or "fake", including fabricated headlines and articles that mimic real-world patterns.

    You can use this dataset to:

    • Train NLP classification models such as Logistic Regression, SVM, and BERT
    • Perform feature engineering on textual data
    • Practice binary classification problems in news analytics

    Columns:

    • title: news headline
    • text: main body of the news
    • label: whether the news is fake or real

    🇹🇷 Turkish (translated):

    This synthetic dataset is designed for researchers and students who want to practice fake news detection using natural language processing (NLP) techniques. It contains 1,000 sample news items, each labeled "real" or "fake". The headlines and contents were created to imitate the real world.

    With this dataset:

    • NLP models such as Logistic Regression, SVM, and BERT can be trained
    • Feature engineering can be performed on text
    • Classification studies on fake news detection can be carried out

    Variables:

    • title: news headline
    • text: news content
    • label: label (fake/real)

    Original Data Source: Fake News Detection
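    The feature-engineering step mentioned above is often done with TF-IDF weighting of the title or text column. A minimal pure-Python sketch, assuming simple lowercase whitespace tokenization and the plain formula tf × log(N / df); library vectorizers such as scikit-learn's TfidfVectorizer apply smoothing and normalization by default, so their numbers will differ. The two headlines are invented.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "Scientists confirm water found on the Moon",
    "Shocking: the Moon is made of cheese, insiders say",
]
vecs = tfidf(docs)
```

    Terms that appear in every document ("the", "moon") receive weight 0, while distinctive terms such as "cheese" get positive weights; vectors like these can then feed a linear classifier such as logistic regression or a linear SVM.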

  6. Zoo animal classification

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    Karthikeyan Raghav (2021). Zoo animal classification [Dataset]. https://www.kaggle.com/karthikeyanraghav/zoo-animal-classification
    Available download formats: zip (1,198 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    Karthikeyan Raghav
    Description

    Dataset

    This dataset was created by Karthikeyan Raghav

    Contents

  7. Spam Mail Classifier Dataset

    • opendatabay.com
    .csv
    Updated Jun 6, 2025
    Cite
    Datasimple (2025). Spam Mail Classifier Dataset [Dataset]. https://www.opendatabay.com/data/dataset/9aa9a17e-1fe7-44f5-9fb0-f901c05b4a17
    Available download formats: .csv
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Fraud Detection & Risk Management
    Description

    🇬🇧 English:

    This dataset contains 1,000 synthetic email messages labeled as either spam or ham. It was created to help users build and evaluate text classification models using basic natural language processing (NLP) techniques.

    Use this dataset to:

    • Train a spam filter using Naive Bayes, SVM, or Logistic Regression
    • Practice text cleaning, tokenization, and TF-IDF vectorization
    • Build email classification models without needing real personal email data

    🇹🇷 Turkish (translated):

    This dataset contains 1,000 synthetic e-mail messages, each labeled as spam or ham (normal). It was prepared for those who want to develop spam detection models with natural language processing techniques.

    With this dataset:

    • Text classification models such as Naive Bayes and SVM can be developed
    • Text cleaning, tokenization, and TF-IDF can be applied
    • NLP can be practiced without needing real e-mails

    Original Data Source: Spam Mail Classifier Dataset
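    As a sketch of the first suggested use, here is a multinomial Naive Bayes spam filter with Laplace (add-one) smoothing in plain Python. The whitespace tokenizer and the four training messages are invented for illustration; a real pipeline on this dataset would add the text cleaning described above.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase, split on whitespace, and drop surrounding punctuation."""
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return [t for t in tokens if t]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods (unseen words count as 0 + 1)
            score = math.log(doc_count / n_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_texts = [
    "Win a free prize now",
    "Claim your free money today",
    "Meeting moved to Monday",
    "Can you review my report",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("free prize money"))  # spam
```

    Working in log space avoids numerical underflow when a message contains many words.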

  8. Data from: Iris Flower Classification

    • kaggle.com
    Updated Nov 14, 2023
    Cite
    PavaniGardas (2023). Iris Flower Classification [Dataset]. https://www.kaggle.com/datasets/pavanigardas/iris-flower-classification
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PavaniGardas
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Iris Flower Classification is a classic machine learning task used for learning and practicing classification algorithms. The dataset contains features like sepal length, sepal width, petal length, and petal width for three different species of iris flowers. This project involves data pre-processing, model selection, and evaluation. Here, we use classification algorithms like logistic regression, decision trees, k-nearest neighbors (KNN), or support vector machines (SVM) for this classification task.
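    To illustrate one of the listed algorithms, here is a minimal k-nearest-neighbors classifier over the four measurements (sepal length, sepal width, petal length, petal width, in cm). The six training rows are illustrative values typical of each species rather than rows read from this dataset, and k = 3 with Euclidean distance is an arbitrary choice.

```python
import math
from collections import Counter

# Illustrative (sepal_len, sepal_wid, petal_len, petal_wid) rows in cm.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2)))  # setosa
```

    Because KNN compares raw distances, features on very different scales would normally be standardized first; the four iris measurements share a unit, so this sketch skips that step.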

  9. American Sign Language Digit Dataset

    • kaggle.com
    Updated Aug 9, 2021
    Cite
    S M Rayeed (2021). American Sign Language Digit Dataset [Dataset]. https://www.kaggle.com/rayeed045/american-sign-language-digit-dataset/discussion
    Dataset updated
    Aug 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    S M Rayeed
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    United States
    Description

    Context

    This is an American Sign Language Digits Dataset, covering signs 0 through 9. It uses depth information to generate hand key-points (using MediaPipe), which enriches the dataset and enhances accuracy during classification.

    Content

    This is an American Sign Language Digits Dataset built with the MediaPipe framework, which accurately detects the hand and its 21 key-points in a raw RGB image and stores the coordinate values of these key-points. The dataset contains 5,000 raw image files covering signs 0 to 9 (500 files per sign) and 5,000 corresponding output image files produced with MediaPipe. After generating the dataset, we also performed classification with several classifiers, such as KNN, SVM, RFC, DTC, and neural networks. The accuracy of each classifier is reported in the classification code (in the Code section).
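    The description does not say how the key-point coordinates are preprocessed before classification. One common approach, shown here as an assumption rather than the author's method, is to make the 21 points translation- and scale-invariant: subtract the wrist point (index 0 in MediaPipe's hand landmark numbering) and divide by the largest distance from the wrist. The hand coordinates below are hypothetical.

```python
import math

def normalize_keypoints(points):
    """Center 21 (x, y) key-points on the wrist (index 0) and scale to unit size.

    Returns a flat list [x0, y0, x1, y1, ...] usable as a feature vector.
    """
    if len(points) != 21:
        raise ValueError("expected 21 hand key-points")
    wx, wy = points[0]
    centered = [(x - wx, y - wy) for x, y in points]
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [coord / scale for x, y in centered for coord in (x, y)]

# Hypothetical hand: wrist at (100, 200), other points spread out from it.
hand = [(100 + 5 * i, 200 - 3 * i) for i in range(21)]
features = normalize_keypoints(hand)
```

    After this step, the same gesture photographed closer to or farther from the camera, or at a different position in the frame, yields nearly the same 42-value vector, which helps distance-based classifiers such as KNN and SVM.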

    Acknowledgements

    A New 2D Static Hand Gesture Colour Image Dataset for ASL Gestures - A.L.C. Barczak, N.H. Reyes, M. Abastillas, A. Piccio and T. Susnjak

  10. ‘Dementia Prediction Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Dementia Prediction Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-dementia-prediction-dataset-8ab0/3d5e8806/?iid=009-768&v=presentation
    Dataset updated
    Aug 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Dementia Prediction Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/dementia-prediction-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Dementia is a syndrome, usually of a chronic or progressive nature, in which there is deterioration in cognitive function (i.e. the ability to process thought) beyond what might be expected from normal aging. It affects memory, thinking, orientation, comprehension, calculation, learning capacity, language, and judgment. Consciousness is not affected. The impairment in cognitive function is commonly accompanied, and occasionally preceded, by deterioration in emotional control, social behaviour, or motivation.

    Dementia results from a variety of diseases and injuries that primarily or secondarily affect the brain, such as Alzheimer's disease or stroke.

    Dementia is one of the major causes of disability and dependency among older people worldwide. It can be overwhelming, not only for the people who have it, but also for their carers and families. There is often a lack of awareness and understanding of dementia, resulting in stigmatization and barriers to diagnosis and care. The impact of dementia on carers, family, and society at large can be physical, psychological, social, and economic.

    Content

    This set consists of a longitudinal collection of 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year, for a total of 373 imaging sessions. For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included. The subjects are all right-handed and include both men and women. 72 of the subjects were characterized as nondemented throughout the study. 64 of the included subjects were characterized as demented at the time of their initial visits and remained so for subsequent scans, including 51 individuals with mild to moderate Alzheimer’s disease. Another 14 subjects were characterized as nondemented at the time of their initial visit and were subsequently characterized as demented at a later visit.
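    Because each of the 150 subjects contributes several imaging sessions (373 in total), a row-level train/test split would let sessions from the same person land on both sides and inflate the measured accuracy. A minimal sketch of a subject-level split follows; the `subject_id` field and the record layout are hypothetical stand-ins for however the CSV identifies subjects.

```python
import random

def split_by_subject(records, test_fraction=0.2, seed=0):
    """Assign whole subjects to train or test so no subject appears in both."""
    subjects = sorted({r["subject_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_ids = set(subjects[:n_test])
    train = [r for r in records if r["subject_id"] not in test_ids]
    test = [r for r in records if r["subject_id"] in test_ids]
    return train, test

# Hypothetical records: 10 subjects with 2 or 3 visits each (25 rows).
records = [
    {"subject_id": f"S{s:03d}", "visit": v}
    for s in range(10)
    for v in range(1, 3 + (s % 2))
]
train, test = split_by_subject(records)
```

    scikit-learn's GroupShuffleSplit and GroupKFold provide the same guarantee when subject IDs are passed as the groups argument.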

    Acknowledgements

    Battineni, Gopi; Amenta, Francesco; Chintalapudi, Nalini (2019), “Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF DEMENTIA BY SUPPORT VECTOR MACHINES (SVM)”, Mendeley Data, V1, doi: 10.17632/tsy6rbc5d4.1

    --- Original source retains full ownership of the source dataset ---

  11. 💸 💳 Online Banking / Financial Review Dataset

    • kaggle.com
    Updated Dec 26, 2022
    Cite
    Yan Maksi (2022). 💸 💳 Online Banking / Financial Review Dataset [Dataset]. https://www.kaggle.com/datasets/yanmaksi/reviews-data-for-classification-model/data
    Dataset updated
    Dec 26, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yan Maksi
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    This dataset contains complete customer review data for different banking companies. The data is not clean, so before using it for more complex models you will need to do exploratory data analysis. If you are using simpler models, you can simply take the star-rating column and the feedback column. (You can see my example code with this dataset.) Good luck! 💯
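    For the simpler approach described above (star rating plus feedback text), one common convention, assumed here rather than taken from the dataset, is to map ratings to sentiment classes and drop the ambiguous middle rating. The keys `stars` and `feedback` are hypothetical names; check the actual CSV headers before reusing this.

```python
def stars_to_label(stars):
    """Map a 1-5 star rating to 'negative', 'positive', or None (neutral, dropped)."""
    stars = int(stars)
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return None  # 3 stars: treated as neutral and excluded

def build_training_pairs(rows):
    """Turn rows with hypothetical 'stars'/'feedback' keys into (text, label) pairs."""
    pairs = []
    for row in rows:
        text = (row.get("feedback") or "").strip()
        label = stars_to_label(row["stars"])
        if text and label is not None:  # skip empty reviews and neutral ratings
            pairs.append((text, label))
    return pairs

rows = [
    {"stars": "5", "feedback": "Fast transfers, great app"},
    {"stars": "1", "feedback": "Card blocked for no reason"},
    {"stars": "3", "feedback": "It is okay"},
    {"stars": "4", "feedback": ""},
]
pairs = build_training_pairs(rows)
```

    The resulting (text, label) pairs can go straight into a text classifier; whether to keep 3-star reviews as a separate "neutral" class is a modeling choice.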

  12. BRAIN MRI 2021

    • kaggle.com
    Updated Oct 14, 2021
    Cite
    Sankar (2021). BRAIN MRI 2021 [Dataset]. https://www.kaggle.com/datasets/rajalab/brain-mri-2021/code
    Dataset updated
    Oct 14, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sankar
    Description

    Dataset

    This dataset was created by Sankar

    Contents

  13. South African Powerball Results (Lottery)

    • kaggle.com
    Updated May 19, 2018
    Cite
    Teboho (2018). South African Powerball Results (Lottery) [Dataset]. https://www.kaggle.com/datasets/mosemet/south-african-powerball-results-lottery/code
    Dataset updated
    May 19, 2018
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Teboho
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Area covered
    South Africa
    Description

    Context

    This dataset contains the South African Lottery results from 2000, when the lottery started, to 2015. I was interested in predicting whether or not there will be winners, given the following publicly available information prior to betting:

    1. Prize Payable
    2. Rollover
    3. Rollover Count
    4. Next Estimated Jackpot

    The features above attract quite a lot of consumers, and an increase in the number of bettors increases the chances of someone winning.

    This classifier achieves a 98% score on the X_test set when predicting whether or not there will be a division 1 jackpot winner. Winner is 1 and no-winner is 0.

    The score is only 98% because the classifier cannot predict cases with 2 winners in division 1, so compared with the test set it is not wholly accurate.
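    A 98% score is easier to interpret next to a simple baseline: if one outcome (winner or no winner) dominates the draws, always predicting that outcome can also score highly. A minimal sketch of that comparison, using made-up labels rather than the real results:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, n):
    """Predict the most frequent training label for all n test rows."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n

# Made-up labels: 1 = division 1 winner, 0 = no winner.
y_train = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_test = [0, 1, 0, 0, 0]
y_pred = [0, 1, 0, 0, 1]  # hypothetical classifier output

model_acc = accuracy(y_test, y_pred)
baseline_acc = accuracy(y_test, majority_baseline(y_train, len(y_test)))
print(model_acc, baseline_acc)  # 0.8 0.8
```

    In this toy case the classifier does no better than the baseline; a classifier is only informative to the extent that it beats this number on the real X_test set.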

    Content

    The data was acquired from the National Lottery website. Please look at: https://www.nationallottery.co.za/lotto-history/?game=Lotto for further information

    Acknowledgements

    I am new to machine learning. Being a Chemical Engineer by vocation, I came across this sphere of knowledge, and I must admit most of my nights are spent coding away and trying to predict the most ludicrous datasets I can dream up. However, it's all been a lot of fun, and with every exercise I learn a lot more.

    Inspiration

    One of my challenges is visualising this data. I tried meshgrid and contourf plots but got errors. Also, is it possible to predict the number of division 1 winners? In the y_train data, there are a number of instances where there was more than one division 1 winner. However, the SVM was built only to predict 0 for no winners or 1 for winners.

  14. Data from: Pumpkin Seeds Dataset

    • kaggle.com
    Updated Apr 2, 2022
    Cite
    Murat KOKLU (2022). Pumpkin Seeds Dataset [Dataset]. https://www.kaggle.com/datasets/muratkokludataset/pumpkin-seeds-dataset/code
    Dataset updated
    Apr 2, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Murat KOKLU
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    DATASET: https://www.muratkoklu.com/datasets/

    Citation Request : KOKLU, M., SARIGIL, S., & OZBEK, O. (2021). The use of machine learning methods in classification of pumpkin seeds (Cucurbita pepo L.). Genetic Resources and Crop Evolution, 68(7), 2713-2726. Doi: https://doi.org/10.1007/s10722-021-01226-0

    https://link.springer.com/article/10.1007/s10722-021-01226-0 https://link.springer.com/content/pdf/10.1007/s10722-021-01226-0.pdf


    Abstract: Pumpkin seeds are frequently consumed as confection worldwide because of their adequate amount of protein, fat, carbohydrate, and mineral contents. This study was carried out on the two most important and high-quality types of pumpkin seeds, ‘‘Urgup_Sivrisi’’ and ‘‘Cercevelik’’, generally grown in the Urgup and Karacaoren regions of Turkey. Morphological measurements of 2500 pumpkin seeds of both varieties were made possible by using the gray and binary forms of threshold techniques. Considering the morphological features, all the data were modeled with five different machine learning methods: Logistic Regression (LR), Multilayer Perceptrons (MLP), Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbor (k-NN), which further determined the most successful method for classifying pumpkin seed varieties. The performances of the models were determined with the 10-fold cross-validation method. The accuracy rates of the classifiers were LR 87.92 percent, MLP 88.52 percent, SVM 88.64 percent, RF 87.56 percent, and k-NN 87.64 percent.
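    The abstract evaluates each model with 10-fold cross-validation. A minimal sketch of forming the folds for the 2,500 seeds in plain Python: indices are shuffled once and dealt into 10 folds, and each fold serves as the test set exactly once. The shuffling seed is arbitrary, and this description does not say whether the study's folds were stratified by variety.

```python
import random

def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # deal indices round-robin into k folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(2500, k=10))
```

    A model's cross-validated accuracy is then the mean of its 10 per-fold test accuracies, which is how figures like those above are usually reported.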

    Keywords: Pumpkin seed, Logistic regression, Multilayer perceptrons, Random forest, Classification, Support vector machine, Thresholding


  15. Banknote Authentication

    • kaggle.com
    Updated Feb 16, 2025
    Cite
    MasterShomya (2025). Banknote Authentication [Dataset]. https://www.kaggle.com/datasets/mastershomya/banknote-authetication/versions/1
    Dataset updated
    Feb 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MasterShomya
    Description

    This dataset helps in identifying counterfeit banknotes based on statistical features extracted from genuine and forged currency notes. It contains attributes such as variance, skewness, and entropy, which are derived from images of banknotes using wavelet transformation.

    Dataset Details:

    • Task: Classify banknotes as real or fake
    • Features:
    • Variance of Wavelet Transformed Image
    • Skewness of Wavelet Transformed Image
    • Curtosis of Wavelet Transformed Image
    • Entropy of the Image
    • Target: Binary classification (0 = Fake, 1 = Real)
    • Source: UCI Machine Learning Repository

    This dataset is widely used for classification tasks and ML model evaluation in fraud detection.
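    Since the four features are continuous and the target is binary, a linear SVM is a natural baseline. Below is a minimal sketch of a linear SVM trained with the Pegasos stochastic sub-gradient method in plain Python. Labels are mapped from {0, 1} to {-1, +1}, the bias is handled by appending a constant feature (a simplification that also regularizes the bias), and the four rows are invented values in the dataset's column order, not real banknote measurements.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature lists and y holds labels in {0, 1}. A constant 1.0
    feature is appended to every row, so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(row) + [1.0], 1 if label == 1 else -1) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for xi, yi in rng.sample(data, len(data)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink toward zero
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def predict(w, row):
    """Return 1 if the decision value w . [row, 1] is positive, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score > 0 else 0

# Invented rows: (variance, skewness, curtosis, entropy) -> label.
X = [[3.0, 2.0, 0.5, 0.1], [2.5, 1.5, 0.2, 0.3],
     [-3.0, -2.0, 1.0, -0.2], [-2.5, -1.0, 0.8, 0.0]]
y = [1, 1, 0, 0]
w = train_linear_svm(X, y)
print([predict(w, row) for row in X])  # [1, 1, 0, 0]
```

    On the real data the features should be standardized first, and a library implementation such as scikit-learn's LinearSVC or SVC would normally replace this sketch.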

  16. seed_dataset

    • kaggle.com
    Updated Jan 16, 2023
    Cite
    Hari narayanan R (2023). seed_dataset [Dataset]. https://www.kaggle.com/datasets/harinarayanan22/seed-dataset
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hari narayanan R
    Description

    Dataset: Pumpkin seeds are frequently consumed as confection worldwide because of their adequate amount of protein, fat, carbohydrate, and mineral contents. This study was carried out on the two most important and high-quality types of pumpkin seeds, ‘‘Urgup_Sivrisi’’ and ‘‘Cercevelik’’, generally grown in the Urgup and Karacaoren regions of Turkey. Morphological measurements of 2500 pumpkin seeds of both varieties were made possible by using the gray and binary forms of threshold techniques. Considering the morphological features, all the data were modeled with five different machine learning methods: Logistic Regression (LR), Multilayer Perceptrons (MLP), Support Vector Machine (SVM), Random Forest (RF), and k-Nearest Neighbor (k-NN), which further determined the most successful method for classifying pumpkin seed varieties. The performances of the models were determined with the 10-fold cross-validation method. The accuracy rates of the classifiers were LR 87.92 percent, MLP 88.52 percent, SVM 88.64 percent, RF 87.56 percent, and k-NN 87.64 percent.

    Keywords: Pumpkin seed, Logistic regression, Multilayer perceptrons, Random forest, Classification, Support vector machine, Thresholding

  17. Bangla Sign Language Dataset

    • kaggle.com
    Updated Aug 8, 2021
    Cite
    S M Rayeed (2021). Bangla Sign Language Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/2508666
    Dataset updated
    Aug 8, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    S M Rayeed
    License

    Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Bangla Sign Language Dataset using Depth Information

    This is a Bangla Sign Language Dataset built with the MediaPipe framework, which accurately detects the hand and its 21 key-points in a raw RGB image and stores the coordinate values of these key-points. After collecting 47,000 raw image files for 47 signs (100 files per sign per user) and generating 47,000 corresponding output image files with MediaPipe, the coordinate values of these key-points were stored in .csv files. The dataset contains 470 such .csv files (collected from 10 users for 47 signs in total). After generating the dataset, we also performed classification with several classifiers, such as KNN, SVM, RFC, DTC, and neural networks. The accuracy of each classifier is reported in the classification code (in the Code section).
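    The description says the key-point coordinates are stored in .csv files but does not give their column layout. The sketch below assumes, hypothetically, one row per image holding 42 numbers in the order x0, y0, ..., x20, y20 with no header, and that the sign label is supplied by the caller (for example, derived from the file name). It turns each row into a feature vector for the classifiers mentioned above.

```python
import csv
import io

N_KEYPOINTS = 21

def read_keypoint_rows(stream, label):
    """Read rows of 42 numbers (hypothetical layout x0,y0,...,x20,y20) as (features, label)."""
    samples = []
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        if len(values) != 2 * N_KEYPOINTS:
            raise ValueError(f"expected {2 * N_KEYPOINTS} values, got {len(values)}")
        samples.append((values, label))
    return samples

# With a real file: with open(path, newline="") as f: read_keypoint_rows(f, label)
fake_file = io.StringIO("\n".join(",".join(["0.5"] * 42) for _ in range(3)))
samples = read_keypoint_rows(fake_file, label="sign_01")
```

    Checking the value count per row catches files whose layout differs from the assumption early, before any model is trained.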

  18. Data from: Chestnut Varieties Dataset

    • kaggle.com
    Updated May 5, 2025
    Cite
    Mustafa Yurdakul (2025). Chestnut Varieties Dataset [Dataset]. https://www.kaggle.com/datasets/mahyeks/chestnut-varieties-dataset
    Dataset updated
    May 5, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mustafa Yurdakul
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📄 Description: This dataset consists of 1,156 images of four major chestnut (Castanea sativa) varieties cultivated in Turkey: Alandız, Aydın, Simav, and Zonguldak. Images were captured under controlled lighting conditions using a Samsung NX300 camera, from both front and back angles to ensure diversity. Each folder in the dataset corresponds to a specific chestnut variety.

    The dataset has been used in multiple academic studies and is suitable for developing and testing image classification algorithms, deep learning models, and computer vision systems in agriculture and food technology.

    📚 Citation Request: If you use this dataset in your research or application, cite the following studies:

    Yurdakul, M., Uyar, K., & Taşdemir, Ş. Webserver-Based Mobile Application for Multi-class Chestnut (Castanea sativa) Classification Using Deep Features and Attention Mechanisms, Applied Fruit Science, 2025, 67:102. Springer DOI: https://doi.org/10.1007/s10341-025-01327-5

    Yurdakul, M., Atabaş, İ., & Taşdemir, Ş. (2024, March). Chestnut (Castanea Sativa) Varieties Classification with Harris Hawks Optimization based Selected Features and SVM. In 2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS) (pp. 1-5). IEEE.

    🧾 Folder Structure: 📁 alandız – 272 images

    📁 aydın – 228 images

    📁 simav – 304 images

    📁 zonguldak – 352 images

    All images are in .jpg format and represent single chestnuts from different angles.
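    The four folders are moderately imbalanced (228 to 352 images). One common way to compensate during training, assumed here rather than taken from the cited studies, is inverse-frequency class weighting, weight = total / (n_classes × count), the same formula scikit-learn uses for class_weight="balanced". Applied to the folder counts listed above:

```python
# Image counts per folder, as listed in the dataset description.
COUNTS = {"alandız": 272, "aydın": 228, "simav": 304, "zonguldak": 352}

def balanced_class_weights(counts):
    """weight_c = total / (n_classes * count_c); rarer classes get larger weights."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

weights = balanced_class_weights(COUNTS)
for cls, w in weights.items():
    print(f"{cls}: {w:.3f}")
```

    With 1,156 images over 4 classes, a perfectly balanced folder would hold 289 images and get weight 1.0; the smallest folder (aydın) is weighted up and the largest (zonguldak) down.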

    🧠 Potential Use Cases: Image classification

    Machine learning & deep learning model development

    Feature selection and optimization benchmarking

    Agricultural and food product recognition

  19. Classification: Persistent vs Non-Persistent

    • kaggle.com
    Updated May 11, 2021
    Cite
    Harbhajan Singh (2021). Classification: Persistent vs Non-Persistent [Dataset]. https://www.kaggle.com/harbhajansingh21/persistent-vs-nonpersistent/discussion
    Dataset updated
    May 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harbhajan Singh
    License

    GNU Free Documentation License 1.3 (http://www.gnu.org/licenses/fdl-1.3.html)

    Description

    This dataset provides valuable insights into the persistency of drug prescriptions in the pharmaceutical industry. By analyzing various factors, we aim to build a classification model to understand the factors influencing persistency. The dataset includes patient information, provider attributes, clinical factors, and disease/treatment factors. The challenge is to uncover patterns and relationships that impact persistency. This analysis will aid pharmaceutical companies in optimizing their strategies and improving patient outcomes. Join me in exploring this dataset and leveraging machine learning techniques to tackle this important problem. Let's dive in and unlock the secrets of drug persistency!

    Problem Statement

    One of the challenges for all pharmaceutical companies is to understand the persistency of a drug as prescribed by the physician.

    With the objective of gathering insights into the factors that impact persistency, build your own classification model.

    Variable Description

    Here I'm describing the columns in detail:

    • Patient ID: unique ID of each patient
    • Persistency_Flag: flag indicating whether the patient was persistent or not
    • Age: age of the patient during their therapy
    • Race: race of the patient from the patient table
    • Region: region of the patient from the patient table
    • Ethnicity: ethnicity of the patient from the patient table
    • Gender: gender of the patient from the patient table
    • IDN Indicator: flag indicating patients mapped to IDN

    Provider Attributes

    NTM - Physician Specialty: Specialty of the HCP that prescribed the NTM Rx

    Clinical Factors

    • NTM - T-Score: T-score of the patient at the time of the NTM Rx (within 2 years prior to rxdate)
    • Change in T Score: change in T-score before starting any therapy and after receiving therapy (Worsened, Remained Same, Improved, Unknown)
    • NTM - Risk Segment: risk segment of the patient at the time of the NTM Rx (within 2 years prior to rxdate)
    • Change in Risk Segment: change in risk segment before starting any therapy and after receiving therapy (Worsened, Remained Same, Improved, Unknown)
    • NTM - Multiple Risk Factors: flag indicating whether the patient falls under the multiple-risk category (having more than 1 risk) at the time of the NTM Rx (within 365 days prior to rxdate)
    • NTM - Dexa Scan Frequency: number of DEXA scans taken prior to the first NTM Rx date (within 365 days prior to rxdate)
    • NTM - Dexa Scan Recency: flag indicating the presence of a DEXA scan before the NTM Rx (within 2 years prior to rxdate, or between their first Rx and Switched Rx, whichever is smaller and applicable)
    • Dexa During Therapy: flag indicating whether the patient had a DEXA scan during their first continuous therapy
    • NTM - Fragility Fracture Recency: flag indicating whether the patient had a recent fragility fracture (within 365 days prior to rxdate)
    • Fragility Fracture During Therapy: flag indicating whether the patient had a fragility fracture during their first continuous therapy
    • NTM - Glucocorticoid Recency: flag indicating usage of glucocorticoids (>=7.5mg strength) in the one-year look-back from the first NTM Rx
    • Glucocorticoid During Therapy: flag indicating whether the patient used glucocorticoids during the first continuous therapy

    Disease/Treatment Factor

    • NTM - Injectable Experience: flag indicating any injectable drug usage in the 12 months before the NTM OP Rx
    • NTM - Risk Factors: risk factors the patient falls into. For chronic risk factors, a complete look-back is applied; for non-chronic risk factors, a one-year look-back from the date of the first OP Rx
    • NTM - Comorbidity: comorbidities are divided into two main categories, acute and chronic, based on the ICD codes. For chronic diseases, a complete look-back from the first Rx date of NTM therapy is taken; for acute diseases, a one-year look-back before the NTM OP Rx is applied
    • NTM - Concomitancy: concomitant drugs recorded prior to starting a therapy (within 365 days prior to the first rxdate)
    • Adherence: adherence for the therapies
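    Several of these factors are categorical; for example, Change in T Score and Change in Risk Segment take the values Worsened, Remained Same, Improved, or Unknown. Most classifiers need numeric input, so such columns are typically one-hot encoded. A minimal sketch in plain Python, where `change_t_score` is a hypothetical key for the Change in T Score column:

```python
CHANGE_LEVELS = ["Worsened", "Remained Same", "Improved", "Unknown"]

def one_hot(value, levels):
    """Return a 0/1 list with a single 1 at the position of value in levels."""
    if value not in levels:
        raise ValueError(f"unexpected category: {value!r}")
    return [1 if value == level else 0 for level in levels]

def encode_rows(rows, column, levels):
    """One-hot encode one categorical column for every row (a list of dicts)."""
    return [one_hot(row[column], levels) for row in rows]

rows = [
    {"change_t_score": "Improved"},
    {"change_t_score": "Unknown"},
]
encoded = encode_rows(rows, "change_t_score", CHANGE_LEVELS)
print(encoded)  # [[0, 0, 1, 0], [0, 0, 0, 1]]
```

    Keeping Unknown as its own level, rather than dropping those rows, lets a model use the fact that the information is missing.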

    Inspiration

    This is my first dataset on Kaggle. I hope you learn from it and build more notebooks with it. If you learn something from this dataset, don't forget to upvote it.

  20. Bank Credit Approval Dataset

    • kaggle.com
    Updated Mar 31, 2025
    Cite
    Şahide ŞEKER (2025). Bank Credit Approval Dataset [Dataset]. https://www.kaggle.com/datasets/sahideseker/bank-credit-approval-dataset/discussion
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Şahide ŞEKER
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    🇬🇧 English:

    This synthetic dataset was created to simulate a typical bank credit approval process. It includes 1,000 applicant records with relevant financial and demographic details such as age, income, credit score, employment status, and requested loan amount. A final approved column indicates whether the credit application was accepted.

    Use this dataset to:

    • Train and evaluate classification models such as Logistic Regression, SVM, XGBoost
    • Explore the impact of income, credit score, and employment status on approval decisions
    • Practice real-world financial modeling without accessing private data
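    As a sketch of the first suggested use, here is logistic regression trained by batch gradient descent in plain Python. The two features are a hypothetical subset (credit score and income, already standardized) and the four rows are invented; the real column names and scales in the CSV may differ.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the application is approved."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented, already-standardized rows: [credit_score_z, income_z]; 1 = approved.
X = [[1.2, 0.8], [0.9, 1.1], [-1.2, -0.8], [-0.9, -1.1]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
```

    The learned weights can be read as the change in log-odds of approval per standard deviation of each feature, which supports the "impact of income and credit score" analysis suggested above.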

    🇹🇷 Turkish (translated):

    This synthetic dataset was created to model a bank's credit application process. It contains information such as age, income, credit score, employment status, and requested loan amount for 1,000 applicants. The approved column indicates whether the application was approved.

    With this dataset:

    • Classification models such as Logistic Regression, SVM, and XGBoost can be trained
    • The factors affecting approval decisions can be analyzed
    • Financial modeling skills can be developed