26 datasets found

SVM Classification
kaggle.com
Updated Jun 28, 2019
Cite
chinthakindi vinod (2019). SVM Classification [Dataset]. https://www.kaggle.com/vinod00725/svm-classification/activity
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jun 28, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
chinthakindi vinod
Description
Dataset

This dataset was created by chinthakindi vinod

Contents
Predict the classification group
kaggle.com
zip
Updated Jul 4, 2021
Cite
Jahanvee Narang (2021). Predict the classification group [Dataset]. https://www.kaggle.com/jahnveenarang/predict-the-classification-group
Explore at:
Available download formats: zip (91443 bytes)
Dataset updated
Jul 4, 2021
Authors
Jahanvee Narang
Description
Dataset

This dataset was created by Jahanvee Narang

Contents
Ad Click Prediction - Classification Problem
kaggle.com
Updated Jul 4, 2021
Cite
Jahanvee Narang (2021). Ad Click Prediction - Classification Problem [Dataset]. https://www.kaggle.com/datasets/jahnveenarang/cvdcvd-vd/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jul 4, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jahanvee Narang
Description
**New to machine learning and data science? No question is too basic or too simple. Use this place to post any first-timer clarifying questions about the classification algorithm or the dataset.**

This file contains customer demographics and whether each customer clicked the ad. Use this file with a classification algorithm to predict ad clicks, with the customer demographics as the independent variables.

This data set contains the following features:

'User ID': unique identifier for the consumer

'Age': customer age in years

'Estimated Salary': average income of the consumer

'Gender': whether the consumer was male or female

'Purchased': 0 or 1, indicating whether the consumer clicked on the ad
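As a rough illustration of how the five columns listed above might be turned into model inputs, the sketch below parses a few rows with Python's csv module. The sample rows, the exact header spellings, and the helper name load_rows are assumptions made for this example; they are not taken from the actual file. 'User ID' is left out of the feature vector on the assumption that an identifier carries no predictive signal.

```python
import csv
import io

# Hypothetical sample rows; the real file's header spelling and values may differ.
SAMPLE = """User ID,Gender,Age,Estimated Salary,Purchased
1,Male,19,19000,0
2,Female,35,20000,0
3,Female,47,25000,1
"""


def load_rows(handle):
    """Return (features, labels); each feature row is [age, salary, is_male]."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        features.append([
            float(row["Age"]),
            float(row["Estimated Salary"]),
            1.0 if row["Gender"] == "Male" else 0.0,  # simple binary encoding
        ])
        labels.append(int(row["Purchased"]))  # 1 = clicked the ad, 0 = did not
    return features, labels


X, y = load_rows(io.StringIO(SAMPLE))
```

The resulting X and y lists could then be handed to whichever classifier is being practiced; age and salary sit on very different scales, so feature scaling is usually worth considering before fitting an SVM.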
text classifier svm
kaggle.com
Updated Sep 26, 2021
Cite
Kushal Dev (2021). text classifier svm [Dataset]. https://www.kaggle.com/datasets/kushaldev75/text-classifier-svm
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Sep 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kushal Dev
Description
Dataset

This dataset was created by Kushal Dev

Contents
Fake News Detection
opendatabay.com
kaggle.com
.csv
Updated Jun 8, 2025
Cite
Datasimple (2025). Fake News Detection [Dataset]. https://www.opendatabay.com/data/dataset/5a25f611-a90e-42d1-b4d8-d2ca35bd8d19
Explore at:
Available download formats: .csv
Dataset updated
Jun 8, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Knowledge Bundles
Description
🇬🇧 English:

This synthetic dataset is designed for practicing fake news detection using natural language processing (NLP) techniques. It contains 1000 news samples labeled as "real" or "fake", including fabricated headlines and articles that mimic real-world patterns.

You can use this dataset to:

Train NLP classification models like Logistic Regression, SVM, BERT

Perform feature engineering on textual data

Practice binary classification problems in news analytics

Columns:

title: News headline

text: Main body of the news

label: Label indicating whether the news is fake or real

🇹🇷 Türkçe:

This synthetic dataset is designed for researchers and students who want to practice fake news detection using natural language processing (NLP) techniques. It contains 1000 sample news items, each labeled as "real" or "fake". The headlines and article bodies were written to mimic real-world news.

With this dataset:

NLP models such as Logistic Regression, SVM, and BERT can be trained

Feature engineering can be performed on text

Classification studies on fake news detection can be carried out

Variables:

title: News headline

text: News content

label: Label (fake/real)

Original Data Source: Fake News Detection
Zoo animal classification
kaggle.com
zip
Updated Feb 27, 2021
Cite
Karthikeyan Raghav (2021). Zoo animal classification [Dataset]. https://www.kaggle.com/karthikeyanraghav/zoo-animal-classification
Explore at:
Available download formats: zip (1198 bytes)
Dataset updated
Feb 27, 2021
Authors
Karthikeyan Raghav
Description
Dataset

This dataset was created by Karthikeyan Raghav

Contents
Spam Mail Classifier Dataset
opendatabay.com
.csv
Updated Jun 6, 2025
Cite
Datasimple (2025). Spam Mail Classifier Dataset [Dataset]. https://www.opendatabay.com/data/dataset/9aa9a17e-1fe7-44f5-9fb0-f901c05b4a17
Explore at:
Available download formats: .csv
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Fraud Detection & Risk Management
Description
🇬🇧 English:

This dataset contains 1,000 synthetic email messages labeled as either spam or ham. It was created to help users build and evaluate text classification models using basic natural language processing (NLP) techniques.

Use this dataset to:

Train a spam filter using Naive Bayes, SVM, or Logistic Regression

Practice text cleaning, tokenization, and TF-IDF vectorization

Build email classification models without needing real personal email data

🇹🇷 Türkçe:

This dataset contains 1,000 synthetic email messages, each labeled as spam or ham (normal). It was prepared for those who want to develop spam detection models using natural language processing techniques.

With this dataset:

Text classification models such as Naive Bayes and SVM can be developed

Text cleaning, tokenization, and TF-IDF can be practiced

NLP can be practiced without needing real emails

Original Data Source: Spam Mail Classifier Dataset
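The text cleaning, tokenization, and TF-IDF steps named in the description can be sketched in plain Python. This is a minimal illustration using the unsmoothed weight tf * log(N / df); the function names tokenize and tfidf are hypothetical, the example messages are made up and not drawn from the dataset, and libraries such as scikit-learn smooth the idf term by default, so their values will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up messages for illustration only.
messages = ["Win a free prize now", "Meeting moved to noon", "Free prize inside"]
weights = tfidf(messages)
```

Terms that appear in fewer messages receive larger weights, and a term present in every message gets a weight of zero under this unsmoothed formula.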
Data from: Iris Flower Classification
kaggle.com
Updated Nov 14, 2023
Cite
PavaniGardas (2023). Iris Flower Classification [Dataset]. https://www.kaggle.com/datasets/pavanigardas/iris-flower-classification
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Nov 14, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
PavaniGardas
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Iris Flower Classification is a classic machine learning task used for learning and practicing classification algorithms. The dataset contains features like sepal length, sepal width, petal length, and petal width for three different species of iris flowers. This project involves data pre-processing, model selection, and evaluation. Here, we use classification algorithms like logistic regression, decision trees, k-nearest neighbors (KNN), or support vector machines (SVM) for this classification task.
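One of the algorithms mentioned, k-nearest neighbors, is small enough to sketch without any library. The example below is only an illustration: the function name knn_predict is hypothetical, the measurements are invented for the sketch rather than copied from the dataset, and the species labels assume the classic setosa, versicolor, and virginica naming.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative rows: sepal length, sepal width, petal length, petal width (cm).
# These values are made up for the sketch, not taken from the dataset.
train_X = [
    [5.0, 3.4, 1.5, 0.2], [4.8, 3.1, 1.6, 0.2],
    [6.0, 2.8, 4.5, 1.4], [5.8, 2.7, 4.1, 1.0],
    [6.8, 3.0, 5.6, 2.1], [7.1, 3.1, 5.9, 2.3],
]
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

prediction = knn_predict(train_X, train_y, [5.1, 3.5, 1.4, 0.2], k=3)
```

With real data, the same function would be evaluated on a held-out split, and k would be tuned rather than fixed at 3.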
American Sign Language Digit Dataset
kaggle.com
Updated Aug 9, 2021
Cite
S M Rayeed (2021). American Sign Language Digit Dataset [Dataset]. https://www.kaggle.com/rayeed045/american-sign-language-digit-dataset/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Aug 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
S M Rayeed
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

This is an American Sign Language Digits Dataset, covering sign 0 to sign 9. It uses depth information to generate hand key-points (using MediaPipe), which enriches the dataset and improves classification accuracy.

Content

This American Sign Language Digits Dataset was built with the MediaPipe framework, which accurately detects the hand and 21 hand key-points from a raw RGB image and stores the coordinate values of these key-points. The dataset contains 5000 such raw image files from sign 0 to sign 9 (500 files for each sign) and 5000 corresponding output image files (after applying MediaPipe). After generating the dataset, we also ran classification with different classifiers, such as KNN, SVM, RFC, DTC, and neural networks. Accuracies for the different classifiers are reported in the classification code (in the code section).

Acknowledgements

A New 2D Static Hand Gesture Colour Image Dataset for ASL Gestures - A.L.C. Barczak, N.H. Reyes, M. Abastillas, A. Piccio and T. Susnjak
‘Dementia Prediction Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 13, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Dementia Prediction Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-dementia-prediction-dataset-8ab0/3d5e8806/?iid=009-768&v=presentation
Explore at:
Dataset updated
Aug 13, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Dementia Prediction Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shashwatwork/dementia-prediction-dataset on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Dementia is a syndrome – usually of a chronic or progressive nature – in which there is deterioration in cognitive function (i.e. the ability to process thought) beyond what might be expected from normal aging. It affects memory, thinking, orientation, comprehension, calculation, learning capacity, language, and judgment. Consciousness is not affected. The impairment in cognitive function is commonly accompanied, and occasionally preceded, by deterioration in emotional control, social behaviour, or motivation.

Dementia results from a variety of diseases and injuries that primarily or secondarily affect the brain, such as Alzheimer's disease or stroke.

Dementia is one of the major causes of disability and dependency among older people worldwide. It can be overwhelming, not only for the people who have it, but also for their carers and families. There is often a lack of awareness and understanding of dementia, resulting in stigmatization and barriers to diagnosis and care. The impact of dementia on carers, family, and society at large can be physical, psychological, social, and economic.

Content

This set consists of a longitudinal collection of 150 subjects aged 60 to 96. Each subject was scanned on two or more visits, separated by at least one year for a total of 373 imaging sessions. For each subject, 3 or 4 individual T1-weighted MRI scans obtained in single scan sessions are included. The subjects are all right-handed and include both men and women. 72 of the subjects were characterized as nondemented throughout the study. 64 of the included subjects were characterized as demented at the time of their initial visits and remained so for subsequent scans, including 51 individuals with mild to moderate Alzheimer’s disease. Another 14 subjects were characterized as nondemented at the time of their initial visit and were subsequently characterized as demented at a later visit

Acknowledgements

Battineni, Gopi; Amenta, Francesco; Chintalapudi, Nalini (2019), “Data for: MACHINE LEARNING IN MEDICINE: CLASSIFICATION AND PREDICTION OF DEMENTIA BY SUPPORT VECTOR MACHINES (SVM)”, Mendeley Data, V1, doi: 10.17632/tsy6rbc5d4.1 * Dataset is available here.

--- Original source retains full ownership of the source dataset ---
💸 💳 Online Banking / Financial Review Dataset
kaggle.com
Updated Dec 26, 2022
Cite
Yan Maksi (2022). 💸 💳 Online Banking / Financial Review Dataset [Dataset]. https://www.kaggle.com/datasets/yanmaksi/reviews-data-for-classification-model/data
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Dec 26, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yan Maksi
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
This dataset contains customer reviews for different banking companies. The data is not clean, so you will need to do exploratory data analysis before using it with more complex models. For simpler models, you can take just the star-rating and feedback columns. (You can see my example code with this dataset.) Good luck 💯!
BRAIN MRI 2021
kaggle.com
Updated Oct 14, 2021
Cite
Sankar (2021). BRAIN MRI 2021 [Dataset]. https://www.kaggle.com/datasets/rajalab/brain-mri-2021/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Oct 14, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sankar
Description
Dataset

This dataset was created by Sankar

Contents
South African Powerball Results (Lottery)
kaggle.com
Updated May 19, 2018
Cite
Teboho (2018). South African Powerball Results (Lottery) [Dataset]. https://www.kaggle.com/datasets/mosemet/south-african-powerball-results-lottery/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 19, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Teboho
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
South Africa
Description
Context

These are the South African Lottery results from 2000, when it started, to 2015. I was interested in predicting whether there would be winners, given the following information that is publicly available before betting:

Prize Payable

Rollover

Rollover Count

Next Estimated Jackpot

The above-mentioned features attract quite a lot of consumers, and a larger number of bettors increases the chance that someone wins.

This classifier achieves a 98% score on the X_test set when predicting whether there will be a division 1 jackpot winner. Winner is 1 and no-winner is 0.

The score is only 98% because the classifier cannot predict draws with 2 winners in division 1, so it is not wholly accurate when compared with the test set.

Content

The data was acquired from the National Lottery website. Please look at: https://www.nationallottery.co.za/lotto-history/?game=Lotto for further information

Acknowledgements

I am new to machine learning. Being a Chemical Engineer by vocation, I came across this sphere of knowledge, and I must admit that most of my nights are spent just coding away and trying to predict the most ludicrous datasets I can dream up. However, it's all been a lot of fun, and with every exercise I tend to learn a lot more.

Inspiration

One of my challenges is visualising this data. I tried meshgrid and contourf plots, but I am getting errors. Also, is it possible to predict the number of division 1 winners? In the y_train data, there are a number of instances where there was more than one division 1 winner. However, the SVM was built only to predict 0 for no winners or 1 for winners.
Data from: Pumpkin Seeds Dataset
kaggle.com
Updated Apr 2, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Murat KOKLU (2022). Pumpkin Seeds Dataset [Dataset]. https://www.kaggle.com/datasets/muratkokludataset/pumpkin-seeds-dataset/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Apr 2, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Murat KOKLU
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
DATASET: https://www.muratkoklu.com/datasets/

Citation Request : KOKLU, M., SARIGIL, S., & OZBEK, O. (2021). The use of machine learning methods in classification of pumpkin seeds (Cucurbita pepo L.). Genetic Resources and Crop Evolution, 68(7), 2713-2726. Doi: https://doi.org/10.1007/s10722-021-01226-0

https://link.springer.com/article/10.1007/s10722-021-01226-0 https://link.springer.com/content/pdf/10.1007/s10722-021-01226-0.pdf

Abstract: Pumpkin seeds are frequently consumed as confection worldwide because of their adequate amount of protein, fat, carbohydrate, and mineral contents. This study was carried out on the two most important and quality types of pumpkin seeds, ‘‘Urgup_Sivrisi’’ and ‘‘Cercevelik’’, generally grown in Urgup and Karacaoren regions in Turkey. However, morphological measurements of 2500 pumpkin seeds of both varieties were made possible by using the gray and binary forms of threshold techniques. Considering morphological features, all the data were modeled with five different machine learning methods: Logistic Regression (LR), Multilayer Perceptrons (MLP), Support Vector Machine (SVM) and Random Forest (RF), and k-Nearest Neighbor (k-NN), which further determined the most successful method for classifying pumpkin seed varieties. However, the performances of the models were determined with the help of the 10 kfold cross-validation method. The accuracy rates of the classifiers were obtained as LR 87.92 percent, MLP 88.52 percent, SVM 88.64 percent, RF 87.56 percent, and k-NN 87.64 percent.

Keywords: Pumpkin seed, Logistic regression, Multilayer perceptrons, Random forest, Classification, Support vector machine, Thresholding
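The 10-fold cross-validation mentioned in the abstract can be sketched without any libraries. The splitter below is a minimal, unshuffled illustration and not the authors' procedure; the function name k_fold_indices is hypothetical, and the accuracy figures are copied from the abstract rather than recomputed.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Accuracy figures as reported in the abstract (percent).
reported = {"LR": 87.92, "MLP": 88.52, "SVM": 88.64, "RF": 87.56, "k-NN": 87.64}
best = max(reported, key=reported.get)

# 2500 seeds split into 10 folds of 250 test samples each.
folds = list(k_fold_indices(2500, k=10))
```

In practice the sample order would usually be shuffled, and often stratified by variety, before splitting; this sketch omits both to keep the index arithmetic visible.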

Banknote Authentication
kaggle.com
Updated Feb 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MasterShomya (2025). Banknote Authentication [Dataset]. https://www.kaggle.com/datasets/mastershomya/banknote-authetication/versions/1
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Feb 16, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
MasterShomya
Description
This dataset helps in identifying counterfeit banknotes based on statistical features extracted from genuine and forged currency notes. It contains attributes such as variance, skewness, and entropy, which are derived from images of banknotes using wavelet transformation.

Dataset Details:

Task: Classify banknotes as real or fake

Features:

Variance of Wavelet Transformed Image

Skewness of Wavelet Transformed Image

Curtosis of Wavelet Transformed Image

Entropy of the Image

Target: Binary classification (0 = Fake, 1 = Real)

Source: UCI Machine Learning Repository

This dataset is widely used for classification tasks and ML model evaluation in fraud detection.
seed_dataset
kaggle.com
Updated Jan 16, 2023
Cite
Hari narayanan R (2023). seed_dataset [Dataset]. https://www.kaggle.com/datasets/harinarayanan22/seed-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Jan 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hari narayanan R
Description
dataset : Pumpkin seeds are frequently consumed as confection worldwide because of their adequate amount of protein, fat, carbohydrate, and mineral contents. This study was carried out on the two most important and quality types of pumpkin seeds, ‘‘Urgup_Sivrisi’’ and ‘‘Cercevelik’’, generally grown in Urgup and Karacaoren regions in Turkey. However, morphological measurements of 2500 pumpkin seeds of both varieties were made possible by using the gray and binary forms of threshold techniques. Considering morphological features, all the data were modeled with five different machine learning methods: Logistic Regression (LR), Multilayer Perceptrons (MLP), Support Vector Machine (SVM) and Random Forest (RF), and k-Nearest Neighbor (k-NN), which further determined the most successful method for classifying pumpkin seed varieties. However, the performances of the models were determined with the help of the 10 kfold cross-validation method. The accuracy rates of the classifiers were obtained as LR 87.92 percent, MLP 88.52 percent, SVM 88.64 percent, RF 87.56 percent, and k-NN 87.64 percent.

Keywords: Pumpkin seed, Logistic regression, Multilayer perceptrons, Random forest, Classification, Support vector machine, Thresholding
Bangla Sign Language Dataset
kaggle.com
Updated Aug 8, 2021
Cite
S M Rayeed (2021). Bangla Sign Language Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/2508666
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/2508666
Dataset updated
Aug 8, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
S M Rayeed
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Bangla Sign Language Dataset using Depth Information

This is a Bangla Sign Language Dataset built with the MediaPipe framework, which accurately detects the hand and 21 hand key-points from a raw RGB image and stores the coordinate values of these key-points. After collecting 47000 such raw image files for 47 signs (100 files per sign per user) and generating 47000 corresponding output image files by applying MediaPipe, the coordinate values of these key-points were stored in .csv files. This dataset contains 470 such .csv files (collected from 10 users for 47 signs in total). After generating the dataset, we also ran classification with different classifiers, such as KNN, SVM, RFC, DTC, and neural networks. Accuracies for the different classifiers are reported in the classification code (in the code section).
Data from: Chestnut Varieties Dataset
kaggle.com
Updated May 5, 2025
Cite
Mustafa Yurdakul (2025). Chestnut Varieties Dataset [Dataset]. https://www.kaggle.com/datasets/mahyeks/chestnut-varieties-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 5, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mustafa Yurdakul
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
📄 Description: This dataset consists of 1,156 images of four major chestnut (Castanea sativa) varieties cultivated in Turkey: Alandız, Aydın, Simav, and Zonguldak. Images were captured under controlled lighting conditions using a Samsung NX300 camera, from both front and back angles to ensure diversity. Each folder in the dataset corresponds to a specific chestnut variety.

The dataset has been used in multiple academic studies and is suitable for developing and testing image classification algorithms, deep learning models, and computer vision systems in agriculture and food technology.

📚 Citation Request: If you use this dataset in your research or application, cite the following studies:

Yurdakul, M., Uyar, K., & Taşdemir, Ş. Webserver-Based Mobile Application for Multi-class Chestnut (Castanea sativa) Classification Using Deep Features and Attention Mechanisms, Applied Fruit Science, 2025, 67:102. Springer DOI: https://doi.org/10.1007/s10341-025-01327-5

Yurdakul, M., Atabaş, İ., & Taşdemir, Ş. (2024, March). Chestnut (Castanea Sativa) Varieties Classification with Harris Hawks Optimization based Selected Features and SVM. In 2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS) (pp. 1-5). IEEE.

🧾 Folder Structure:

📁 alandız – 272 images

📁 aydın – 228 images

📁 simav – 304 images

📁 zonguldak – 352 images

All images are in .jpg format and represent single chestnuts from different angles.
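The per-folder counts can be checked against the stated total of 1,156 images with a few lines of Python. This is only a sanity-check sketch; the dictionary keys mirror the folder names listed above, and the variable names are illustrative.

```python
# Image counts per variety folder, as listed in the description.
folder_counts = {"alandız": 272, "aydın": 228, "simav": 304, "zonguldak": 352}

total = sum(folder_counts.values())

# Share of each class, useful when choosing stratified splits or class weights.
class_share = {name: count / total for name, count in folder_counts.items()}

for name, share in class_share.items():
    print(f"{name}: {share:.1%}")
```

The classes are moderately imbalanced (zonguldak has roughly one and a half times as many images as aydın), which may be worth accounting for when splitting the data.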

🧠 Potential Use Cases:

Image classification

Machine learning & deep learning model development

Feature selection and optimization benchmarking

Agricultural and food product recognition
Classification: Persistent vs Non-Persistent
kaggle.com
Updated May 11, 2021
Cite
Harbhajan Singh (2021). Classification: Persistent vs Non-Persistent [Dataset]. https://www.kaggle.com/harbhajansingh21/persistent-vs-nonpersistent/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
May 11, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Harbhajan Singh
License
http://www.gnu.org/licenses/fdl-1.3.html
Description
This dataset provides valuable insights into the persistency of drug prescriptions in the pharmaceutical industry. By analyzing various factors, we aim to build a classification model to understand the factors influencing persistency. The dataset includes patient information, provider attributes, clinical factors, and disease/treatment factors. The challenge is to uncover patterns and relationships that impact persistency. This analysis will aid pharmaceutical companies in optimizing their strategies and improving patient outcomes. Join me in exploring this dataset and leveraging machine learning techniques to tackle this important problem. Let's dive in and unlock the secrets of drug persistency!

Problem Statement

One of the challenges for all pharmaceutical companies is to understand the persistency of a drug as per the physician's prescription.

With an objective to gather insights on the factors that are impacting the persistency, build your own classification model.

Variable Description

Here I'm describing the columns in detail:

Variable: Variable Description

Patient ID: Unique ID of each patient

Persistency_Flag: Flag indicating if a patient was persistent or not

Age: Age of the patient during their therapy

Race: Race of the patient from the patient table

Region: Region of the patient from the patient table

Ethnicity: Ethnicity of the patient from the patient table

Gender: Gender of the patient from the patient table

IDN Indicator: Flag indicating patients mapped to IDN

Provider Attributes

NTM - Physician Specialty: Specialty of the HCP that prescribed the NTM Rx

Clinical Factors

NTM - T-Score: T Score of the patient at the time of the NTM Rx (within 2 years prior from rxdate)

Change in T Score: Change in Tscore before starting with any therapy and after receiving therapy (Worsened, Remained Same, Improved, Unknown)

NTM - Risk Segment: Risk Segment of the patient at the time of the NTM Rx (within 2 years days prior from rxdate)

Change in Risk Segment: Change in Risk Segment before starting with any therapy and after receiving therapy (Worsened, Remained Same, Improved, Unknown)

NTM - Multiple Risk Factors: Flag indicating if patient falls under multiple risk category (having more than 1 risk) at the time of the NTM Rx (within 365 days prior from rxdate)

NTM - Dexa Scan Frequency: Number of DEXA scans taken prior to the first NTM Rx date (within 365 days prior from rxdate)

NTM - Dexa Scan Recency: Flag indicating the presence of Dexa Scan before the NTM Rx (within 2 years prior from rxdate or between their first Rx and Switched Rx; whichever is smaller and applicable)

Dexa During Therapy: Flag indicating if the patient had a Dexa Scan during their first continuous therapy

NTM - Fragility Fracture Recency: Flag indicating if the patient had a recent fragility fracture (within 365 days prior from rxdate)

Fragility Fracture During Therapy: Flag indicating if the patient had fragility fracture during their first continuous therapy

NTM - Glucocorticoid Recency: Flag indicating usage of Glucocorticoids (>=7.5mg strength) in the one year look-back from the first NTM Rx

Glucocorticoid During Therapy: Flag indicating if the patient had a Glucocorticoid usage during the first continuous therapy

Disease/Treatment Factor

NTM - Injectable Experience: Flag indicating any injectable drug usage in the recent 12 months before the NTM OP Rx

NTM - Risk Factors: Risk Factors that the patient is falling into. For chronic Risk Factors complete lookback to be applied and for non-chronic Risk Factors, one year lookback from the date of first OP Rx

NTM - Comorbidity: Comorbidities are divided into two main categories - Acute and chronic, based on the ICD codes. For chronic disease we are taking complete look back from the first Rx date of NTM therapy and for acute diseases, time period before the NTM OP Rx with one year lookback has been applied

NTM - Concomitancy: Concomitant drugs recorded prior to starting with a therapy (within 365 days prior from first rxdate)

Adherence: Adherence for the therapies

Inspiration

This is my first dataset on Kaggle. I hope you will learn from it and make more notebooks with it. If you learn something from this dataset, don't forget to upvote it.
Bank Credit Approval Dataset
kaggle.com
Updated Mar 31, 2025
Cite
Şahide ŞEKER (2025). Bank Credit Approval Dataset [Dataset]. https://www.kaggle.com/datasets/sahideseker/bank-credit-approval-dataset/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
Dataset updated
Mar 31, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Şahide ŞEKER
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
🇬🇧 English:

This synthetic dataset was created to simulate a typical bank credit approval process. It includes 1,000 applicant records with relevant financial and demographic details such as age, income, credit score, employment status, and requested loan amount. A final approved column indicates whether the credit application was accepted.

Use this dataset to:

Train and evaluate classification models such as Logistic Regression, SVM, XGBoost

Explore the impact of income, credit score, and employment status on approval decisions

Practice real-world financial modeling without accessing private data

🇹🇷 Türkçe:

This synthetic dataset was created to model a bank's credit application process. It contains information for 1,000 applicants, such as age, income, credit score, employment status, and requested loan amount. The approved column indicates whether the application was approved.

With this dataset:

Classification models such as Logistic Regression, SVM, and XGBoost can be trained

The factors that influence approval decisions can be analyzed

Financial modeling skills can be developed
