The KNN algorithm assigns a point to a class based on the classes of its nearest neighbours.
The KNN algorithm can be used for both classification and regression, but here we will use it to solve a classification problem.
The dataset has four features: Gender, Age, Salary, and Purchase Iphone.
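A minimal scikit-learn sketch of this setup is shown below; the file name iphone_purchase.csv, the exact column spellings, and the train/test split are assumptions for illustration, not details confirmed by the dataset description.

```python
# Hedged sketch: KNN classification on a Gender/Age/Salary -> Purchase Iphone
# dataset. The file name and exact column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("iphone_purchase.csv")
df["Gender"] = (df["Gender"] == "Male").astype(int)  # encode the categorical feature

X = df[["Gender", "Age", "Salary"]]
y = df["Purchase Iphone"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scale features so Salary does not dominate the distance calculations.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 neighbours, Euclidean distance
knn.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```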
The table KNN data is part of the dataset Extended GHCNd Station Coverage (KNN algorithm), available at https://redivis.com/datasets/kqpk-a1jj1pen4. It contains 2867 rows across 4 variables.
This dataset was created by Gökalp Olukcu
Released under: Data files © Original Authors
K-Nearest Neighbors (KNN) is a popular supervised learning algorithm used for classification and regression tasks. It is a non-parametric algorithm that makes predictions based on the similarity between input features and their neighboring data points.
In this context, a "KNN dataset" typically refers to a dataset that is suitable for applying the KNN algorithm. Here are a few characteristics of a dataset that works well with KNN:
Numerical features: KNN works with numerical features, so the dataset should contain numerical attributes. If categorical features are present, they need to be converted into numerical representations through techniques like one-hot encoding or label encoding.
Similarity measure: KNN relies on a distance metric to determine the similarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity. The dataset should have features that can be effectively compared using a distance metric.
Feature scaling: Since KNN uses distance calculations, it's generally a good practice to scale the features. Features with larger scales can dominate the distance calculations and lead to biased results. Common scaling techniques include standardization (subtracting mean and dividing by standard deviation) or normalization (scaling values to a range, e.g., 0 to 1).
Sufficient data points: KNN performs best when the dataset has a sufficient number of data points for each class or target value. Having too few instances per class can lead to overfitting or inaccurate predictions.
It's important to note that the suitability of a dataset for KNN depends on the specific problem and domain. It's always recommended to analyze and preprocess the dataset based on its characteristics before applying any machine learning algorithm, including KNN.
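Taken together, these points map directly onto a standard preprocessing pipeline. The sketch below is a minimal illustration assuming a scikit-learn workflow; the column names (color, age, income) are placeholders, not from any dataset above.

```python
# Hedged sketch combining the preprocessing steps listed above:
# one-hot encoding for categoricals, scaling for numericals, and an
# explicit distance metric for KNN. Column names are placeholders.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

preprocess = ColumnTransformer([
    # Convert categorical features into numerical representations.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    # Scale numerical features so no single feature dominates the distances.
    ("numerical", StandardScaler(), ["age", "income"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    # metric can be "euclidean", "manhattan", "cosine", etc.
    ("knn", KNeighborsClassifier(n_neighbors=5, metric="euclidean")),
])
# model.fit(X_train, y_train); model.predict(X_test)
```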
This dataset contains extended weather-station coverage data generated by running a naive KNN algorithm.
Actual classification performance for Promoter dataset using KNN classifier.
Actual classification performance for WDBC dataset using KNN classifier.
This dataset was created by george saavedra
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description of datasets used for evaluation and comparison.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Overview
This dataset provides k-nearest neighbor (kNN) target distributions for language modeling. Each token in the Wikipedia corpus is associated with a soft probability distribution over its top-k nearest neighbors in the representation space of a frozen language model. These targets can be used to train MLP Memory.
Corresponding Preprocessed Corpus: Rubin-Wei/enwiki-dec2021-preprocessed-mistral
Compatible Model: Mistral-7B-v0.3
Paper: MLP Memory: A Retriever-Pretrained…
See the full description on the dataset page: https://huggingface.co/datasets/Rubin-Wei/kNN-Targets-wikipedia-mistral.
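A minimal sketch of loading these targets with the Hugging Face datasets library is shown below; the split name and column layout are assumptions, so inspect the loaded object before relying on specific fields.

```python
# Hedged sketch: loading the kNN-target distributions with the Hugging Face
# `datasets` library. The split name "train" is an assumption, not confirmed
# by the dataset card excerpt above.
from datasets import load_dataset

ds = load_dataset("Rubin-Wei/kNN-Targets-wikipedia-mistral", split="train")
print(ds)  # shows the available columns and row count
```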
Actual classification performance for miRNA dataset using KNN classifier.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study assessing how algorithms and features influence model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that the end points dictated a model's performance, regardless of the chosen modeling approach (including deep learning) and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance is often also easier to interpret, it was clearly the preferred choice for the Tox21 data set due to its better explainability. Given that each data set has its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study across a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
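As a hedged illustration of that recommendation (not the study's actual 5000-model protocol), models of varying complexity can be benchmarked side by side with cross-validation; the sketch below uses synthetic data.

```python
# Hedged sketch: comparing simpler vs. more complex models via
# cross-validation. This only illustrates the recommended systematic
# comparison; the data here is synthetic, not the Tox21 bioassay set.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```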
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
I conducted my analysis using the Titanic dataset from Kaggle's 'Titanic: Machine Learning from Disaster' competition. This dataset includes information about passengers on the RMS Titanic, such as their demographics, ticket class, and whether they survived or not. The dataset is commonly used for predictive modeling tasks, and my goal was to apply machine learning techniques to predict passenger survival based on various features.
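A minimal sketch of such an analysis is shown below, assuming the competition's train.csv and a KNN baseline; this is one of many possible approaches, and the original analysis may have used different features and models.

```python
# Hedged sketch: a KNN baseline on Kaggle's Titanic training data.
# Assumes "train.csv" from the competition; the feature choice is
# illustrative, not the author's actual feature set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("train.csv")
df["Sex"] = (df["Sex"] == "female").astype(int)
df["Age"] = df["Age"].fillna(df["Age"].median())  # fill missing ages

X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.fit_transform(X_train), y_train)
print("Validation accuracy:", knn.score(scaler.transform(X_test), y_test))
```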
Quantifying spatially explicit or pixel-level aboveground forest biomass (AFB) across large regions is critical for measuring forest carbon sequestration capacity, assessing forest carbon balance, and revealing changes in the structure and function of forest ecosystems. When AFB is measured at the species level using widely available remote sensing data, regional changes in forest composition can readily be monitored. In this study, wall-to-wall maps of species-level AFB were generated for forests in Northeast China by integrating forest inventory data with Moderate Resolution Imaging Spectroradiometer (MODIS) images and environmental variables through applying the optimal k-nearest neighbor (kNN) imputation model. By comparing the prediction accuracy of 630 kNN models, we found that the models with random forest (RF) as the distance metric showed the highest accuracy. Compared to the use of single-month MODIS data for September, there was no appreciable improvement for the estimation accuracy of species-level AFB by using multi-month MODIS data. When k > 7, the accuracy improvement of the RF-based kNN models using the single MODIS predictors for September was essentially negligible. Therefore, the kNN model using the RF distance metric, single-month (September) MODIS predictors and k = 7 was the optimal model to impute the species-level AFB for entire Northeast China. Our imputation results showed that average AFB of all species over Northeast China was 101.98 Mg/ha around 2000. Among 17 widespread species, larch was most dominant, with the largest AFB (20.88 Mg/ha), followed by white birch (13.84 Mg/ha). Amur corktree and willow had low AFB (0.91 and 0.96 Mg/ha, respectively). Environmental variables (e.g., climate and topography) had strong relationships with species-level AFB. By integrating forest inventory data and remote sensing data with complete spatial coverage using the optimal kNN model, we successfully mapped the AFB distribution of the 17 tree species over Northeast China. We also evaluated the accuracy of AFB at different spatial scales. The AFB estimation accuracy significantly improved from stand level up to the ecotype level, indicating that the AFB maps generated from this study are more suitable to apply to forest ecosystem models (e.g., LINKAGES) which require species-level attributes at the ecotype scale.
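As a loose illustration of the kNN imputation idea with k = 7, here is a minimal sketch using scikit-learn's default Euclidean distance; note that the study's optimal model used an RF-based distance metric, which this sketch does not replicate, and the arrays are placeholders.

```python
# Hedged sketch of kNN imputation with k = 7, as a loose analogue of the
# study's approach. KNeighborsRegressor uses Euclidean distance by default,
# not the random-forest distance metric the study found optimal; the
# predictors and targets below are synthetic placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_reference = rng.random((200, 6))   # e.g., MODIS bands + climate/topography at plots
y_reference = rng.random(200) * 100  # e.g., species-level AFB (Mg/ha) at plots
X_target = rng.random((50, 6))       # pixels where AFB must be imputed

knn = KNeighborsRegressor(n_neighbors=7)  # k = 7, per the optimal model
knn.fit(X_reference, y_reference)
afb_imputed = knn.predict(X_target)  # average AFB of the 7 nearest plots
print(afb_imputed[:5])
```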
This dataset was created by Abu Noman Md. Sakib
Few-shot learning classification accuracy on 14 medical datasets with the KNN classifier.
Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about car evaluations based on various features like buying price, maintenance cost, number of doors, capacity of persons, luggage boot size, and safety rating. It also includes the class label indicating the overall evaluation of the car.
The dataset is ideal for machine learning classification tasks, especially for understanding the impact of categorical and numerical features on car evaluations.
Key Points:
Total Rows: 1,728 (example; replace with the actual row count)
Total Columns: 7
Categorical Features: buying, maint, lug_boot, safety, class
Numerical Features: doors, persons
Objective: Predict the class of a car based on the given features.
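A minimal classification sketch for this dataset is shown below, assuming a file named car.csv with the columns listed above; the encoding choice is illustrative.

```python
# Hedged sketch for the car-evaluation data described above. Column names
# follow the "Key Points" list; the file name "car.csv" is an assumption.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("car.csv")
X = df[["buying", "maint", "doors", "persons", "lug_boot", "safety"]]
y = df["class"]

# OrdinalEncoder assigns integer codes per column; for truly ordered
# categories (e.g., low < med < high), passing an explicit category order
# would better preserve the ordinal structure.
X_encoded = OrdinalEncoder().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Accuracy:", knn.score(X_test, y_test))
```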
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Text classification, as an important research area of text mining, can quickly and effectively extract valuable information to address the challenges of organizing and managing large-scale text data in the era of big data. Current research on text classification tends to focus on applications such as information filtering, information retrieval, public opinion monitoring, and library and information science, with few studies applying text classification methods to the field of tourist attractions. In light of this, a corpus of tourist-attraction description texts is constructed using web crawler technology in this paper. We propose a novel text representation method that combines Word2Vec word embeddings with TF-IDF-CRF-POS weighting, optimizing traditional TF-IDF by incorporating total relative term frequency, category discriminability, and part-of-speech information. The proposed representation is then combined with each of seven commonly used classifiers known for good performance (DT, SVM, LR, NB, MLP, RF, and KNN) to achieve multi-class text classification for six subcategories of national A-level tourist attractions. The effectiveness and superiority of this algorithm are validated by comparing overall performance, per-category performance, and model stability against several commonly used text representation methods. The results demonstrate that the newly proposed algorithm achieves higher accuracy and F1-measure on this type of professional dataset, and even outperforms the high-performance BERT classification model currently favored by the industry: Acc, macro-F1, and micro-F1 values are 2.29%, 5.55%, and 2.90% higher, respectively. Moreover, the algorithm can identify rare categories in the imbalanced dataset and exhibits better stability across datasets of different sizes. Overall, the algorithm presented in this paper exhibits superior classification performance and robustness. In addition, the conclusions drawn from the predicted values and the true values are consistent, indicating that the algorithm is practical. The professional-domain text dataset used in this paper poses higher challenges due to its complexity (uneven text lengths, relatively imbalanced categories) and a high degree of similarity between categories. Nevertheless, the proposed algorithm can efficiently classify multiple subcategories of this type of text set, which is a beneficial exploration of applying text classification to complex Chinese text datasets in specific fields, and provides a useful reference for the vector representation and classification of text datasets with similar content.
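For orientation only, the sketch below shows plain TF-IDF text classification with a few of the classifier families named above; it does not implement the paper's Word2Vec + TF-IDF-CRF-POS representation, and the texts are placeholders.

```python
# Hedged sketch: plain TF-IDF text classification with some of the
# classifiers named above (KNN, SVM, NB). This does NOT implement the
# paper's Word2Vec + TF-IDF-CRF-POS weighting; texts are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["a scenic mountain park", "an ancient temple complex",
         "a coastal beach resort", "a historic palace museum"]
labels = ["nature", "culture", "nature", "culture"]

for clf in (KNeighborsClassifier(n_neighbors=3), LinearSVC(), MultinomialNB()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["a quiet lakeside park"]))
```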
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average standard errors from 1,000 experiments are shown inside the parentheses; boldface values indicate, for each dataset, the KNN model with the minimum error rate.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
P-KNN Precomputed Scores Dataset
This dataset provides precomputed pathogenicity prediction scores generated by the P-KNN method using dbNSFP v5.2 (academic or commercial version) with joint calibration. It contains pathogenicity assessments for all missense variants, organized into multiple subfolders.
Dataset Structure
1. precomputed_score_academic_chromosome
Includes precomputed scores derived from the academic version of dbNSFP v5.2a, organized by genomic… See the full description on the dataset page: https://huggingface.co/datasets/brandeslab/P-KNN.
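A minimal sketch for fetching the files with huggingface_hub is shown below; the internal file formats and column layouts are not specified in this excerpt, so inspect the downloaded files before parsing.

```python
# Hedged sketch: downloading the precomputed score files for local use.
# The subfolder contents and file formats are not described in this
# excerpt, so inspect them before writing any parsing code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="brandeslab/P-KNN", repo_type="dataset")
print("Files downloaded to:", local_dir)
```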