100+ datasets found
  1. Knn algorithms

    • kaggle.com
    zip
    Updated May 12, 2023
    Cite
    Piyush Borhade (2023). Knn algorithms [Dataset]. https://www.kaggle.com/datasets/piyushborhade/knn-algorithms
    Explore at:
    Available download formats: zip (1752 bytes)
    Dataset updated
    May 12, 2023
    Authors
    Piyush Borhade
    Description

    The KNN algorithm predicts the class of a point from the classes of its nearest neighbours.

    KNN can be used for both classification and regression, but here it is used to solve a classification problem.

    The dataset has four columns: Gender, Age, Salary, and Purchase Iphone. It is a good CSV for beginners who want to practice the KNN algorithm.
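
    A minimal scikit-learn sketch of the exercise this description outlines, using the column names given above; the CSV file name is a placeholder, not taken from the dataset:

```python
# Minimal sketch, assuming a CSV with the columns named in the description.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("knn_algorithms.csv")  # placeholder file name

# Encode Gender numerically and scale the features: KNN distances would
# otherwise be dominated by large-magnitude columns like Salary.
X = pd.get_dummies(df[["Gender", "Age", "Salary"]], drop_first=True)
y = df["Purchase Iphone"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler().fit(X_train)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)
print("test accuracy:", knn.score(scaler.transform(X_test), y_test))
```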

  2. KNN data

    • redivis.com
    Updated Jun 21, 2022
    Cite
    Environmental Impact Data Collaborative (2022). KNN data [Dataset]. https://redivis.com/datasets/kqpk-a1jj1pen4
    Explore at:
    Dataset updated
    Jun 21, 2022
    Dataset authored and provided by
    Environmental Impact Data Collaborative
    Description

    The table KNN data is part of the dataset Extended GHCNd Station Coverage (KNN algorithm), available at https://redivis.com/datasets/kqpk-a1jj1pen4. It contains 2867 rows across 4 variables.

  3. KNN Algorithm Dataset

    • kaggle.com
    zip
    Updated Jul 5, 2020
    Cite
    Gökalp Olukcu (2020). KNN Algorithm Dataset [Dataset]. https://www.kaggle.com/datasets/gkalpolukcu/knn-algorithm-dataset/code
    Explore at:
    Available download formats: zip (49826 bytes)
    Dataset updated
    Jul 5, 2020
    Authors
    Gökalp Olukcu
    Description

    Dataset

    This dataset was created by Gökalp Olukcu

    Released under Data files © Original Authors


  4. KNN DATASET

    • kaggle.com
    zip
    Updated Jul 5, 2023
    Cite
    Pratyush_Ranjan (2023). KNN DATASET [Dataset]. https://www.kaggle.com/datasets/pratyushranjan01/knn-dataset/data
    Explore at:
    Available download formats: zip (59421 bytes)
    Dataset updated
    Jul 5, 2023
    Authors
    Pratyush_Ranjan
    Description

    K-Nearest Neighbors (KNN) is a popular supervised learning algorithm used for classification and regression tasks. It is a non-parametric algorithm that makes predictions based on the similarity between input features and their neighboring data points.

    A "KNN dataset" typically refers to a dataset that is well suited to applying the KNN algorithm. Here are a few characteristics of a dataset that works well with KNN (a short pipeline sketch follows below):

    Numerical features: KNN works with numerical features, so the dataset should contain numerical attributes. If categorical features are present, they need to be converted into numerical representations through techniques like one-hot encoding or label encoding.

    Similarity measure: KNN relies on a distance metric to determine the similarity between data points. Common distance measures include Euclidean distance, Manhattan distance, and cosine similarity. The dataset should have features that can be effectively compared using a distance metric.

    Feature scaling: Since KNN uses distance calculations, it's generally a good practice to scale the features. Features with larger scales can dominate the distance calculations and lead to biased results. Common scaling techniques include standardization (subtracting mean and dividing by standard deviation) or normalization (scaling values to a range, e.g., 0 to 1).

    Sufficient data points: KNN performs best when the dataset has a sufficient number of data points for each class or target value. Having too few instances per class can lead to overfitting or inaccurate predictions.

    It's important to note that the suitability of a dataset for KNN depends on the specific problem and domain. It's always recommended to analyze and preprocess the dataset based on its characteristics before applying any machine learning algorithm, including KNN.
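
    A minimal pipeline sketch of the preprocessing steps described above (one-hot encoding for categoricals, standardization for numerics, a distance-based KNN on top); the column names are placeholders, not tied to any particular dataset:

```python
# Sketch of the preprocessing the description recommends before KNN.
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "income"]    # placeholder numeric features
categorical_cols = ["city"]         # placeholder categorical feature

preprocess = ColumnTransformer([
    # Standardize so large-scale features don't dominate the distance.
    ("scale", StandardScaler(), numeric_cols),
    # Convert categories into numeric indicator columns.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=5, metric="euclidean")),
])
# Usage: model.fit(X_train, y_train); model.predict(X_test)
```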

  5. Extended GHCNd Station Coverage (KNN algorithm)

    • redivis.com
    application/jsonl +7
    Updated Jun 16, 2022
    + more versions
    Cite
    Environmental Impact Data Collaborative (2022). Extended GHCNd Station Coverage (KNN algorithm) [Dataset]. https://redivis.com/datasets/kqpk-a1jj1pen4
    Explore at:
    Available download formats: application/jsonl, sas, csv, avro, spss, parquet, arrow, stata
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Redivis Inc.
    Authors
    Environmental Impact Data Collaborative
    Description

    Abstract

    This dataset contains extended weather station coverage data generated by applying a naive KNN algorithm.

  6. Actual classification performance for Promoter dataset using KNN classifier.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 26, 2015
    Cite
    Soufan, Othman; Bajic, Vladimir B.; Kalnis, Panos; Kleftogiannis, Dimitrios (2015). Actual classification performance for Promoter dataset using KNN classifier. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001924549
    Explore at:
    Dataset updated
    Feb 26, 2015
    Authors
    Soufan, Othman; Bajic, Vladimir B.; Kalnis, Panos; Kleftogiannis, Dimitrios
    Description

    Actual classification performance for Promoter dataset using KNN classifier.

  7. Actual classification performance for WDBC dataset using KNN classifier.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 26, 2015
    Cite
    Kleftogiannis, Dimitrios; Soufan, Othman; Bajic, Vladimir B.; Kalnis, Panos (2015). Actual classification performance for WDBC dataset using KNN classifier. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001924546
    Explore at:
    Dataset updated
    Feb 26, 2015
    Authors
    Kleftogiannis, Dimitrios; Soufan, Othman; Bajic, Vladimir B.; Kalnis, Panos
    Description

    Actual classification performance for WDBC dataset using KNN classifier.

  8. KNN model

    • kaggle.com
    zip
    Updated Apr 26, 2021
    Cite
    george saavedra (2021). KNN model [Dataset]. https://www.kaggle.com/datasets/georgesaavedra/knn-model
    Explore at:
    Available download formats: zip (184622 bytes)
    Dataset updated
    Apr 26, 2021
    Authors
    george saavedra
    Description

    Dataset

    This dataset was created by george saavedra


  9. Description of datasets used for evaluation and comparison.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Ahmad B. A. Hassanat (2023). Description of datasets used for evaluation and comparison. [Dataset]. http://doi.org/10.1371/journal.pone.0207772.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ahmad B. A. Hassanat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description of datasets used for evaluation and comparison.

  10. kNN-Targets-wikipedia-mistral

    • huggingface.co
    Updated Oct 26, 2025
    Cite
    Rubin Wei (2025). kNN-Targets-wikipedia-mistral [Dataset]. https://huggingface.co/datasets/Rubin-Wei/kNN-Targets-wikipedia-mistral
    Explore at:
    Dataset updated
    Oct 26, 2025
    Authors
    Rubin Wei
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Overview

    This dataset provides k-nearest neighbor (kNN) target distributions for language modeling. Each token in the Wikipedia corpus is associated with a soft probability distribution over its top-k nearest neighbors in the representation space of a frozen language model. These targets can be used to train MLP Memory.

    Corresponding Preprocessed Corpus: Rubin-Wei/enwiki-dec2021-preprocessed-mistral
    Compatible Model: Mistral-7B-v0.3
    Paper: MLP Memory: A Retriever-Pretrained… See the full description on the dataset page: https://huggingface.co/datasets/Rubin-Wei/kNN-Targets-wikipedia-mistral.
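
    A hedged sketch of the general idea of soft kNN target distributions (an illustration only, not the authors' pipeline; the distance form, temperature, and k are assumptions):

```python
# Sketch: form a soft distribution over the next-token labels of the k
# nearest stored keys to a query representation from a frozen LM.
import numpy as np

def knn_soft_targets(query, keys, next_tokens, k=8, temperature=1.0):
    """Return (token_ids, probs): a soft distribution over the next-token
    labels of the k stored keys nearest to `query`."""
    dists = np.linalg.norm(keys - query, axis=1)   # L2 distance to every key
    idx = np.argsort(dists)[:k]                    # indices of the k nearest
    logits = -dists[idx] / temperature             # closer => larger weight
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    return next_tokens[idx], probs

# keys: (N, d) hidden states from a frozen LM at each corpus position;
# next_tokens: (N,) the token that actually followed each position.
```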

  11. Actual classification performance for miRNA dataset using KNN classifier.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 26, 2015
    Cite
    Soufan, Othman; Kalnis, Panos; Kleftogiannis, Dimitrios; Bajic, Vladimir B. (2015). Actual classification performance for miRNA dataset using KNN classifier. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001924567
    Explore at:
    Dataset updated
    Feb 26, 2015
    Authors
    Soufan, Othman; Kalnis, Panos; Kleftogiannis, Dimitrios; Bajic, Vladimir B.
    Description

    Actual classification performance for miRNA dataset using KNN classifier.

  12. Data from: Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets

    • acs.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Leihong Wu; Ruili Huang; Igor V. Tetko; Zhonghua Xia; Joshua Xu; Weida Tong (2023). Trade-off Predictivity and Explainability for Machine-Learning Powered Predictive Toxicology: An in-Depth Investigation with Tox21 Data Sets [Dataset]. http://doi.org/10.1021/acs.chemrestox.0c00373.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Leihong Wu; Ruili Huang; Igor V. Tetko; Zhonghua Xia; Joshua Xu; Weida Tong
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study assessing how algorithms and features influence model performance in chemical toxicity research. We built over 5,000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations as features and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of various factors on model performance and explainability. We demonstrated that end points dictated a model's performance, regardless of the chosen modeling approach (including deep learning) and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Since a simpler model with acceptable performance is often also easier to interpret, it was clearly the preferred choice for the Tox21 data set due to its better explainability. Given that each data set has its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study across a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.

  13. Titanic Survival Prediction K-NearestNeighbors KNN

    • kaggle.com
    zip
    Updated Dec 22, 2023
    Cite
    Rakesh V R (2023). Titanic Survival Prediction K-NearestNeighbors KNN [Dataset]. https://www.kaggle.com/datasets/rakeshravindra/titanic-ml-from-disaster
    Explore at:
    Available download formats: zip (33847 bytes)
    Dataset updated
    Dec 22, 2023
    Authors
    Rakesh V R
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    I conducted my analysis using the Titanic dataset from Kaggle's 'Titanic: Machine Learning from Disaster' competition. This dataset includes information about passengers on the RMS Titanic, such as their demographics, ticket class, and whether they survived or not. The dataset is commonly used for predictive modeling tasks, and my goal was to apply machine learning techniques to predict passenger survival based on various features.
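
    A minimal sketch of the described task, assuming the standard column names in Kaggle's train.csv (Survived, Pclass, Sex, Age, Fare):

```python
# Sketch of KNN survival prediction on the Titanic training data.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("train.csv")
df["Sex"] = (df["Sex"] == "female").astype(int)    # binary-encode sex
df["Age"] = df["Age"].fillna(df["Age"].median())   # impute missing ages

X = df[["Pclass", "Sex", "Age", "Fare"]]
y = df["Survived"]

# Scaling matters before KNN: Fare spans a far larger range than Pclass.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7))
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```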

  14. Data release for: Evaluating k-nearest neighbor (kNN) imputation models for species-level aboveground forest biomass mapping in Northeast China

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Data release for: Evaluating k-nearest neighbor (kNN) imputation models for species-level aboveground forest biomass mapping in Northeast China [Dataset]. https://catalog.data.gov/dataset/data-release-for-evaluating-k-nearest-neighbor-knn-imputation-models-for-species-level-abo
    Explore at:
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Northeast China
    Description

    Quantifying spatially explicit or pixel-level aboveground forest biomass (AFB) across large regions is critical for measuring forest carbon sequestration capacity, assessing forest carbon balance, and revealing changes in the structure and function of forest ecosystems. When AFB is measured at the species level using widely available remote sensing data, regional changes in forest composition can readily be monitored. In this study, wall-to-wall maps of species-level AFB were generated for forests in Northeast China by integrating forest inventory data with Moderate Resolution Imaging Spectroradiometer (MODIS) images and environmental variables through applying the optimal k-nearest neighbor (kNN) imputation model. By comparing the prediction accuracy of 630 kNN models, we found that the models with random forest (RF) as the distance metric showed the highest accuracy. Compared to the use of single-month MODIS data for September, there was no appreciable improvement for the estimation accuracy of species-level AFB by using multi-month MODIS data. When k > 7, the accuracy improvement of the RF-based kNN models using the single MODIS predictors for September was essentially negligible. Therefore, the kNN model using the RF distance metric, single-month (September) MODIS predictors and k = 7 was the optimal model to impute the species-level AFB for entire Northeast China. Our imputation results showed that average AFB of all species over Northeast China was 101.98 Mg/ha around 2000. Among 17 widespread species, larch was most dominant, with the largest AFB (20.88 Mg/ha), followed by white birch (13.84 Mg/ha). Amur corktree and willow had low AFB (0.91 and 0.96 Mg/ha, respectively). Environmental variables (e.g., climate and topography) had strong relationships with species-level AFB. By integrating forest inventory data and remote sensing data with complete spatial coverage using the optimal kNN model, we successfully mapped the AFB distribution of the 17 tree species over Northeast China. We also evaluated the accuracy of AFB at different spatial scales. The AFB estimation accuracy significantly improved from stand level up to the ecotype level, indicating that the AFB maps generated from this study are more suitable to apply to forest ecosystem models (e.g., LINKAGES) which require species-level attributes at the ecotype scale.
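
    A hedged sketch of plain kNN imputation of a plot-measured response onto unsampled pixels via predictor-space neighbors; the study's random-forest distance metric and k = 7 tuning are not reproduced here, and Euclidean distance on standardized predictors stands in as a simplification:

```python
# Sketch: impute a response at target pixels as the mean over the k nearest
# reference plots in standardized predictor space.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def knn_impute(ref_X, ref_y, target_X, k=7):
    """Impute y at target locations as the mean response of the k nearest
    reference observations in standardized predictor space."""
    scaler = StandardScaler().fit(ref_X)
    nn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(ref_X))
    _, idx = nn.kneighbors(scaler.transform(target_X))  # (n_targets, k)
    return np.asarray(ref_y)[idx].mean(axis=1)          # average the neighbors

# ref_X: plot-level predictors (e.g., MODIS bands, climate, topography)
# ref_y: plot-level response (e.g., species biomass)
# target_X: pixel-level predictors with full spatial coverage
```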

  15. KNN Model (Job Title)

    • kaggle.com
    zip
    Updated Mar 28, 2023
    Cite
    Abu Noman Md. Sakib (2023). KNN Model (Job Title) [Dataset]. https://www.kaggle.com/datasets/anmspro/knn-model-job-title/discussion
    Explore at:
    Available download formats: zip (1409669 bytes)
    Dataset updated
    Mar 28, 2023
    Authors
    Abu Noman Md. Sakib
    Description

    Dataset

    This dataset was created by Abu Noman Md. Sakib


  16. Few-shot learning classification accuracy on 14 medical datasets with the KNN classifier.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 6, 2024
    Cite
    Cai, Xiaohao; Liu, Jiahui; Fan, Keqiang; Niranjan, Mahesan (2024). Few-shot learning classification accuracy on 14 medical datasets with the KNN classifier. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001424309
    Explore at:
    Dataset updated
    Nov 6, 2024
    Authors
    Cai, Xiaohao; Liu, Jiahui; Fan, Keqiang; Niranjan, Mahesan
    Description

    Few-shot learning classification accuracy on 14 medical datasets with the KNN classifier.

  17. Dataset For kNN

    • kaggle.com
    zip
    Updated Dec 25, 2024
    Cite
    Mit Gandhi (2024). Dataset For kNN [Dataset]. https://www.kaggle.com/datasets/mitgandhi10/dataset-for-knn
    Explore at:
    Available download formats: zip (4856 bytes)
    Dataset updated
    Dec 25, 2024
    Authors
    Mit Gandhi
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains information about car evaluations based on various features like buying price, maintenance cost, number of doors, capacity of persons, luggage boot size, and safety rating. It also includes the class label indicating the overall evaluation of the car.

    The dataset is ideal for machine learning classification tasks, especially for understanding the impact of categorical and numerical features on car evaluations.

    Key Points:

    Total rows: 1,728 (example; replace with the actual row count)
    Total columns: 7
    Categorical features: buying, maint, lug_boot, safety, class
    Numerical features: doors, persons
    Objective: predict the class of a car based on the given features.
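
    A minimal sketch for a dataset like this one: the features are ordered categories, so an ordinal encoding that preserves their ranking is a natural fit before KNN's distance computations. The level orderings follow the standard UCI Car Evaluation coding and the file name is a placeholder; treat both as assumptions if this CSV differs:

```python
# Sketch: ordinal-encode ordered categories, then fit KNN.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder

df = pd.read_csv("car_evaluation.csv")  # placeholder file name

feature_cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety"]
orderings = [
    ["low", "med", "high", "vhigh"],   # buying price
    ["low", "med", "high", "vhigh"],   # maintenance cost
    ["2", "3", "4", "5more"],          # number of doors
    ["2", "4", "more"],                # person capacity
    ["small", "med", "big"],           # luggage boot size
    ["low", "med", "high"],            # safety rating
]

X = OrdinalEncoder(categories=orderings).fit_transform(
    df[feature_cols].astype(str)
)
y = df["class"]

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
```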

  18. Hyperparameter settings of BERT model.

    • plos.figshare.com
    xls
    Updated Oct 18, 2024
    + more versions
    Cite
    Lu Xiao; Qiaoxing Li; Qian Ma; Jiasheng Shen; Yong Yang; Danyang Li (2024). Hyperparameter settings of BERT model. [Dataset]. http://doi.org/10.1371/journal.pone.0305095.t007
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lu Xiao; Qiaoxing Li; Qian Ma; Jiasheng Shen; Yong Yang; Danyang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Text classification, as an important research area of text mining, can quickly and effectively extract valuable information to address the challenges of organizing and managing large-scale text data in the era of big data. Current research on text classification tends to focus on applications such as information filtering, information retrieval, public opinion monitoring, and library and information science, with few studies applying text classification methods to the field of tourist attractions. In light of this, a corpus of tourist attraction description texts is constructed using web crawler technology in this paper. We propose a novel text representation method that combines Word2Vec word embeddings with TF-IDF-CRF-POS weighting, optimizing traditional TF-IDF by incorporating total relative term frequency, category discriminability, and part-of-speech information. The proposed algorithm is then combined with each of seven commonly used classifiers known for good performance (DT, SVM, LR, NB, MLP, RF, and KNN) to achieve multi-class text classification for six subcategories of national A-level tourist attractions. The effectiveness and superiority of this algorithm are validated by comparing overall performance, per-category performance, and model stability against several commonly used text representation methods. The results demonstrate that the proposed algorithm achieves higher accuracy and F1-measure on this type of professional dataset, and even outperforms the high-performance BERT classification model currently favored by the industry: accuracy, macro-F1, and micro-F1 values are 2.29%, 5.55%, and 2.90% higher, respectively. Moreover, the algorithm can identify rare categories in the imbalanced dataset and exhibits better stability across datasets of different sizes. Overall, the algorithm presented in this paper exhibits superior classification performance and robustness. In addition, conclusions obtained from the predicted values and the true values are consistent, indicating that the algorithm is practical. The professional-domain text dataset used in this paper poses high challenges due to its complexity (uneven text length, relatively imbalanced categories) and a high degree of similarity between categories. Nevertheless, the proposed algorithm can efficiently classify multiple subcategories of this type of text set, which is a beneficial exploration of the application of complex Chinese text datasets in specific fields and provides a useful reference for the vector representation and classification of text datasets with similar content.
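
    A plain TF-IDF + KNN baseline for multi-class text classification, included only as a hedged reference point; the paper's Word2Vec + TF-IDF-CRF-POS representation is more involved and is not reproduced here:

```python
# Baseline sketch: sparse TF-IDF features with a cosine-distance KNN.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

texts = [  # toy documents standing in for attraction descriptions
    "a mountain scenic area with forest trails and waterfalls",
    "a historic temple complex dating to the ming dynasty",
    "a national wetland park with migratory birds",
    "an imperial palace housing a large museum collection",
]
labels = ["natural", "cultural", "natural", "cultural"]

model = make_pipeline(
    TfidfVectorizer(),                                     # sparse TF-IDF
    KNeighborsClassifier(n_neighbors=3, metric="cosine"),  # cosine suits TF-IDF
)
model.fit(texts, labels)
print(model.predict(["an ancient pagoda beside the lake"]))
```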

  19. Misclassification rate of KNN (k = 2, 4, 8, 16) for the datasets used with different numbers of features.

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Seoung Bum Kim; Jung Woo Lee; Sin Young Kim; Deok Won Lee (2023). Misclassification rate of KNN (k = 2, 4, 8, 16) for the datasets used with different numbers of features. [Dataset]. http://doi.org/10.1371/journal.pone.0067862.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Seoung Bum Kim; Jung Woo Lee; Sin Young Kim; Deok Won Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Average standard errors from 1,000 experiments are shown in parentheses; boldface values indicate, for each dataset, the KNN models with the minimum error rates.

  20. P-KNN

    • huggingface.co
    Updated Aug 11, 2025
    Cite
    Brandes Lab @ NYU (2025). P-KNN [Dataset]. https://huggingface.co/datasets/brandeslab/P-KNN
    Explore at:
    Dataset updated
    Aug 11, 2025
    Dataset authored and provided by
    Brandes Lab @ NYU
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    P-KNN Precomputed Scores Dataset

    This dataset provides precomputed pathogenicity prediction scores generated by the P-KNN method using dbNSFP v5.2 (academic or commercial version) with joint calibration. It contains pathogenicity assessments for all missense variants, organized into multiple subfolders.

    Dataset Structure

    1. precomputed_score_academic_chromosome

    Includes precomputed scores derived from the academic version of dbNSFP v5.2a, organized by genomic… See the full description on the dataset page: https://huggingface.co/datasets/brandeslab/P-KNN.
