100+ datasets found
  1. Manoeuvring Kaggle Kernel and Data Environment

    • kaggle.com
    zip
    Updated Aug 30, 2018
    Cite
    Regi (2018). Manoeuvring Kaggle Kernel and Data Environment [Dataset]. https://www.kaggle.com/regivm/kernel
    Available download formats: zip (7410 bytes)
    Dataset updated
    Aug 30, 2018
    Authors
    Regi
    Description

    Dataset

    This dataset was created by Regi

    Contents

  2. Kaggle's Most Used Packages & Method Calls

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    TheItCrow (2025). Kaggle's Most Used Packages & Method Calls [Dataset]. https://www.kaggle.com/datasets/kevinbnisch/kaggles-most-used-packages-and-method-calls
    Available download formats: zip (2405388375 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    TheItCrow
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset enriches the Meta Kaggle dataset by using Meta Kaggle Code to extract all imports (for both R and Python) and method calls (Python only) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.

    [Figures: Most Imported R Packages; Most Imported Python Packages]


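    We perform this extraction using three regex patterns. Only the two import patterns are reproduced below; we could not include the Python method-call pattern because Kaggle's page breaks when we do.

    import re

    # Matches "from X import ..." (group 1) or "import X" (group 2).
    PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
    # PYTHON_METHOD_REGEX: pattern not published on the dataset page.
    # Matches library(pkg) / require(pkg), with or without quotes around the name.
    R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')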
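As a rough illustration of how the two published patterns behave, the sketch below applies them to short Python and R snippets. The helper names `python_imports` and `r_imports` and the sample sources are hypothetical, and this is not the author's extraction pipeline. Because `PYTHON_IMPORT_REGEX` has two capture groups, each `findall` match is a tuple with exactly one non-empty entry. Requires Python 3.9+ for the `list[str]` annotations.

```python
import re

# The two patterns published on the dataset page (the method-call pattern is not published).
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')


def python_imports(source: str) -> list[str]:
    """Return module names matched by PYTHON_IMPORT_REGEX, in order of appearance."""
    # Each match fills exactly one group: 'from X import' (first) or 'import X' (second).
    return [frm or imp for frm, imp in PYTHON_IMPORT_REGEX.findall(source)]


def r_imports(source: str) -> list[str]:
    """Return package names passed to library() or require()."""
    # A single capture group, so findall returns plain strings.
    return R_IMPORT_REGEX.findall(source)


# Hypothetical sample sources.
py_src = """
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
"""
r_src = 'library(dplyr)\nrequire("ggplot2")\n'

print(python_imports(py_src))  # ['numpy', 'pandas', 'sklearn.model_selection']
print(r_imports(r_src))        # ['dplyr', 'ggplot2']
```

One consequence of the pattern design: `import os, sys` yields only `os`, because the Python pattern captures a single dotted name after `import`.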
    

    This dataset was created on 06-06-2025. Because the computation required for this process is very resource-intensive and cannot run in a Kaggle kernel, the dataset is not updated on a schedule. A notebook demonstrating how to create this dataset and what insights it provides can be found here.

  3. Kaggle Survey 2022 kernel stats

    • kaggle.com
    zip
    Updated Nov 12, 2022
    Cite
    Alexander Ryzhkov (2022). Kaggle Survey 2022 kernel stats [Dataset]. https://www.kaggle.com/datasets/alexryzhkov/kaggle-survey-2022-kernel-stats
    Available download formats: zip (4854 bytes)
    Dataset updated
    Nov 12, 2022
    Authors
    Alexander Ryzhkov
    Description

    Dataset

    This dataset was created by Alexander Ryzhkov

    Contents

  4. Tensorflow's Global and Operation level seeds

    • kaggle.com
    zip
    Updated May 20, 2023
    Cite
    Deepak Ahire (2023). Tensorflow's Global and Operation level seeds [Dataset]. https://www.kaggle.com/datasets/adeepak7/tensorflow-global-and-operation-level-seeds
    Available download formats: zip (2984 bytes)
    Dataset updated
    May 20, 2023
    Authors
    Deepak Ahire
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains the Python files with the snippets required for the Kaggle kernel https://www.kaggle.com/code/adeepak7/tensorflow-s-global-and-operation-level-seeds/

    Because the kernel sets and resets global and operation-level seeds, the effect of a seed set in one cell could not be undone in the subsequent cells. Hence, the snippets are provided as separate Python files, and each file is executed independently in its own cell.
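The underlying problem is that global random state persists within one interpreter, so a seed set by one snippet leaks into every later one. A minimal sketch of that effect and of one workaround (running each snippet in its own process), assuming Python's `random` module as a stand-in for TensorFlow's global seed so it runs without extra installs; the snippet strings and `run_in_fresh_interpreter` are hypothetical and not taken from the dataset or the kernel.

```python
import random
import subprocess
import sys

# Hypothetical stand-ins for two snippet files. Python's `random` module
# replaces TensorFlow here; the persistence-of-global-state issue is the same idea.
SET_SEED = "import random\nrandom.seed(0)\n"
DRAW = "import random\nprint(random.random())\n"


def run_in_fresh_interpreter(code: str) -> str:
    """Execute `code` in a new Python process and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Same process: the seed set by SET_SEED still governs later draws.
exec(SET_SEED)
leaked = random.random()  # what DRAW would print if run in this process
seeded_first_draw = random.Random(0).random()
print(leaked == seeded_first_draw)  # True

# Fresh process: DRAW starts from its own, independently initialized RNG state.
isolated = float(run_in_fresh_interpreter(DRAW))
print(isolated)
```

A separate process is the bluntest form of isolation; in TensorFlow specifically, resetting state between experiments may also be possible from within one process, but that is outside what this dataset describes.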

  5. OS Kernel Anomaly Dataset

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Ziya (2025). OS Kernel Anomaly Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/os-kernel-anomaly-dataset
    Available download formats: zip (15689 bytes)
    Dataset updated
    May 5, 2025
    Authors
    Ziya
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed to support research in anomaly detection for OS kernels, particularly in the context of power monitoring systems used in embedded environments. It simulates the interaction between system-level operations and power consumption behaviors, providing a rich set of features for training and evaluating hybrid models.

    The dataset contains 1,000 records of simulated yet realistic system behavior, including:

    • System call sequences
    • Power usage logs (in watts)
    • CPU and memory utilization
    • Process identifiers and names
    • Timestamps
    • Labeled entries (Normal or Anomaly)

    Anomalies are injected using fuzzy testing principles to simulate abnormal power spikes, syscall irregularities, or excessive resource usage, mimicking real-world kernel faults or malicious activity. This dataset enables the development of robust models that can learn complex, uncertain system behavior patterns for enhanced security and stability of embedded power monitoring applications.

  6. Kaggles' top Kernels and Datasets

    • kaggle.com
    zip
    Updated Apr 17, 2017
    Cite
    Liam Larsen (2017). Kaggles' top Kernels and Datasets [Dataset]. https://www.kaggle.com/datasets/kingburrito666/kaggles-top-kernels-and-datasets
    Available download formats: zip (10729 bytes)
    Dataset updated
    Apr 17, 2017
    Authors
    Liam Larsen
    Description

    Context

    I made this because I wanted to know whether the attributes of Kaggle's top kernels and datasets correlate with their popularity (I wanted to know how to get to the top, lol). I scraped the data using DataMiner.

    Content

    top-kernels has:

    • Name. name
    • Upvotes. number of upvotes
    • Language. language used
    • Comments. number of comments
    • IsNotebook. whether it is a notebook or a script
    • visuals. number of visuals

    top-datasets has:

    • Name. name
    • upvotes. number of upvotes
    • downloads. number of downloads
    • comments. number of comments
    • author. author name
    • updated. last updated
    • description. dataset description
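Since the stated goal was to look for a correlation with popularity, here is a minimal sketch of one way to measure it: a plain-Python Pearson coefficient between two numeric columns. The file name `top-kernels.csv`, the header names `Upvotes` and `Comments`, and the sample numbers are assumptions for illustration only.

```python
import csv
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def load_columns(path: str, x_col: str, y_col: str) -> tuple[list[float], list[float]]:
    """Read two numeric columns from a CSV file (header names are assumptions)."""
    xs, ys = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


# Hypothetical values, only to show the call; real use might be
# pearson(*load_columns("top-kernels.csv", "Upvotes", "Comments")).
upvotes = [120.0, 85.0, 60.0, 42.0, 30.0]
comments = [40.0, 22.0, 25.0, 10.0, 8.0]
print(round(pearson(upvotes, comments), 3))
```

Pearson's r only captures linear association; for heavily skewed counts such as upvotes, a rank-based measure (Spearman's) may be the better choice.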

    Go kaggle team

  7. Public Kernel

    • kaggle.com
    zip
    Updated Jan 31, 2018
    Cite
    Salil Gautam (2018). Public Kernel [Dataset]. https://www.kaggle.com/datasets/salil007/public-kernel
    Available download formats: zip (30472154 bytes)
    Dataset updated
    Jan 31, 2018
    Authors
    Salil Gautam
    Description

    Dataset

    This dataset was created by Salil Gautam

    Contents

  8. kaggle-severstal-kernel

    • kaggle.com
    zip
    Updated Oct 24, 2019
    Cite
    Maksim Filin (2019). kaggle-severstal-kernel [Dataset]. https://www.kaggle.com/xsardas/kaggleseverstalkernel
    Available download formats: zip (7073 bytes)
    Dataset updated
    Oct 24, 2019
    Authors
    Maksim Filin
    Description

    Dataset

    This dataset was created by Maksim Filin

    Contents

  9. Wheat Variety Classification

    • kaggle.com
    zip
    Updated Nov 23, 2022
    Cite
    Sudhanshu Rastogi (2022). Wheat Variety Classification [Dataset]. https://www.kaggle.com/datasets/sudhanshu2198/wheat-variety-classification
    Available download formats: zip (3877 bytes)
    Dataset updated
    Nov 23, 2022
    Authors
    Sudhanshu Rastogi
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data Set Information:

    The dataset comprises kernels belonging to three varieties of wheat: Kama, Rosa and Canadian, 70 elements each. The data set can be used for classification and cluster analysis. All of the parameters are real-valued and continuous.

    Attribute Information:

    To construct the data, seven geometric parameters of wheat kernels were measured:

    1. area A,
    2. perimeter P,
    3. compactness C = 4πA/P²,
    4. length of kernel,
    5. width of kernel,
    6. asymmetry coefficient,
    7. length of kernel groove.

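The compactness formula in item 3 can be checked numerically. A minimal sketch (the function name is hypothetical): by the isoperimetric inequality, 4πA ≤ P² for any closed shape, so C never exceeds 1 and reaches 1 only for a circle.

```python
import math


def compactness(area: float, perimeter: float) -> float:
    """Compactness C = 4*pi*A / P**2, as defined in the attribute list."""
    return 4 * math.pi * area / perimeter ** 2


# A circle of radius r has A = pi*r**2 and P = 2*pi*r, so C = 1 (the maximum).
r = 2.0
print(round(compactness(math.pi * r ** 2, 2 * math.pi * r), 6))  # 1.0

# A unit square has A = 1 and P = 4, so C = pi/4.
print(round(compactness(1.0, 4.0), 4))  # 0.7854
```
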
  10. spacy 3.0 with english models

    • kaggle.com
    zip
    Updated Apr 27, 2021
    Cite
    Gautier (2021). spacy 3.0 with english models [Dataset]. https://www.kaggle.com/brotye/spacy3
    Available download formats: zip (1324094980 bytes)
    Dataset updated
    Apr 27, 2021
    Authors
    Gautier
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Gautier

    Released under CC0: Public Domain

    Contents

  11. test_data

    • kaggle.com
    zip
    Updated Oct 28, 2016
    Cite
    joeland209 (2016). test_data [Dataset]. https://www.kaggle.com/joeland209/test-data
    Available download formats: zip (7164 bytes)
    Dataset updated
    Oct 28, 2016
    Authors
    joeland209
    Description

    Dataset

    This dataset was created by joeland209

    Contents

    haha

  12. ImagesForKernel

    • kaggle.com
    zip
    Updated Jul 9, 2020
    Cite
    deeplearner (2020). ImagesForKernel [Dataset]. https://www.kaggle.com/adarshpathak/imagesforkernel
    Available download formats: zip (1424175 bytes)
    Dataset updated
    Jul 9, 2020
    Authors
    deeplearner
    Description

    Dataset

    This dataset was created by deeplearner

    Contents

  13. pre_trained_roberta_base

    • kaggle.com
    zip
    Updated Jun 8, 2021
    Cite
    Justin Chae (2021). pre_trained_roberta_base [Dataset]. https://www.kaggle.com/datasets/justinchae/pre-trained-roberta-base
    Available download formats: zip (303411165 bytes)
    Dataset updated
    Jun 8, 2021
    Authors
    Justin Chae
    Description

    Dataset

    This dataset was created by Justin Chae

    Contents

  14. data-for-yolo-v3-kernel

    • kaggle.com
    zip
    Updated Nov 26, 2022
    Cite
    ZykoTsai (2022). data-for-yolo-v3-kernel [Dataset]. https://www.kaggle.com/datasets/zykotsai/data-for-yolo-v3-kernel
    Available download formats: zip (18544663 bytes)
    Dataset updated
    Nov 26, 2022
    Authors
    ZykoTsai
    Description

    Dataset

    This dataset was created by ZykoTsai

    Contents

  15. Kernel Files

    • kaggle.com
    zip
    Updated Feb 18, 2020
    Cite
    Lavanya Shukla (2020). Kernel Files [Dataset]. https://www.kaggle.com/lavanyashukla01/kernel-files
    Available download formats: zip (5663370 bytes)
    Dataset updated
    Feb 18, 2020
    Authors
    Lavanya Shukla
    Description

    Dataset

    This dataset was created by Lavanya Shukla

    Contents

  16. kaggle survey historical meta

    • kaggle.com
    zip
    Updated Nov 22, 2022
    Cite
    jak kajdsfs (2022). kaggle survey historical meta [Dataset]. https://www.kaggle.com/datasets/jakkajdsfs/kaggle-survey-historical-meta
    Available download formats: zip (1527614294 bytes)
    Dataset updated
    Nov 22, 2022
    Authors
    jak kajdsfs
    Description

    [Figure: the Kernel class, listing the fields collected for each kernel]

    In total, 1822 kernels used the Kaggle Survey datasets between 2017 and 2022. We organized the data into several distinct datasets, each of which helped answer our questions on at least one topic. These datasets are briefly described below.

    • notebooks.zip

      Contains the 1822 raw notebooks, saved as either ipynb or Rmd. Fifty-eight notebooks could be executed in neither Python nor R, so they were given the extension unknown_format.txt. Each file is named after its notebook_id as listed on kaggle.com, which matches notebook_id in the file all_kernels.csv, described below. Among other things, this dataset was used to obtain a per-notebook list of imported libraries, as well as the survey questions addressed by each notebook.

    • all_kernels.csv

      Each row of this dataset contains data about one of the 1822 kernels. The columns correspond to the fields listed in the Kernel class image above. A more detailed overview of the columns can be found on the dataset's Kaggle page.

    • cleaned_kernels.csv

      This is effectively the main dataset we used in our competition notebook. We took all_kernels.csv and removed 233 rows describing kernels that were unchanged forks of other kernels, leaving 1589 kernels.

    • all_questions.json

      Contains all Kaggle Survey questions from the years 2017-2022. In the year 2017, the survey questions were unnumbered, so we numbered them ourselves, keeping the original order and using zero-based indexing. Surveys 2018-2022 have numbered questions, so the index was taken unchanged.

    • question_map.csv

      Looking at survey questions over several years, one can see that certain questions repeat. For example, every year's survey contains the question "What is your age?". All such repetitions are captured in this dataset: for each unique question, it gives the question number and the survey year in which the question appears. The question numbers are those described in the preceding paragraph. Some questions are worded differently but are functionally identical; where such questions were merged, a note was added to alert other users of this dataset.
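Because each file in notebooks.zip is named after its notebook_id, a notebook can be linked to its all_kernels.csv row by stripping the extension. A minimal sketch, assuming file names of the form `<notebook_id>.ipynb`, `<notebook_id>.Rmd` or `<notebook_id>.unknown_format.txt`, and treating IDs as strings; the helper names, file paths and sample rows are hypothetical, and only the `notebook_id` column name comes from the description above.

```python
from pathlib import PurePath

# Extensions described for notebooks.zip; the longest (two-part) suffix is checked first.
KNOWN_SUFFIXES = (".unknown_format.txt", ".ipynb", ".Rmd")


def notebook_id_from_filename(filename: str) -> str:
    """Strip a known extension to recover the notebook_id (kept as a string)."""
    name = PurePath(filename).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"unexpected notebook file name: {filename!r}")


def match_to_kernels(filenames, kernel_rows):
    """Map each notebook file to its all_kernels.csv row (None if no row matches)."""
    by_id = {row["notebook_id"]: row for row in kernel_rows}
    return {name: by_id.get(notebook_id_from_filename(name)) for name in filenames}


# Hypothetical file names and rows, only to show the lookup.
files = ["notebooks/101.ipynb", "notebooks/202.Rmd", "notebooks/303.unknown_format.txt"]
rows = [{"notebook_id": "101"}, {"notebook_id": "202"}]
print([notebook_id_from_filename(f) for f in files])  # ['101', '202', '303']
print(match_to_kernels(files, rows)["notebooks/303.unknown_format.txt"])  # None
```
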

  17. kernel

    • kaggle.com
    zip
    Updated Jun 9, 2021
    Cite
    Abraham Anderson (2021). kernel [Dataset]. https://www.kaggle.com/abrahamanderson/kernel
    Available download formats: zip (119652 bytes)
    Dataset updated
    Jun 9, 2021
    Authors
    Abraham Anderson
    Description

    Dataset

    This dataset was created by Abraham Anderson

    Contents

  18. Mlcourse.ai-2020

    • kaggle.com
    zip
    Updated Oct 14, 2020
    Cite
    anas qais (2020). Mlcourse.ai-2020 [Dataset]. https://www.kaggle.com/anasqais/mlcourseai2020
    Available download formats: zip (15881 bytes)
    Dataset updated
    Oct 14, 2020
    Authors
    anas qais
    Description

    Open Machine Learning Course mlcourse.ai is designed to perfectly balance theory and practice; therefore, each topic is followed by an assignment with a one-week deadline. You can also take part in several Kaggle Inclass competitions held during the course and write your own tutorials. The next session launches in September 2019. For more info, go to the mlcourse.ai main page.

    Outline

    This is the list of published articles on medium.com (English), habr.com (Russian), and jqr.com (Chinese). See Kernels of this Dataset for the same material in English.

    1. Exploratory Data Analysis with Pandas (uk, ru, cn, Kaggle Kernel)
    2. Visual Data Analysis with Python (uk, ru, cn, Kaggle Kernels: part1, part2)
    3. Classification, Decision Trees and k Nearest Neighbors (uk, ru, cn, Kaggle Kernel)
    4. Linear Classification and Regression (uk, ru, cn, Kaggle Kernels: part1, part2, part3, part4, part5)
    5. Bagging and Random Forest (uk, ru, cn, Kaggle Kernels: part1, part2, part3)
    6. Feature Engineering and Feature Selection (uk, ru, cn, Kaggle Kernel)
    7. Unsupervised Learning: Principal Component Analysis and Clustering (uk, ru, cn, Kaggle Kernel)
    8. Vowpal Wabbit: Learning with Gigabytes of Data (uk, ru, cn, Kaggle Kernel)
    9. Time Series Analysis with Python, part 1 (uk, ru, cn); Predicting the future with Facebook Prophet, part 2 (uk, cn); Kaggle Kernels: part1, part2
    10. Gradient Boosting (uk, ru, cn, Kaggle Kernel)

    Assignments

    Each topic is followed by an assignment. See demo versions in this Dataset. Solutions will be discussed in the upcoming run of the course.

    Kaggle competitions

    1. Catch Me If You Can: Intruder Detection through Webpage Session Tracking (Kaggle Inclass)
    2. How good is your Medium article? (Kaggle Inclass)

    Rating

    Throughout the course we maintain a student rating. It takes into account credits scored in assignments and Kaggle competitions. Top students (according to the final rating) will be listed on a special Wiki page.

    Community

    Discussions between students are held in the #mlcourse_ai channel of the OpenDataScience Slack team. A registration form will be shared prior to the start of the new session.

    Collaboration

    You can publish Kernels using this Dataset, but please respect others' interests: don't share solutions to assignments or well-performing solutions for Kaggle Inclass competitions. If you notice any typos or errors in the course material, please open an Issue or make a pull request in the course repo. The course is free, but you can support the organizers by making a pledge on Patreon (monthly support) or a one-time payment on Ko-fi.

  19. CSV Files Used in My Kernel

    • kaggle.com
    zip
    Updated Jun 2, 2018
    Cite
    Darien Schettler (2018). CSV Files Used in My Kernel [Dataset]. https://www.kaggle.com/dschettler8845/csv-files-used-in-my-kernel
    Available download formats: zip (2439 bytes)
    Dataset updated
    Jun 2, 2018
    Authors
    Darien Schettler
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Darien Schettler

    Released under CC0: Public Domain

    Contents

  20. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonymous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Available download formats: csv, txt
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Anonymous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    Column              Description
    code_blocks_index   Global index linking code blocks to markup_data.csv.
    kernel_id           Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    code_block_id       Position of the code block within the notebook.
    code_block          The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    Column              Description
    kernel_id           Identifier for the Kaggle Jupyter notebook.
    kaggle_score        Performance metric of the notebook.
    kaggle_comments     Number of comments on the notebook.
    kaggle_upvotes      Number of upvotes the notebook received.
    kernel_link         URL to the notebook.
    comp_name           Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    Column                            Description
    comp_name                         Name of the Kaggle competition.
    description                       Overview of the competition task.
    data_type                         Type of data used in the competition.
    comp_type                         Classification of the competition.
    subtitle                          Short description of the task.
    EvaluationAlgorithmAbbreviation   Metric used for assessing competition submissions.
    data_sources                      Links to datasets used.
    metric type                       Class label for the assessment metric.

    Table 4. markup_data.csv structure

    Column              Description
    code_block          Machine learning code block.
    too_long            Flag indicating whether the block spans multiple semantic types.
    marks               Confidence level of the annotation.
    graph_vertex_id     ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.
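The two documented joins can be sketched with plain dictionaries. This is a minimal illustration, assuming string-valued IDs and an inner join that drops code blocks whose kernel is absent from kernels_meta.csv (consistent with kernels_meta.csv including only scored notebooks). The sample rows, the `join_blocks` helper and the competition name are made up; with pandas, the same steps would be two `merge` calls.

```python
# Hypothetical in-memory rows; real use would read each CSV, e.g. with csv.DictReader.
code_blocks = [
    {"code_blocks_index": "0", "kernel_id": "k1", "code_block_id": "0", "code_block": "import pandas as pd"},
    {"code_blocks_index": "1", "kernel_id": "k2", "code_block_id": "0", "code_block": "df.head()"},
]
kernels_meta = [{"kernel_id": "k1", "kaggle_score": "0.91", "comp_name": "example-comp"}]
competitions_meta = [{"comp_name": "example-comp", "data_type": "tabular"}]


def join_blocks(code_blocks, kernels_meta, competitions_meta):
    """Attach kernel and competition metadata to each code block (inner join on kernel_id)."""
    kernels = {k["kernel_id"]: k for k in kernels_meta}
    comps = {c["comp_name"]: c for c in competitions_meta}
    joined = []
    for block in code_blocks:
        kernel = kernels.get(block["kernel_id"])
        if kernel is None:
            continue  # block's notebook has no row in kernels_meta (e.g. no Kaggle score)
        comp = comps.get(kernel["comp_name"], {})  # empty if competition metadata is missing
        joined.append({**block, **kernel, **comp})
    return joined


for row in join_blocks(code_blocks, kernels_meta, competitions_meta):
    print(row["code_block"], row["kaggle_score"], row["data_type"])
```
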

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. Natural-language descriptions of the competitions were retrieved with the aid of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks