100+ datasets found
  1. Kaggle's Most Used Packages & Method Calls

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    TheItCrow (2025). Kaggle's Most Used Packages & Method Calls [Dataset]. https://www.kaggle.com/datasets/kevinbnisch/kaggles-most-used-packages-and-method-calls
    Explore at:
    zip(2405388375 bytes)Available download formats
    Dataset updated
    Jun 13, 2025
    Authors
    TheItCrow
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Enriching the Meta-Kaggle dataset using the Meta Kaggle Code to extract all Imports (for both R and Python) and Method Calls (only Python) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.

    [Figures: Most Imported R Packages; Most Imported Python Packages]


    We perform this extraction using the following three regex patterns:

    import re

    PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
    PYTHON_METHOD_REGEX = ...  # omitted by the author: "kaggle kinda breaks if I do lol"
    R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')
    

    This dataset was created on 06-06-2025. Since the computation required for this process is very resource-intensive and cannot be run on a Kaggle kernel, it is not scheduled. A notebook demonstrating how to create this dataset and what insights it provides can be found here.
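As a quick illustration, the two import patterns quoted above can be applied to raw source text like this (a minimal sketch; the method-call pattern is omitted in the source description, so only imports are extracted here):

```python
import re

# The two import patterns quoted in the description above.
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')

def python_imports(source: str) -> list[str]:
    """Collect imported module names from Python source text."""
    # findall returns (from_group, import_group) tuples; one of the two is empty.
    return [a or b for a, b in PYTHON_IMPORT_REGEX.findall(source)]

def r_imports(source: str) -> list[str]:
    """Collect loaded package names from R source text."""
    return R_IMPORT_REGEX.findall(source)

py_src = "import numpy as np\nfrom sklearn.model_selection import train_test_split"
r_src = 'library(ggplot2)\nrequire("dplyr")'
print(python_imports(py_src))  # ['numpy', 'sklearn.model_selection']
print(r_imports(r_src))        # ['ggplot2', 'dplyr']
```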

  2. Code4ML 2.0

    • zenodo.org
    csv, txt
    Updated May 19, 2025
    Cite
    Anonimous authors; Anonimous authors (2025). Code4ML 2.0 [Dataset]. http://doi.org/10.5281/zenodo.15465737
    Explore at:
    csv, txtAvailable download formats
    Dataset updated
    May 19, 2025
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Anonimous authors; Anonimous authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an enriched version of the Code4ML dataset, a large-scale corpus of annotated Python code snippets, competition summaries, and data descriptions sourced from Kaggle. The initial release includes approximately 2.5 million snippets of machine learning code extracted from around 100,000 Jupyter notebooks. A portion of these snippets has been manually annotated by human assessors through a custom-built, user-friendly interface designed for this task.

    The original dataset is organized into multiple CSV files, each containing structured data on different entities:

    • code_blocks.csv: Contains raw code snippets extracted from Kaggle.
    • kernels_meta.csv: Metadata for the notebooks (kernels) from which the code snippets were derived.
    • competitions_meta.csv: Metadata describing Kaggle competitions, including information about tasks and data.
    • markup_data.csv: Annotated code blocks with semantic types, allowing deeper analysis of code structure.
    • vertices.csv: A mapping from numeric IDs to semantic types and subclasses, used to interpret annotated code blocks.

    Table 1. code_blocks.csv structure

    • code_blocks_index: Global index linking code blocks to markup_data.csv.
    • kernel_id: Identifier for the Kaggle Jupyter notebook from which the code block was extracted.
    • code_block_id: Position of the code block within the notebook.
    • code_block: The actual machine learning code snippet.

    Table 2. kernels_meta.csv structure

    • kernel_id: Identifier for the Kaggle Jupyter notebook.
    • kaggle_score: Performance metric of the notebook.
    • kaggle_comments: Number of comments on the notebook.
    • kaggle_upvotes: Number of upvotes the notebook received.
    • kernel_link: URL to the notebook.
    • comp_name: Name of the associated Kaggle competition.

    Table 3. competitions_meta.csv structure

    • comp_name: Name of the Kaggle competition.
    • description: Overview of the competition task.
    • data_type: Type of data used in the competition.
    • comp_type: Classification of the competition.
    • subtitle: Short description of the task.
    • EvaluationAlgorithmAbbreviation: Metric used for assessing competition submissions.
    • data_sources: Links to datasets used.
    • metric type: Class label for the assessment metric.

    Table 4. markup_data.csv structure

    • code_block: Machine learning code block.
    • too_long: Flag indicating whether the block spans multiple semantic types.
    • marks: Confidence level of the annotation.
    • graph_vertex_id: ID of the semantic type.

    The dataset allows mapping between these tables. For example:

    • code_blocks.csv can be linked to kernels_meta.csv via the kernel_id column.
    • kernels_meta.csv is connected to competitions_meta.csv through comp_name. To maintain quality, kernels_meta.csv includes only notebooks with available Kaggle scores.

    In addition, data_with_preds.csv contains automatically classified code blocks, with a mapping back to code_blocks.csv via the code_blocks_index column.
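The table linkage described above can be sketched with pandas; the tiny in-memory frames below stand in for the real CSV files:

```python
import pandas as pd

# Toy stand-ins for code_blocks.csv and kernels_meta.csv (the real files
# are much larger; values here are invented for illustration).
code_blocks = pd.DataFrame({
    "code_blocks_index": [0, 1, 2],
    "kernel_id": [101, 101, 205],
    "code_block": ["import pandas as pd", "df = pd.read_csv('train.csv')", "model.fit(X, y)"],
})
kernels_meta = pd.DataFrame({
    "kernel_id": [101, 205],
    "comp_name": ["titanic", "house-prices"],
    "kaggle_score": [0.78, 0.12],
})

# code_blocks.csv links to kernels_meta.csv via the kernel_id column.
joined = code_blocks.merge(kernels_meta, on="kernel_id", how="left")
print(joined[["code_blocks_index", "comp_name", "kaggle_score"]])
```

The same pattern extends to `kernels_meta.csv` and `competitions_meta.csv` joined on `comp_name`.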

    Code4ML 2.0 Enhancements

    The updated Code4ML 2.0 corpus introduces kernels extracted from Meta Kaggle Code. These kernels correspond to Kaggle competitions launched since 2020. The natural-language descriptions of the competitions are retrieved with the aid of an LLM.

    Notebooks in kernels_meta2.csv may not have a Kaggle score but include a leaderboard ranking (rank), providing additional context for evaluation.

    competitions_meta_2.csv is enriched with data_cards describing the data used in the competitions.

    Applications

    The Code4ML 2.0 corpus is a versatile resource, enabling training and evaluation of models in areas such as:

    • Code generation
    • Code understanding
    • Natural language processing of code-related tasks
  3. Kaggle Survey 2022 kernel stats

    • kaggle.com
    zip
    Updated Nov 12, 2022
    Cite
    Alexander Ryzhkov (2022). Kaggle Survey 2022 kernel stats [Dataset]. https://www.kaggle.com/datasets/alexryzhkov/kaggle-survey-2022-kernel-stats
    Explore at:
    zip(4854 bytes)Available download formats
    Dataset updated
    Nov 12, 2022
    Authors
    Alexander Ryzhkov
    Description

    Dataset

    This dataset was created by Alexander Ryzhkov

    Contents

  4. Manoeuvring Kaggle Kernel and Data Environment

    • kaggle.com
    zip
    Updated Aug 30, 2018
    Cite
    Regi (2018). Manoeuvring Kaggle Kernel and Data Environment [Dataset]. https://www.kaggle.com/regivm/kernel
    Explore at:
    zip(7410 bytes)Available download formats
    Dataset updated
    Aug 30, 2018
    Authors
    Regi
    Description

    Dataset

    This dataset was created by Regi

    Contents

  5. Kaggles' top Kernels and Datasets

    • kaggle.com
    zip
    Updated Apr 17, 2017
    Cite
    Liam Larsen (2017). Kaggles' top Kernels and Datasets [Dataset]. https://www.kaggle.com/datasets/kingburrito666/kaggles-top-kernels-and-datasets
    Explore at:
    zip(10729 bytes)Available download formats
    Dataset updated
    Apr 17, 2017
    Authors
    Liam Larsen
    Description

    Context

    I did this because I wanted to know whether there was a correlation between Kaggle's top Kernels and Datasets and their popularity (wanted to know how to get to the top, lol). I scraped the data using DataMiner.

    Content

    top-kernels has:

    • Name: kernel name
    • Upvotes: number of upvotes
    • Language: language used
    • Comments: number of comments
    • IsNotebook: whether it is a notebook or a script
    • visuals: how many visuals

    top-datasets has:

    • Name: dataset name
    • upvotes: number of upvotes
    • downloads: number of downloads
    • comments: number of comments
    • author: author name
    • updated: last updated
    • description: dataset description

    Go kaggle team

  6. ImagesForKernel

    • kaggle.com
    zip
    Updated Jul 9, 2020
    Cite
    deeplearner (2020). ImagesForKernel [Dataset]. https://www.kaggle.com/adarshpathak/imagesforkernel
    Explore at:
    zip(1424175 bytes)Available download formats
    Dataset updated
    Jul 9, 2020
    Authors
    deeplearner
    Description

    Dataset

    This dataset was created by deeplearner

    Contents

  7. OS Kernel Anomaly Dataset

    • kaggle.com
    zip
    Updated May 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ziya (2025). OS Kernel Anomaly Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/os-kernel-anomaly-dataset
    Explore at:
    zip(15689 bytes)Available download formats
    Dataset updated
    May 5, 2025
    Authors
    Ziya
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed to support research in anomaly detection for OS kernels, particularly in the context of power monitoring systems used in embedded environments. It simulates the interaction between system-level operations and power consumption behaviors, providing a rich set of features for training and evaluating hybrid models.

    The dataset contains 1,000 records of simulated yet realistic system behavior, including:

    • System call sequences
    • Power usage logs (in watts)
    • CPU and memory utilization
    • Process identifiers and names
    • Timestamps
    • Labeled entries (Normal or Anomaly)

    Anomalies are injected using fuzzy testing principles to simulate abnormal power spikes, syscall irregularities, or excessive resource usage, mimicking real-world kernel faults or malicious activity. This dataset enables the development of robust models that can learn complex, uncertain system behavior patterns for enhanced security and stability of embedded power monitoring applications.
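As a naive baseline (not the dataset's own methodology), a power spike of the kind described above can be flagged with a simple robust threshold; field names here are hypothetical stand-ins for the dataset's columns:

```python
import statistics

# Hypothetical records mimicking the dataset's fields (power in watts,
# CPU in percent); the real column names may differ.
records = [
    {"pid": 1201, "power_w": 3.1, "cpu": 12.0, "label": "Normal"},
    {"pid": 1202, "power_w": 3.4, "cpu": 15.5, "label": "Normal"},
    {"pid": 1203, "power_w": 3.2, "cpu": 11.8, "label": "Normal"},
    {"pid": 1204, "power_w": 9.8, "cpu": 96.0, "label": "Anomaly"},  # injected spike
]

# Flag readings at more than twice the median power as candidate anomalies;
# the median is robust to the injected spikes themselves.
median_power = statistics.median(r["power_w"] for r in records)
flagged = [r["pid"] for r in records if r["power_w"] > 2 * median_power]
print(flagged)  # [1204]
```

A trained hybrid model would replace this threshold, but the labeled entries make exactly this kind of evaluation possible.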

  8. Kaggle Survey Challenge - All Kernels

    • kaggle.com
    zip
    Updated Nov 22, 2022
    Cite
    KlemenVodopivec (2022). Kaggle Survey Challenge - All Kernels [Dataset]. https://www.kaggle.com/datasets/klemenvodopivec/kaggle-survey-challenge-all-kernels/data
    Explore at:
    zip(206438 bytes)Available download formats
    Dataset updated
    Nov 22, 2022
    Authors
    KlemenVodopivec
    Description

    A collection of kernel submissions for the Kaggle survey competitions from 2017 to 2022. As this data was collected during the 2022 survey competition, it does not contain all the kernels for 2022.

  9. Tensorflow's Global and Operation level seeds

    • kaggle.com
    zip
    Updated May 20, 2023
    Cite
    Deepak Ahire (2023). Tensorflow's Global and Operation level seeds [Dataset]. https://www.kaggle.com/datasets/adeepak7/tensorflow-global-and-operation-level-seeds
    Explore at:
    zip(2984 bytes)Available download formats
    Dataset updated
    May 20, 2023
    Authors
    Deepak Ahire
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This dataset contains the Python files with the snippets required for the Kaggle kernel: https://www.kaggle.com/code/adeepak7/tensorflow-s-global-and-operation-level-seeds/

    Because the kernel is about setting and re-setting global and operation-level seeds, the effect of a seed could not be nullified in subsequent cells. The snippets are therefore provided as separate Python files, each executed independently in its own cell.
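The global-vs-operation-level distinction is loosely analogous to module-level vs. instance-level seeding in Python's random module; this is an analogy only, not TensorFlow code:

```python
import random

# Analogy: a "global seed" resembles seeding the shared module-level
# generator, while an "operation-level seed" resembles giving one
# operation its own independently seeded Random instance.
random.seed(42)               # "global seed": affects every module-level call
global_draw = random.random()

op_rng = random.Random(7)     # "operation-level seed": isolated generator
op_draw_1 = op_rng.random()

random.random()                        # advancing the global stream...
op_draw_2 = random.Random(7).random()  # ...does not disturb the op-level stream

print(op_draw_1 == op_draw_2)  # True: op-level results reproduce in isolation
```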

  10. Clean Meta Kaggle

    • kaggle.com
    Updated Sep 8, 2023
    Cite
    Yoni Kremer (2023). Clean Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/yonikremer/clean-meta-kaggle
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 8, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Yoni Kremer
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Cleaned Meta-Kaggle Dataset

    The Original Dataset - Meta-Kaggle

    Explore our public data on competitions, datasets, kernels (code / notebooks) and more. Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle's community and activity.

    Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.


    This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

    Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

    August 2023 update

    In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here

    We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.

    The Problems with the Original Dataset

    • The original dataset is 32 CSV files, with 268 columns and 7 GB of compressed data. Having so many tables and columns makes the data hard to understand.
    • The data is not normalized, so when you join tables you get a lot of errors.
    • Some values refer to non-existing values in other tables. For example, the UserId column in the ForumMessages table has values that do not exist in the Users table.
    • There are missing values.
    • There are duplicate values.
    • There are values that are not valid. For example, Ids that are not positive integers.
    • The date and time columns are not in the right format.
    • Some columns only have the same value for all rows, so they are not useful.
    • The boolean columns have string values True or False.
    • Incorrect values for the Total columns. For example, the DatasetCount is not the total number of datasets with the Tag according to the DatasetTags table.
    • Users upvote their own messages.

    The Solution

    • To handle so many tables and columns I use a relational database. I use MySQL, but you can use any relational database.
    • The steps to create the database are:
      1. Create the database tables with the right data types and constraints, by running the db_abd_create_tables.sql script.
      2. Download the CSV files from Kaggle using the Kaggle API.
      3. Clean the data using pandas, by running the clean_data.py script. For each table, the script:
        • Drops the columns that are not needed.
        • Converts each column to the right data type.
        • Replaces foreign keys that do not exist with NULL.
        • Replaces some of the missing values with default values.
        • Removes rows with missing values in the primary key/not-null columns.
        • Removes duplicate rows.
      4. Load the data into the database using the LOAD DATA INFILE command.
      5. Check that the number of rows in the database tables matches the number of rows in the CSV files.
      6. Add foreign key constraints to the database tables, by running the add_foreign_keys.sql script.
      7. Update the Total columns in the database tables, by running the update_totals.sql script.
      8. Back up the database.
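The per-table cleaning steps can be sketched as follows (a stdlib-only toy version; the real pipeline lives in clean_data.py and uses pandas, and the rows and column names here are hypothetical):

```python
# Toy rows standing in for one Meta Kaggle table; "Junk" is a column to drop.
rows = [
    {"Id": "1", "UserId": "10", "Junk": "x", "Message": "hi"},
    {"Id": "2", "UserId": "99", "Junk": "y", "Message": "yo"},   # UserId 99 not in Users
    {"Id": "2", "UserId": "99", "Junk": "y", "Message": "yo"},   # duplicate row
    {"Id": None, "UserId": "10", "Junk": "z", "Message": "bad"}, # missing primary key
]
known_user_ids = {"10", "20"}  # stand-in for the Users table's Ids

cleaned, seen = [], set()
for row in rows:
    row = {k: v for k, v in row.items() if k != "Junk"}  # drop unneeded columns
    if row["Id"] is None:                                # remove rows missing the PK
        continue
    row["Id"] = int(row["Id"])                           # convert to the right type
    if row["UserId"] not in known_user_ids:              # dangling foreign key -> NULL
        row["UserId"] = None
    key = tuple(row.items())                             # remove duplicate rows
    if key in seen:
        continue
    seen.add(key)
    cleaned.append(row)

print(cleaned)
```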
  11. book kaggle

    • kaggle.com
    zip
    Updated Feb 27, 2020
    Cite
    Ravi Bharathi (2020). book kaggle [Dataset]. https://www.kaggle.com/ravibharathii/book-kaggle
    Explore at:
    zip(5615 bytes)Available download formats
    Dataset updated
    Feb 27, 2020
    Authors
    Ravi Bharathi
    Description

    Dataset

    This dataset was created by Ravi Bharathi

    Released under Data files © Original Authors

    Contents

  12. kaggle survey historical meta

    • kaggle.com
    zip
    Updated Nov 22, 2022
    Cite
    jak kajdsfs (2022). kaggle survey historical meta [Dataset]. https://www.kaggle.com/datasets/jakkajdsfs/kaggle-survey-historical-meta
    Explore at:
    zip(1527614294 bytes)Available download formats
    Dataset updated
    Nov 22, 2022
    Authors
    jak kajdsfs
    Description

    [Image: Kernel class diagram (kernel_dataclass.png)]

    All in all, 1822 kernels used the Kaggle Survey datasets in the years 2017-2022. We ordered our data into several distinct datasets, each of which was useful in answering our questions on at least one of the topics. The datasets are briefly overviewed below.

    • notebooks.zip

      Contains 1822 raw notebooks saved as either ipynb or Rmd. 58 notebooks could not be executed in either Python or R, so they were given the extension unknown_format.txt. The name of each file is the notebook_id as listed on kaggle.com and matches notebook_id in the file all_kernels.csv, described below. Among other things, this dataset was used to obtain a per-notebook list of imported libraries, as well as the questions addressed by each notebook.

    • all_kernels.csv

      Each row of this dataset contains data about one of the 1822 kernels. The columns correspond to all the fields listed in the Kernel class image above. A more detailed overview of the columns can be found on the dataset's Kaggle page.

    • cleaned_kernels.csv

      This is in effect the main dataset we used in our competition notebook. We took all_kernels.csv and removed the 233 rows describing kernels that were just unchanged forks of other kernels.

    • all_questions.json

      Contains all Kaggle Survey questions from the years 2017-2022. In the year 2017, the survey questions were unnumbered, so we numbered them ourselves, keeping the original order and using zero-based indexing. Surveys 2018-2022 have numbered questions, so the index was taken unchanged.

    • question_map.csv

      Looking at the survey questions over several years, one can note that certain questions repeat. For example, every year's survey contains a question "What is your age". All such repetitions are captured in this dataset. For each unique question, the question number and the survey year in which it appears are given. The question numbering is described in the preceding paragraph (all_questions.json). Certain questions are worded differently but functionally identical; where such questions were joined, a note was added to alert other users of this dataset.

  13. Kaggle Dataset Metadata Repository

    • kaggle.com
    zip
    Updated Nov 16, 2024
    Cite
    Ijaj Ahmed (2024). Kaggle Dataset Metadata Repository [Dataset]. https://www.kaggle.com/datasets/ijajdatanerd/kaggle-dataset-metadata-repository
    Explore at:
    zip(5122110 bytes)Available download formats
    Dataset updated
    Nov 16, 2024
    Authors
    Ijaj Ahmed
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description


    Kaggle Dataset Metadata Collection πŸ“Š

    This dataset provides comprehensive metadata on various Kaggle datasets, offering detailed information about the dataset owners, creators, usage statistics, licensing, and more. It can help researchers, data scientists, and Kaggle enthusiasts quickly analyze the key attributes of different datasets on Kaggle. πŸ“š

    Dataset Overview:

    • Purpose: To provide detailed insights into Kaggle dataset metadata.
    • Content: Information related to the dataset's owner, creator, usage metrics, licensing, and more.
    • Target Audience: Data scientists, Kaggle competitors, and dataset curators.

    Columns Description πŸ“‹

    • datasetUrl 🌐: The URL of the Kaggle dataset page. This directs you to the specific dataset's page on Kaggle.

    • ownerAvatarUrl πŸ–ΌοΈ: The URL of the dataset owner's profile avatar on Kaggle.

    • ownerName πŸ‘€: The name of the dataset owner. This can be the individual or organization that created and maintains the dataset.

    • ownerUrl 🌍: A link to the Kaggle profile page of the dataset owner.

    • ownerUserId πŸ’Ό: The unique user ID of the dataset owner on Kaggle.

    • ownerTier πŸŽ–οΈ: The ownership tier, such as "Tier 1" or "Tier 2," indicating the owner's status or level on Kaggle.

    • creatorName πŸ‘©β€πŸ’»: The name of the dataset creator, which could be different from the owner.

    • creatorUrl 🌍: A link to the Kaggle profile page of the dataset creator.

    • creatorUserId πŸ’Ό: The unique user ID of the dataset creator.

    • scriptCount πŸ“œ: The number of scripts (kernels) associated with this dataset.

    • scriptsUrl πŸ”—: A link to the scripts (kernels) page for the dataset, where you can explore related code.

    • forumUrl πŸ’¬: The URL to the discussion forum for this dataset, where users can ask questions and share insights.

    • viewCount πŸ‘€: The number of views the dataset page has received on Kaggle.

    • downloadCount ⬇️: The number of times the dataset has been downloaded by users.

    • dateCreated πŸ“…: The date when the dataset was first created and uploaded to Kaggle.

    • dateUpdated πŸ”„: The date when the dataset was last updated or modified.

    • voteButton πŸ‘: The metadata for the dataset's vote button, showing how users interact with the dataset's quality ratings.

    • categories 🏷️: The categories or tags associated with the dataset, helping users filter datasets based on topics of interest (e.g., "Healthcare," "Finance").

    • licenseName πŸ›‘οΈ: The name of the license under which the dataset is shared (e.g., "CC0," "MIT License").

    • licenseShortName πŸ”‘: A short form or abbreviation of the dataset's license name (e.g., "CC0" for Creative Commons Zero).

    • datasetSize πŸ“¦: The size of the dataset in terms of storage, typically measured in MB or GB.

    • commonFileTypes πŸ“‚: A list of common file types included in the dataset (e.g., .csv, .json, .xlsx).

    • downloadUrl ⬇️: A direct link to download the dataset files.

    • newKernelNotebookUrl πŸ“: A link to a new kernel or notebook related to this dataset, for those who wish to explore it programmatically.

    • newKernelScriptUrl πŸ’»: A link to a new script for running computations or processing data related to the dataset.

    • usabilityRating 🌟: A rating or score representing how usable the dataset is, based on user feedback.

    • firestorePath πŸ”: A reference to the path in Firestore where this dataset’s metadata is stored.

    • datasetSlug 🏷️: A URL-friendly version of the dataset name, typically used for URLs.

    • rank πŸ“ˆ: The dataset's rank based on certain metrics (e.g., downloads, votes, views).

    • datasource 🌐: The source or origin of the dataset (e.g., government data, private organizations).

    • medalUrl πŸ…: A URL pointing to the dataset's medal or badge, indicating the dataset's quality or relevance.

    • hasHashLink πŸ”—: Indicates whether the dataset has a hash link for verifying data integrity.

    • ownerOrganizationId 🏒: The unique organization ID of the dataset's owner if the owner is an organization rather than an individual.

    • totalVotes πŸ—³οΈ: The total number of votes the dataset has received from users, reflecting its popularity or quality.

    • category_names πŸ“‘: A comma-separated string of category names that represent the dataset’s classification.

    This dataset is a valuable resource for those who want to analyze Kaggle's ecosystem, discover high-quality datasets, and explore metadata in a structured way. πŸŒπŸ“Š
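The columns above lend themselves to simple filtering and ranking; a minimal sketch with invented sample rows (the real file has many more rows and columns):

```python
# Sample rows using a few of the documented columns; values are invented.
datasets = [
    {"datasetSlug": "titanic", "licenseShortName": "CC0", "totalVotes": 5200, "downloadCount": 90000},
    {"datasetSlug": "imdb-reviews", "licenseShortName": "MIT", "totalVotes": 800, "downloadCount": 12000},
    {"datasetSlug": "noisy-logs", "licenseShortName": "CC0", "totalVotes": 15, "downloadCount": 300},
]

# Permissively licensed datasets with some community traction, most voted first.
popular_cc0 = sorted(
    (d for d in datasets if d["licenseShortName"] == "CC0" and d["totalVotes"] >= 100),
    key=lambda d: d["totalVotes"],
    reverse=True,
)
print([d["datasetSlug"] for d in popular_cc0])  # ['titanic']
```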

  14. No Data Sources

    • kaggle.com
    zip
    Updated Apr 12, 2017
    Cite
    Kaggle (2017). No Data Sources [Dataset]. https://www.kaggle.com/kaggle/no-data-sources
    Explore at:
    zip(159 bytes)Available download formats
    Dataset updated
    Apr 12, 2017
    Dataset authored and provided by
    Kaggle: http://kaggle.com/
    Description

    This isn't a dataset, it is a collection of kernels written on Kaggle that use no data at all.

  15. Stacknet

    • kaggle.com
    zip
    Updated Mar 15, 2019
    Cite
    akKaggle (2019). Stacknet [Dataset]. https://www.kaggle.com/datasets/akkaggle2018/stacknet
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 15, 2019
    Authors
    akKaggle
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    Kaggle kernels don't have the pystacknet package, so we created a dataset containing it for the PetFinder competition.

    Acknowledgements

    Code from: https://github.com/h2oai/pystacknet

    @bkkaggle (https://www.kaggle.com/bkkaggle) helped with creating the dataset

  16. Wheat Variety Classification

    • kaggle.com
    zip
    Updated Nov 23, 2022
    Cite
    Sudhanshu Rastogi (2022). Wheat Variety Classification [Dataset]. https://www.kaggle.com/datasets/sudhanshu2198/wheat-variety-classification
    Explore at:
    zip(3877 bytes)Available download formats
    Dataset updated
    Nov 23, 2022
    Authors
    Sudhanshu Rastogi
    License

    CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data Set Information:

    The dataset comprises wheat kernels belonging to three different varieties of wheat: Kama, Rosa and Canadian, with 70 elements each. It can be used for classification and cluster analysis tasks. All of the parameters are real-valued and continuous.

    Attribute Information:

    To construct the data, seven geometric parameters of wheat kernels were measured:

    1. area A,
    2. perimeter P,
    3. compactness C = 4*pi*A/P^2,
    4. length of kernel,
    5. width of kernel,
    6. asymmetry coefficient
    7. length of kernel groove.
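The compactness formula in item 3 can be checked directly; for a circle, the most compact shape, C is exactly 1:

```python
import math

def compactness(area: float, perimeter: float) -> float:
    """C = 4*pi*A / P^2, as defined for the wheat-kernel measurements above."""
    return 4 * math.pi * area / perimeter ** 2

# Sanity check with a circle of radius 2: A = pi*r^2, P = 2*pi*r, so C = 1.
r = 2.0
print(round(compactness(math.pi * r**2, 2 * math.pi * r), 6))  # 1.0
```

Less circular kernel cross-sections give C < 1, which is why the measure discriminates between varieties.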
  17. kaggle-severstal-kernel

    • kaggle.com
    zip
    Updated Oct 24, 2019
    Cite
    Maksim Filin (2019). kaggle-severstal-kernel [Dataset]. https://www.kaggle.com/xsardas/kaggleseverstalkernel
    Explore at:
    zip(7073 bytes)Available download formats
    Dataset updated
    Oct 24, 2019
    Authors
    Maksim Filin
    Description

    Dataset

    This dataset was created by Maksim Filin

    Contents

  18. PlaygroundS4E04|OriginalData

    • kaggle.com
    zip
    Updated Apr 1, 2024
    Cite
    Ravi Ramakrishnan (2024). PlaygroundS4E04|OriginalData [Dataset]. https://www.kaggle.com/datasets/ravi20076/playgrounds4e04originaldata
    Explore at:
    zip(67811 bytes)Available download formats
    Dataset updated
    Apr 1, 2024
    Authors
    Ravi Ramakrishnan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset was created using the reference below:
    https://archive.ics.uci.edu/dataset/1/abalone
    We import the corresponding repository in a Kaggle kernel and populate the dataset from it. Users may load the dataset with a simple read_csv in pandas and proceed with their solution.
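Loading might look like the sketch below. The column names follow the UCI abalone schema the dataset derives from; the in-memory sample stands in for the actual file, whose path on Kaggle is hypothetical here:

```python
import io
import pandas as pd

# UCI abalone columns; the dataset itself ships without a header row.
columns = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
           "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]

# Two sample rows standing in for the real file. In a Kaggle kernel this
# would be e.g. pd.read_csv("/kaggle/input/<dataset>/abalone.csv", ...).
sample = io.StringIO("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
                     "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9\n")

df = pd.read_csv(sample, header=None, names=columns)
print(df.shape)  # (2, 9)
```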

    Best wishes!

  19. progresbar2-local

    • kaggle.com
    zip
    Updated Jun 7, 2021
    Cite
    Justin Chae (2021). progresbar2-local [Dataset]. https://www.kaggle.com/justinchae/progresbar2local
    Explore at:
    zip(47743 bytes)Available download formats
    Dataset updated
    Jun 7, 2021
    Authors
    Justin Chae
    Description

    Dataset

    This dataset was created by Justin Chae

    Contents

  20. all_kernels_cleaned

    • kaggle.com
    zip
    Updated Nov 16, 2022
    Cite
    KlemenVodopivec (2022). all_kernels_cleaned [Dataset]. https://www.kaggle.com/datasets/klemenvodopivec/all-kernels-cleaned
    Explore at:
    zip(79146 bytes)Available download formats
    Dataset updated
    Nov 16, 2022
    Authors
    KlemenVodopivec
    Description

    Dataset

    This dataset was created by KlemenVodopivec

    Contents
