58 datasets found
  1. Kaggle's Most Used Packages & Method Calls

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    TheItCrow (2025). Kaggle's Most Used Packages & Method Calls [Dataset]. https://www.kaggle.com/datasets/kevinbnisch/kaggles-most-used-packages-and-method-calls
    Explore at:
    zip (2405388375 bytes)
    Available download formats
    Dataset updated
    Jun 13, 2025
    Authors
    TheItCrow
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset enriches the Meta-Kaggle dataset using Meta Kaggle Code to extract all imports (for both R and Python) and method calls (Python only) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.

    [Figures: Most Imported R Packages; Most Imported Python Packages]


    We perform this extraction using the following three regex patterns:

    PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
    PYTHON_METHOD_REGEX = ...  # omitted by the author: embedding this pattern breaks Kaggle's rendering
    R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')
    

    This dataset was created on 06-06-2025. Since the computation required for this process is very resource-intensive and cannot be run on a Kaggle kernel, it is not scheduled. A notebook demonstrating how to create this dataset and what insights it provides can be found here.
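    As a rough illustration, here is a minimal sketch (not the author's pipeline) of how the two published patterns could be applied to kernel source code to produce the Imports lists:

    import re

    PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
    R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')

    def extract_python_imports(source):
        # Each match carries two groups (from-import vs. plain import); keep whichever is non-empty.
        return sorted({a or b for a, b in PYTHON_IMPORT_REGEX.findall(source)})

    def extract_r_imports(source):
        return sorted(set(R_IMPORT_REGEX.findall(source)))

    print(extract_python_imports("import numpy as np\nfrom pandas import DataFrame"))  # ['numpy', 'pandas']
    print(extract_r_imports('library(ggplot2)\nrequire("dplyr")'))  # ['dplyr', 'ggplot2']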

  2. pyspark-package

    • kaggle.com
    zip
    Updated Sep 26, 2024
    Cite
    Iulian Cozma (2024). pyspark-package [Dataset]. https://www.kaggle.com/datasets/icozma/pyspark-package
    Explore at:
    zip (1586185224 bytes)
    Available download formats
    Dataset updated
    Sep 26, 2024
    Authors
    Iulian Cozma
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    To install pyspark when running a notebook without internet access:

    (1) Attach the pyspark-package dataset to your notebook.
    (2) Install pyspark with the following code:

    import shutil

    # The archive ships with an extra .mp4 suffix; copy it back to a .tar.gz
    # name so pip can install it.
    src_path = r"/kaggle/input/pyspark-package/pyspark-latest.tar.gz.mp4"
    dst_path = r"/kaggle/working/pyspark-latest.tar.gz"
    shutil.copy(src_path, dst_path)

    !pip install /kaggle/working/pyspark-latest.tar.gz
    

    Or, for a specific version, first check that it is available in the dataset; e.g., for 3.5.0:

    import shutil
    src_path = r"/kaggle/input/pyspark-package/pyspark-3.5.0.tar.gz.mp4"
    dst_path = r"/kaggle/working/pyspark-3.5.0.tar.gz"
    shutil.copy(src_path, dst_path)
    
    !pip install /kaggle/working/pyspark-3.5.0.tar.gz
    

    (3) Then you can use it as usual:

    import pyspark

  3. R

    Farm Harmful Animals Dataset

    • universe.roboflow.com
    • kaggle.com
    zip
    Updated Nov 8, 2024
    + more versions
    Cite
    SeniorProject (2024). Farm Harmful Animals Dataset [Dataset]. https://universe.roboflow.com/seniorproject-nz8ra/farm-harmful-animals-dataset/dataset/1
    Explore at:
    zip
    Available download formats
    Dataset updated
    Nov 8, 2024
    Dataset authored and provided by
    SeniorProject
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Wild Boars Bounding Boxes
    Description

    https://www.kaggle.com/datasets/muzammilaliveltech/farm-harmful-animals-dataset

    This dataset is not mine; it was uploaded to Kaggle by MUZAMMIL ALI VELTECH under CC0: Public Domain. This Roboflow project was created as an attempt to use the dataset after having issues importing it into a Jupyter Notebook from Kaggle.

  4. Pytorch Models

    • kaggle.com
    zip
    Updated May 10, 2025
    Cite
    Sufian Othman (2025). Pytorch Models [Dataset]. https://www.kaggle.com/datasets/mohdsufianbinothman/pytorch-models/data
    Explore at:
    zip (21493 bytes)
    Available download formats
    Dataset updated
    May 10, 2025
    Authors
    Sufian Othman
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    ✅ Step 1: Mount to Dataset

    Search for my dataset pytorch-models and add it — this will mount it at:

    /kaggle/input/pytorch-models/

    ✅ Step 2: Check file paths

    Once mounted, the four files will be available at:

    /kaggle/input/pytorch-models/base_models.py
    /kaggle/input/pytorch-models/ext_base_models.py
    /kaggle/input/pytorch-models/ext_hybrid_models.py
    /kaggle/input/pytorch-models/hybrid_models.py
    

    ✅ Step 3: Copy files to working directory

    To make them importable, copy the .py files to your notebook's working directory (/kaggle/working/):

    import shutil
    
    shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
    shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
    

    ✅ Step 4: Import your modules

    Now that they are in the working directory, you can import them like normal:

    import base_models
    import ext_base_models
    import ext_hybrid_models
    import hybrid_models
    

    Or, if you only want to import specific classes or functions:

    from base_models import YourModelClass
    from ext_base_models import AnotherModelClass
    

    ✅ Step 5: Use the models

    You can now initialize and use the models/classes/functions defined inside each file:

    model = base_models.YourModelClass()
    output = model(input_data)
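
    Alternatively (a sketch, not part of the author's steps), you can skip the copy step by putting the mounted dataset directory on sys.path:

    import sys

    # Read-only access is fine for imports, so the mounted path works directly.
    sys.path.append('/kaggle/input/pytorch-models')

    import base_models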
    
  5. google/flan-t5-large

    • kaggle.com
    zip
    Updated Jul 14, 2023
    + more versions
    Cite
    d0rj_ (2023). google/flan-t5-large [Dataset]. https://www.kaggle.com/datasets/d0rj3228/googleflan-t5-large
    Explore at:
    zip (23751646406 bytes)
    Available download formats
    Dataset updated
    Jul 14, 2023
    Authors
    d0rj_
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Info

    Source repo is google/flan-t5-large.

    Usage

    1. Add the dataset to your Kaggle notebook;
    2. Load the pretrained model and tokenizer from the folder:
    from transformers import AutoTokenizer, AutoModel
    
    
    model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
    tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
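
    For text generation, a hedged sketch (the seq2seq head is an assumption, not part of the author's snippet; FLAN-T5 is typically loaded through AutoModelForSeq2SeqLM):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    path = '/kaggle/input/googleflan-t5-large/flan-t5-large'
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSeq2SeqLM.from_pretrained(path)

    inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))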
    
    
  6. Pretrained Google's Deplot

    • kaggle.com
    zip
    Updated May 17, 2023
    Cite
    Henry Hoang (2023). Pretrained Google's Deplot [Dataset]. https://www.kaggle.com/datasets/nhatminhhoang/google-deplot-model
    Explore at:
    zip (1049307570 bytes)
    Available download formats
    Dataset updated
    May 17, 2023
    Authors
    Henry Hoang
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Pre-trained weights of Deplot (source: https://huggingface.co/google/deplot) to import directly to the notebook when the internet is off.

    First, run this to install the latest version of the transformers library (reason):

    !pip install git+https://github.com/huggingface/transformers

    Usage:

    from transformers import Pix2StructForConditionalGeneration, AutoProcessor

    model = Pix2StructForConditionalGeneration.from_pretrained('/kaggle/input/google-deplot-model')
    processor = AutoProcessor.from_pretrained('/kaggle/input/google-deplot-model')
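
    A minimal inference sketch (the chart-image path is hypothetical; the prompt follows the DePlot model card):

    from PIL import Image

    image = Image.open('/kaggle/input/my-charts/chart.png')  # hypothetical chart image
    inputs = processor(images=image,
                       text="Generate underlying data table of the figure below:",
                       return_tensors="pt")
    predictions = model.generate(**inputs, max_new_tokens=512)
    print(processor.decode(predictions[0], skip_special_tokens=True))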

  7. Book Metadata from BooksToScrape

    • kaggle.com
    zip
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Book Metadata from BooksToScrape [Dataset]. https://www.kaggle.com/datasets/thedevastator/book-metadata-from-bookstoscrape/discussion?sort=undefined
    Explore at:
    zip (557690 bytes)
    Available download formats
    Dataset updated
    Feb 11, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Book Metadata from BooksToScrape

    Unlocking Hidden Insights in Reading Material

    By [source]

    About this dataset

    This dataset lets readers unlock insights into contemporary literature and the books people choose to purchase. It provides 12 columns of book metadata for the online book retailer books.toscrape.com, gathered through web scraping in November 2020. With it you can explore book popularity from both a commercial and a creative standpoint: uncovering trends in what people are buying, reading, and rating worldwide; examining publishing preferences across reviews and genres; and digging into topics ranging from rating systems to pricing analysis for publishers.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    To get started analyzing this dataset with Kaggle notebooks or other tools:

    1. Open a tool (a Kaggle notebook or anything else) that supports reading CSV files.
    2. Import the dataset.csv file into your chosen program.
    3. Explore each column individually to understand what type of book metadata exists in each category: title, image URLs/links, ratings/number of reviews, description, and more.
    4. Once familiar with each column, explore correlations between them to deepen your understanding of trends among different types of books over time, broken down by category.
    5. Lastly, use third-party packages in your chosen programming language (e.g., pandas) to continue exploring deeper analysis possibilities.

    By following these steps - you are now ready to start exploring powerful literature insights into contemporary reading material standards! Enjoy discovering hidden insights within this book metadata - that may have otherwise gone undiscovered!
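
    For instance, a minimal pandas sketch of steps 1-3 (assuming dataset.csv sits in your working directory):

    import pandas as pd

    df = pd.read_csv('dataset.csv')
    print(df.dtypes)   # inspect each column's type
    print(df.head())   # preview the metadata categories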

    Research Ideas

    • Generating book recommendations based on popularity, price point, and/or rating.
    • Tracking the long-term success of certain authors/publishers and understanding their audience preferences.
    • Analysing which types of books consumers prefer (genre, age-group targeting) over time, to give new authors useful data to increase their chances of success.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: dataset.csv

    | Column name | Description |
    |:---|:---|
    | Logan Kade (Fallen Crest High #5.5) | Title of the book. (String) |
    | https | Image URL of the book. (String) |
    | Two | Rating of the book. (Integer) |
    | Academic | Category of the book. (String) |
    | 7093cf549cd2e7de | Universal Product Code (UPC) of the book. (String) |
    | Books | Product Type of the book. (String) |
    | £13.12 | Price Excluding Tax of the book. (Float) |
    | £13.12.1 | Price Including Tax of the book. (Float) |
    | £0.00 | Tax Amount of the book. (Float) |
    | In stock (5 available) | Availability of the book. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

  8. openai/clip-vit-base-patch32

    • kaggle.com
    zip
    Updated Jun 29, 2024
    Cite
    Tarun Singhal (2024). openai/clip-vit-base-patch32 [Dataset]. https://www.kaggle.com/datasets/singhaltarun/openaiclip-vit-base-patch32/discussion
    Explore at:
    zip (400492841 bytes)
    Available download formats
    Dataset updated
    Jun 29, 2024
    Authors
    Tarun Singhal
    Description

    Using the OpenAI CLIP Model in Kaggle

    [Figures: Overview A; Overview B]

    Introduction

    The CLIP (Contrastive Language–Image Pre-training) model is an innovative approach developed by OpenAI, designed to enhance the robustness of computer vision tasks. It leverages a unique training regimen that aligns images with textual descriptions using a contrastive loss, enabling it to perform image classification tasks in a zero-shot manner. This means CLIP can generalize to classify images it has never seen before based solely on textual descriptions, without the need for further training specific to those tasks.

    Model Details

    • Model Release Date: January 2021
    • Architecture: The model uses a Vision Transformer (ViT-B/32) as the image encoder and a masked self-attention Transformer as the text encoder. These components are trained to maximize the similarity between corresponding image and text pairs.
    • Implementation Variants: Originally, CLIP was released in two variants — one using a ResNet image encoder and another using a Vision Transformer. The version provided here employs the Vision Transformer architecture.

    Usage Instructions

    To use the CLIP model in your Kaggle notebooks, follow these simple steps:

    1. Add the Model as a Kaggle Dataset: Ensure that the dataset containing the CLIP model files is attached to your Kaggle notebook. This dataset includes the necessary model and processor files.

    2. Initialize the Model and Processor: You can load the model and processor directly from the path where the dataset files are stored using the following code snippet:

      # Import CLIP model from transformers
      from transformers import CLIPModel, CLIPProcessor
      
      # Set the path to the model files
      model_path = '/kaggle/input/openaiclip-vit-base-patch32'
      
      # Load the CLIP model
      clip_model = CLIPModel.from_pretrained(model_path)
      
      # Load the CLIP processor
      clip_processor = CLIPProcessor.from_pretrained(model_path)
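
    As a follow-up, a hedged zero-shot classification sketch (the image path and candidate labels are hypothetical):

      from PIL import Image

      image = Image.open('/kaggle/input/example-images/cat.jpg')  # hypothetical image
      labels = ['a photo of a cat', 'a photo of a dog']

      inputs = clip_processor(text=labels, images=image, return_tensors='pt', padding=True)
      outputs = clip_model(**inputs)
      probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity as probabilities
      print(dict(zip(labels, probs[0].tolist())))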
      


  9. HelloDataset

    • kaggle.com
    zip
    Updated Dec 17, 2020
    Cite
    marketneutral (2020). HelloDataset [Dataset]. https://www.kaggle.com/marketneutral/hellodataset
    Explore at:
    zip (207 bytes)
    Available download formats
    Dataset updated
    Dec 17, 2020
    Authors
    marketneutral
    Description

    This is an example of how to keep a .py file and import it into a notebook. It is useful for code competitions where you want to integrate utility code without duplication across notebooks.
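
    A minimal sketch of the pattern (the module and file names are hypothetical; use whatever .py file the dataset contains):

    import sys

    # Option A: import straight from the mounted dataset directory.
    sys.path.append('/kaggle/input/hellodataset')

    # Option B: copy the file into the working directory first.
    # import shutil
    # shutil.copy('/kaggle/input/hellodataset/utils.py', '/kaggle/working/')  # 'utils.py' is hypothetical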

  10. mmcvfull172kaggle

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Bui Thanh Xuan (2025). mmcvfull172kaggle [Dataset]. https://www.kaggle.com/datasets/buithanhxuan/mmcvfull172kaggle/versions/1
    Explore at:
    Croissant
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Bui Thanh Xuan
    Description

    # Install mmcv-full

    !pip install -U openmim
    !mim install mmcv-full

    Create wheel files and save them to a Kaggle Dataset:

    # Create a directory to store the wheel files
    !mkdir -p wheelhouse

    # Build wheel files from the installed packages
    !pip wheel openmim mmcv-full -w wheelhouse/ --no-deps

    # Compress the wheel files
    !tar -czvf wheels.tar.gz wheelhouse/

    # Download the archive (inside the Kaggle notebook)
    from IPython.display import FileLink
    FileLink("wheels.tar.gz")

    Create a Kaggle Dataset from wheels.tar.gz; then, in a new notebook, you can:

    # Extract the dataset archive into the notebook
    !mkdir -p wheelhouse
    !tar -xzvf ../input/your-dataset-name/wheels.tar.gz

    # Install from the built wheels
    !pip install wheelhouse/*.whl

  11. flash-attention-1.0.9-built-for-kaggle-notebooks

    • kaggle.com
    zip
    Updated May 13, 2024
    Cite
    alekh (2024). flash-attention-1.0.9-built-for-kaggle-notebooks [Dataset]. https://www.kaggle.com/aleksanderhan/flash-attension-1-0-9-built-for-kaggle-notebooks
    Explore at:
    zip (69715816 bytes)
    Available download formats
    Dataset updated
    May 13, 2024
    Authors
    alekh
    Description
    from flash_attn.flash_attn_interface import flash_attn_unpadded_func as flash_attention_func

    def patch_model_with_flash_attn(model):
        # Navigate through the 'model' attribute and then access 'layers'
        for layer in model.model.layers:
            # Assuming 'self_attn' is the correct component to modify
            layer.self_attn.attention_module = flash_attention_func

    # 'model' is assumed to be an already-loaded transformer whose layers expose
    # a 'self_attn' module (the exact attribute varies by architecture).
    patch_model_with_flash_attn(model)
    
  12. colabcode

    • kaggle.com
    zip
    Updated Sep 10, 2020
    Cite
    Mathurin Aché (2020). colabcode [Dataset]. https://www.kaggle.com/mathurinache/colabcode
    Explore at:
    zip (6200 bytes)
    Available download formats
    Dataset updated
    Sep 10, 2020
    Authors
    Mathurin Aché
    License

    GNU GPL v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)

    Description

    Run code server on Google Colab or Kaggle Notebooks

    Quickstart:

    1. Install colabcode: pip install colabcode
    2. Import: from colabcode import ColabCode
    3. Run: ColabCode(port=10000, password="abhishek")

    You can also run it with any password or port :) Colab starter notebook: Open In Colab

    ColabCode has the following arguments:

    • port: the port you want to run code-server on; default 10000.
    • password: password to protect your code server from being accessed by someone else. Note that there is no password by default!
    • mount_drive: True or False, to mount your Google Drive.

    ColabCode comes pre-installed with some VS Code extensions.
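
    Putting the quickstart together, a minimal sketch (the port and password are the example values from above):

    from colabcode import ColabCode

    # Starts code-server on port 10000, protected by the given password.
    ColabCode(port=10000, password="abhishek", mount_drive=False)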

    See an example in this video tutorial: https://www.youtube.com/watch?v=7kTbM3D02jU

  13. Amazon Web Scrapping Dataset

    • kaggle.com
    zip
    Updated Jun 17, 2023
    Cite
    Mohammad Hurairah (2023). Amazon Web Scrapping Dataset [Dataset]. https://www.kaggle.com/datasets/mohammadhurairah/amazon-web-scrapper-dataset
    Explore at:
    zip (2220 bytes)
    Available download formats
    Dataset updated
    Jun 17, 2023
    Authors
    Mohammad Hurairah
    License

    CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Amazon scraping dataset steps:

    1. Import libraries
    2. Connect to the website
    3. Import CSV and datetime
    4. Import pandas
    5. Append the dataset to a CSV file
    6. Automate dataset updates
    7. Set up timers
    8. Email notification
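
    A hedged sketch of steps 1-5 (the libraries and URL are assumptions; the description does not name them):

    import csv
    import datetime

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # Step 2: connect to the website (hypothetical product page).
    url = 'https://www.amazon.com/dp/EXAMPLE'
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(page.content, 'html.parser')

    # Step 5: append one scraped row (title, date) to a CSV file.
    title = soup.find(id='productTitle')
    row = [title.get_text(strip=True) if title else '', datetime.date.today()]
    with open('amazon_dataset.csv', 'a', newline='', encoding='utf-8') as f:
        csv.writer(f).writerow(row)

    print(pd.read_csv('amazon_dataset.csv', header=None).tail())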

  14. Huggingface Google MobileBERT

    • kaggle.com
    zip
    Updated Jul 26, 2023
    Cite
    Darius Singh (2023). Huggingface Google MobileBERT [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-google-mobilebert
    Explore at:
    zip (875319161 bytes)
    Available download formats
    Dataset updated
    Jul 26, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the mobilebert hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

    from transformers import AutoTokenizer, AutoModelForPreTraining

    MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)
    

    Acknowledgements

    All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  15. Unsloth for offline

    • kaggle.com
    zip
    Updated Jul 18, 2025
    Cite
    Zie Chan (2025). Unsloth for offline [Dataset]. https://www.kaggle.com/datasets/ziechan/unsloth-for-offline/code
    Explore at:
    zip (5156686198 bytes)
    Available download formats
    Dataset updated
    Jul 18, 2025
    Authors
    Zie Chan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Unsloth Usage

    If you want to import Unsloth while the internet is turned off:

    !pip install --no-index --find-links=/kaggle/input/unsloth-for-offline torch torchvision torchaudio
    !pip install --no-index --find-links=/kaggle/input/unsloth-for-offline xformers
    !pip install --no-index --find-links=/kaggle/input/unsloth-for-offline unsloth
    !pip install --no-index --find-links=/kaggle/input/unsloth-for-offline bitsandbytes

    Then you can follow the standard notebooks in the Unsloth documentation to fine-tune your model.

    Pipeline / model-split loading is also allowed, so if you do not have enough VRAM on one GPU to load, say, Llama 70B, no worries: the model will be split across your GPUs. To enable this, use the device_map = "balanced" flag:

    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Llama-3.3-70B-Instruct",
        load_in_4bit = True,
        device_map = "balanced",
    )

    Opensloth

    Contributors have also created repositories that enable or improve multi-GPU support with Unsloth. If you want to use opensloth while the internet is turned off, run the following code step by step:

    import tarfile
    import os

    # Define the source folder and output path
    source_dir = "/kaggle/input/unsloth-for-offline/fire-0.7.0/fire-0.7.0"
    output_path = "/kaggle/working/fire-0.7.0.tar.gz"  # You can change this path

    # Create a tar.gz archive
    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    print(f"Created: {output_path}")

    !pip install --no-index --find-links=/kaggle/working/ fire
    !pip install --no-index --find-links=/kaggle/input/unsloth-for-offline opensloth==0.1.7

  16. News Category Text Classifier

    • kaggle.com
    zip
    Updated Dec 29, 2023
    Cite
    Rafsun Ahmad (2023). News Category Text Classifier [Dataset]. https://www.kaggle.com/datasets/rafsunahmad/classify-news-into-category/code
    Explore at:
    zip (24184 bytes)
    Available download formats
    Dataset updated
    Dec 29, 2023
    Authors
    Rafsun Ahmad
    License

    Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    This is a news classification dataset with two columns: the first contains the news text and the second contains the news category. This is a multi-class text classification problem.

    To import the data in a Kaggle notebook, use the following code:

    import pandas as pd

    encodings_to_try = ['utf-8', 'Latin-1', 'ISO-8859-1']

    for encoding in encodings_to_try:
        try:
            df = pd.read_csv('/kaggle/input/classify-news-into-category/News Categoires.csv', encoding=encoding)
            print("File read successfully with encoding:", encoding)
            break
        except UnicodeDecodeError:
            pass

    df.head()
    
  17. Huggingface RoBERTa

    • kaggle.com
    zip
    Updated Aug 4, 2023
    Cite
    Darius Singh (2023). Huggingface RoBERTa [Dataset]. https://www.kaggle.com/datasets/dariussingh/huggingface-roberta
    Explore at:
    zip (34531447596 bytes)
    Available download formats
    Dataset updated
    Aug 4, 2023
    Authors
    Darius Singh
    Description

    This dataset contains different variants of the RoBERTa and XLM-RoBERTa model by Meta AI available on Hugging Face's model repository.

    By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".

    For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.

    Usage

    To use this dataset, attach it to your notebook and specify the path to the dataset. For example:

    from transformers import AutoTokenizer, AutoModelForPreTraining

    MODEL_DIR = "/kaggle/input/huggingface-roberta/"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
    model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
    

    Acknowledgements

    All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.

  18. vllm-0.9.2-offline-installer

    • kaggle.com
    zip
    Updated Jul 16, 2025
    Cite
    Sadegh Akbari (2025). vllm-0.9.2-offline-installer [Dataset]. https://www.kaggle.com/datasets/sadeghakbari/vllm-0-9-2-offline-installer/data
    Explore at:
    zip (7873048987 bytes)
    Available download formats
    Dataset updated
    Jul 16, 2025
    Authors
    Sadegh Akbari
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Context & Motivation

    This dataset provides a comprehensive, self-contained offline installer for the vllm library, a high-throughput engine for LLM inference. It is specifically designed to solve the common "no internet access" problem in Kaggle competitions like the ARC Prize, where packages must be installed from local files. Using this dataset eliminates pip install failures and ensures a consistent, reproducible environment for your submission notebook.

    Content The dataset contains a single directory, vllm_wheels, which includes the Python wheel file for vllm==0.9.2 and all of its required dependencies. These files were downloaded and packaged in a standard Kaggle environment to ensure maximum compatibility with the competition's execution environment (Python 3.10, CUDA 12.x).

    Usage To use this dataset in your Kaggle notebook (with internet turned OFF):

    1. Add this dataset as an input source to your notebook.
    2. Place the following code in the first cell of your notebook to install vllm before any other code runs:
    # --- vLLM Offline Installation ---
    # Path to the directory containing the wheel files
    WHEELS_PATH = "/kaggle/input/vllm-0-9-2-offline-installer/vllm_wheels"

    print("Starting offline installation of vLLM...")
    !pip install --no-index --find-links={WHEELS_PATH} vllm
    print("Installation complete.")

    # Verify the installation
    import vllm
    print(f"vLLM version {vllm.__version__} successfully installed.")
    
  19. TabPFN

    • kaggle.com
    zip
    Updated Jun 14, 2023
    Cite
    Mark Inzhirov (2023). TabPFN [Dataset]. https://www.kaggle.com/datasets/neutrino404/tabpfn
    Explore at:
    zip (95945799 bytes)
    Available download formats
    Dataset updated
    Jun 14, 2023
    Authors
    Mark Inzhirov
    Description

    Use this dataset when submitting code offline for competitions; otherwise just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:

    1. First, add the dataset by selecting "Add data", searching for this dataset, and adding it to your input.

    2. Next, add the following code to a code block in your notebook:

    !pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
    !mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
    !cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/

    3. Import:

    from tabpfn import TabPFNClassifier

    4. Now you are all set: you can create a classifier and run it offline for submission in offline Kaggle code competitions:

    classifier = TabPFNClassifier(device='cpu', N_ensemble_configurations=64)
    classifier.fit(X_train, Y_train)
    y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)

    If you want to use TabPFN with a GPU, use the following code when you make the model:

    classifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=32)

    You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git
    The original paper on TabPFN can be found at: https://arxiv.org/abs/2207.01848

    License

    Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

  20. Kaggle: Forum Discussions

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Nicolás Ariel González Muñoz (2025). Kaggle: Forum Discussions [Dataset]. https://www.kaggle.com/datasets/nicolasgonzalezmunoz/kaggle-forum-discussions
    Explore at:
    zip (542099 bytes)
    Available download formats
    Dataset updated
    Nov 8, 2025
    Authors
    Nicolás Ariel González Muñoz
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Note: This is a work in progress, and not all Kaggle forums are included in this dataset yet. The remaining forums will be added once I finish resolving some issues with the data generators for those forums.

    Summary

    Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping with the selenium library, converting text data to Markdown style with the markdownify package.

    This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.

    Extraction Technique

    Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), I extracted the data in this dataset through web scraping techniques using the selenium library.

    The functions and classes used to scrape the data on Kaggle were stored in a utility script publicly available here. As JS-generated pages like Kaggle's are unstable when you try to scrape them, the script implements capabilities for retrying connections and awaiting elements to appear.

    Each forum was scraped with its own notebook, and those notebooks were then connected to a central notebook that generates this dataset. Discussions are also scraped in parallel to improve speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent discussion to the oldest.

    If you need more control on the data you want to research, feel free to import all you need from the utility script mentioned before.

    Structure

    This dataset contains several folders, each named after the discussion forum it contains data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder you'll find two files: a csv file and a json file.

    The json file (in Python, represented as a dictionary) is indexed by the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, represented as a nested dictionary (the discussion dict) with the following fields:
    - title: The title of the main topic.
    - content: Content of the main topic.
    - tags: List containing the discussion's tags.
    - datetime: Date and time at which the discussion was published (in ISO 8601 format).
    - votes: Number of votes gotten by the discussion.
    - medal: Medal awarded to the main topic (if any).
    - user: User that published the main topic.
    - expertise: Publisher's expertise, measured by the Kaggle progression system.
    - n_comments: Total number of comments in the current discussion.
    - n_appreciation_comments: Total number of appreciation comments in the current discussion.
    - comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
      - content: Comment's content.
      - is_appreciation: Whether the comment is an appreciation comment.
      - is_deleted: Whether the comment was deleted.
      - n_replies: Number of replies to the comment.
      - datetime: Date and time at which the comment was published (in ISO 8601 format).
      - votes: Number of votes gotten by the current comment.
      - medal: Medal awarded to the comment (if any).
      - user: User that published the comment.
      - expertise: Publisher's expertise, measured by the Kaggle progression system.
      - n_deleted: Total number of deleted replies (including self).
      - replies: A dict following this same format.

    The csv file, on the other hand, serves as a summary of the json file, with comment information limited to the hottest and most-voted comments.

    Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
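
    For example, a minimal sketch of walking one forum's json file (the exact file name inside each folder is an assumption; field names are as documented above):

    import json

    # hypothetical file name inside the 'competition-hosting' folder
    with open('/kaggle/input/kaggle-forum-discussions/competition-hosting/discussions.json') as f:
        forum = json.load(f)

    for disc_id, disc in forum.items():
        print(disc_id, disc.get('title'), 'votes:', disc.get('votes'))
        for com_id, com in disc.get('comments', {}).items():
            print('  comment', com_id, 'votes:', com.get('votes'))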
