MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset enriches the Meta Kaggle dataset using Meta Kaggle Code to extract all imports (for both R and Python) and method calls (Python only) as lists, which are then added to the KernelVersions.csv file as the columns Imports and MethodCalls.
| Most Imported R Packages | Most Imported Python Packages |
|---|---|
We perform this extraction using the following three regex patterns:
PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
PYTHON_METHOD_REGEX = *I wish I could add the regex here but kaggle kinda breaks if I do lol*
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')
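For illustration, here is a minimal sketch of how the import patterns above might be applied to a kernel's source code to build the Imports list (the sample source strings are made up):
import re

PYTHON_IMPORT_REGEX = re.compile(r'(?:from\s+([a-zA-Z0-9_\.]+)\s+import|import\s+([a-zA-Z0-9_\.]+))')
R_IMPORT_REGEX = re.compile(r'(?:library|require)\((?:[\'"]?)([a-zA-Z0-9_.]+)(?:[\'"]?)\)')

# Hypothetical kernel sources, just for illustration
python_source = "import numpy as np\nfrom sklearn.model_selection import train_test_split"
r_source = 'library(ggplot2)\nrequire("dplyr")'

# Each Python match is a tuple of (from-group, import-group); keep whichever is non-empty
python_imports = [a or b for a, b in PYTHON_IMPORT_REGEX.findall(python_source)]
r_imports = R_IMPORT_REGEX.findall(r_source)

print(python_imports)  # ['numpy', 'sklearn.model_selection']
print(r_imports)       # ['ggplot2', 'dplyr']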
This dataset was created on 06-06-2025. Since the computation required for this process is very resource-intensive and cannot be run on a Kaggle kernel, it is not scheduled. A notebook demonstrating how to create this dataset and what insights it provides can be found here.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
To install pyspark when running a notebook without internet access:
(1) Attach the pyspark-package dataset to your notebook.
(2) Install pyspark with the following code:
import shutil
src_path = r"/kaggle/input/pyspark-package/pyspark-latest.tar.gz.mp4"
dst_path = r"/kaggle/working/pyspark-latest.tar.gz"
shutil.copy(src_path, dst_path)
!pip install /kaggle/working/pyspark-latest.tar.gz
Or, for a specific version, check whether that version is available in the dataset; for example, for 3.5.0:
import shutil
src_path = r"/kaggle/input/pyspark-package/pyspark-3.5.0.tar.gz.mp4"
dst_path = r"/kaggle/working/pyspark-3.5.0.tar.gz"
shutil.copy(src_path, dst_path)
!pip install /kaggle/working/pyspark-3.5.0.tar.gz
(3) Then you can use:
import pyspark
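As a quick sanity check that the offline install works, you could start a local Spark session (a minimal sketch; no cluster is assumed):
from pyspark.sql import SparkSession

# Local mode only, just to confirm pyspark is importable and functional
spark = SparkSession.builder.master("local[*]").appName("offline-check").getOrCreate()
print(spark.version)
spark.stop()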
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
https://www.kaggle.com/datasets/muzammilaliveltech/farm-harmful-animals-dataset
This dataset is not mine; it was uploaded to Kaggle by MUZAMMIL ALI VELTECH under CC0: Public Domain. This Roboflow project was created as an attempt to use the dataset after having issues importing it from Kaggle into a Jupyter Notebook.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
✅ Step 1: Mount the dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory To make them importable, copy the .py files to your notebook’s working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
✅ Step 4: Import your modules Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Source repo is google/flan-t5-large.
from transformers import AutoTokenizer, AutoModel
model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
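For text generation specifically, a minimal sketch would load the seq2seq head instead (AutoModelForSeq2SeqLM reads the same local files; the example prompt is arbitrary):
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = '/kaggle/input/googleflan-t5-large/flan-t5-large'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))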
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Pre-trained weights of Deplot (source: https://huggingface.co/google/deplot) to import directly to the notebook when the internet is off.
First, run this to install the latest version of the library transformers (reason)
!pip install git+https://github.com/huggingface/transformers
Usage:
from transformers import Pix2StructForConditionalGeneration, AutoProcessor
model = Pix2StructForConditionalGeneration.from_pretrained('/kaggle/input/google-deplot-model')
processor = AutoProcessor.from_pretrained('/kaggle/input/google-deplot-model')
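From there, a minimal inference sketch might look like this (the image path is a hypothetical chart attached to the notebook; the prompt follows the DePlot model card):
from PIL import Image

# Hypothetical path to a chart image attached to the notebook
image = Image.open('/kaggle/input/your-chart-images/chart.png')
inputs = processor(images=image, text="Generate underlying data table of the figure below:", return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(predictions[0], skip_special_tokens=True))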
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
By [source]
This dataset allows readers to unlock hidden insights into contemporary literature and the books that people are choosing to purchase. It provides comprehensive and powerful data related to a web books retailer, books.toscrape.com, featuring 12 columns of crucial book metadata gathered through web scraping methods in November 2020. Researching publications through this information provides a great sense of insight and understanding into the current reading climate: uncovering emerging trends in what people are buying, reading, rating, and loving worldwide. With this dataset at your disposal you can explore book popularity from a commercial standpoint as well as a creative one; examining publishing preferences from authors' points of view across reviews and genres alike. Dive into discovering the secrets behind book selection habits by delving into topics ranging from rating systems for certain works to pricing analysis for publishers- all fuelled by this carefully organised streamline of data at play here today!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
To get started analyzing this dataset with Kaggle notebooks or other tools:
- Open your tool of choice (a Kaggle notebook or another environment) that supports reading CSV files.
- Import the dataset.csv file into your chosen program.
- Explore each column individually to better understand what type of book metadata exists within each category – descriptors such as title, image URL, rating, number of reviews, description and more can be found here.
- Once familiar with the metadata in each column, begin exploring correlations between them to deepen your understanding of trends among different types of books over time, broken down by category.
- Lastly, use third-party packages available in your chosen programming language (e.g., Pandas) to continue exploring deeper analysis possibilities; a small starter sketch is shown below.
By following these steps you are ready to start exploring powerful literature insights into contemporary reading material! Enjoy discovering hidden insights within this book metadata that may otherwise have gone undiscovered!
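For instance, a minimal pandas starter (the grouping example is commented out because the exact column names should be checked against the file first):
import pandas as pd

# Load the scraped book metadata
df = pd.read_csv('dataset.csv')

# Inspect the columns and a few rows
print(df.columns.tolist())
print(df.head())

# Example follow-up once you know the real column names (assumed names, shown commented out):
# print(df.groupby('Category')['Price Excluding Tax'].mean())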
- Generating recommendations of books based on popularity, price point and/or rating.
- Tracking the success of certain authors/publishers in the long term and understanding their audience preferences.
- Analysing which types of books consumers prefer (genre, age group targeting) over time to provide useful data to new authors to increase their chances of success
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: dataset.csv

| Column name | Description |
|:---|:---|
| Logan Kade (Fallen Crest High #5.5) | Title of the book. (String) |
| https | Image URL of the book. (String) |
| Two | Rating of the book. (Integer) |
| Academic | Description Category of the book. (String) |
| 7093cf549cd2e7de | Universal Product Code (UPC) of the book. (String) |
| Books | Product Type of the book. (String) |
| £13.12 | Price Excluding Tax of the book. (Float) |
| £13.12.1 | Price Including Tax of the book. (Float) |
| £0.00 | Tax Amount of the book. (Float) |
| In stock (5 available) | Availability of the book. (String) |
If you use this dataset in your research, please credit the original authors.
The CLIP (Contrastive Language–Image Pre-training) model is an innovative approach developed by OpenAI, designed to enhance the robustness of computer vision tasks. It leverages a unique training regimen that aligns images with textual descriptions using a contrastive loss, enabling it to perform image classification tasks in a zero-shot manner. This means CLIP can generalize to classify images it has never seen before based solely on textual descriptions, without the need for further training specific to those tasks.
To use the CLIP model in your Kaggle notebooks, follow these simple steps:
Add the Model as a Kaggle Dataset: Ensure that the dataset containing the CLIP model files is attached to your Kaggle notebook. This dataset includes the necessary model and processor files.
Initialize the Model and Processor: You can load the model and processor directly from the path where the dataset files are stored using the following code snippet:
# Import CLIP model from transformers
from transformers import CLIPModel, CLIPProcessor
# Set the path to the model files
model_path = '/kaggle/input/openaiclip-vit-base-patch32'
# Load the CLIP model
clip_model = CLIPModel.from_pretrained(model_path)
# Load the CLIP processor
clip_processor = CLIPProcessor.from_pretrained(model_path)
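Run zero-shot classification: once the model and processor are loaded, you can score an image against a set of text prompts (a minimal sketch; the image path and candidate labels are placeholders):
import torch
from PIL import Image

# Hypothetical image attached to the notebook
image = Image.open('/kaggle/input/your-images/example.jpg')
labels = ["a photo of a cat", "a photo of a dog"]

# Preprocess the text prompts and the image together
inputs = clip_processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = clip_model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into probabilities
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))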
This is an example of how to keep a .py file and import it into a notebook. It is useful for code competitions where you want to integrate utility code without duplication across notebooks.
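A minimal sketch of the pattern (the dataset slug and module name are hypothetical):
import sys

# Make the attached utility-script dataset importable without copying files around
sys.path.append('/kaggle/input/my-utility-scripts')

import my_utils  # i.e. my_utils.py stored in the attached dataset
print(my_utils.__file__)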
from flash_attn.flash_attn_interface import flash_attn_unpadded_func as flash_attention_func

def patch_model_with_flash_attn(model):
    # Navigating through the 'model' attribute and then accessing 'layers'
    for layer in model.model.layers:
        # Assuming 'self_attn' is the correct component to modify
        layer.self_attn.attention_module = flash_attention_func
patch_model_with_flash_attn(model)
GNU GPL v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
Run code server on Google Colab or Kaggle Notebooks
Quickstart:
- Install colabcode: pip install colabcode
- Import colabcode: from colabcode import ColabCode
- Run: ColabCode(port=10000, password="abhishek")
You can also run it with any password or port :)
Colab starter notebook: Open In Colab
ColabCode has the following arguments:
- port: the port you want to run code-server on, default 10000
- password: password to protect your code server from being accessed by someone else. Note that there is no password by default!
- mount_drive: True or False to mount your Google Drive
ColabCode comes pre-installed with some VS Code extensions.
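Putting those arguments together, a typical call might look like this (the password value is just an example):
from colabcode import ColabCode

# Start code-server on port 10000, password-protected, with Google Drive mounted
ColabCode(port=10000, password="my-secret-password", mount_drive=True)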
See an example in this video tutorial: https://www.youtube.com/watch?v=7kTbM3D02jU
CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
Amazon Scraping Dataset. The workflow covers:
1. Import libraries
2. Connect to the website
3. Import CSV and datetime
4. Import pandas
5. Append data to the CSV
6. Automate dataset updates
7. Set up timers
8. Email notification
A sketch of the core scraping-and-append step is shown below.
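For illustration, a minimal version of steps 1–5 might look like this (the URL and element IDs are hypothetical and need to be adapted to the actual product page):
import csv
from datetime import datetime

import requests
from bs4 import BeautifulSoup

# Hypothetical product page URL and a browser-like User-Agent header
url = "https://www.amazon.com/dp/EXAMPLE-ASIN"
headers = {"User-Agent": "Mozilla/5.0"}

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")

# The element IDs below are assumptions; inspect the page to find the real ones
title = soup.find(id="productTitle").get_text(strip=True)
price = soup.find(id="priceblock_ourprice").get_text(strip=True)

# Append one row per run so the CSV grows over time
with open("amazon_dataset.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([title, price, datetime.now().isoformat()])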
This dataset contains different variants of the MobileBERT model by Google available on Hugging Face's model repository.
By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".
For more information on usage visit the mobilebert hugging face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining
MODEL_DIR = "/kaggle/input/huggingface-google-mobilebert/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR)
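A short sanity check once the model is loaded (the input sentence is arbitrary):
import torch

# Tokenize a sample sentence and run a forward pass
inputs = tokenizer("Kaggle datasets make offline model loading easy.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# For the pre-training head, prediction_logits has shape (batch, seq_len, vocab_size)
print(outputs.prediction_logits.shape)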
Acknowledgements All the copyrights and IP relating to MobileBERT belong to the original authors (Sun et al.) and Google. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
If you want to import Unsloth while turning off the internet:
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline torch torchvision torchaudio
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline xformers
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline unsloth
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline bitsandbytes
Then you can follow the standard notebooks in the Unsloth documentation to fine-tune your model.
Pipeline / model splitting loading is also allowed, so if you do not have enough VRAM for 1 GPU to load say Llama 70B, no worries - we will split the model for you on each GPU! To enable this, use the device_map = "balanced" flag:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
"unsloth/Llama-3.3-70B-Instruct",
load_in_4bit = True,
device_map = "balanced",
)
Contributors have also created repos to enable or improve multi-GPU support with Unsloth. If you want to use opensloth with the internet turned off, run the following code step by step:
```
import tarfile
import os
source_dir = "/kaggle/input/unsloth-for-offline/fire-0.7.0/fire-0.7.0"
output_path = "/kaggle/working/fire-0.7.0.tar.gz"  # You can change this path

with tarfile.open(output_path, "w:gz") as tar:
    tar.add(source_dir, arcname=os.path.basename(source_dir))

print(f"Created: {output_path}")
!pip install --no-index --find-links=/kaggle/working/ fire
!pip install --no-index --find-links=/kaggle/input/unsloth-for-offline opensloth==0.1.7
```
Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)
This is a news classifier dataset. It has two columns: the first column contains different types of news and the second column contains the news category. This is a multi-class text classification problem.
import pandas as pd

encodings_to_try = ['utf-8', 'Latin-1', 'ISO-8859-1']
for encoding in encodings_to_try:
    try:
        df = pd.read_csv('/kaggle/input/classify-news-into-category/News Categoires.csv', encoding=encoding)
        print("File read successfully with encoding:", encoding)
        print(df.head())
        break
    except UnicodeDecodeError:
        pass

df.head()
This dataset contains different variants of the RoBERTa and XLM-RoBERTa model by Meta AI available on Hugging Face's model repository.
By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".
For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining
MODEL_DIR = "/kaggle/input/huggingface-roberta/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
Acknowledgements All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Context & Motivation
This dataset provides a comprehensive, self-contained offline installer for the vllm library, a high-throughput engine for LLM inference. It is specifically designed to solve the common "no internet access" problem in Kaggle competitions like the ARC Prize, where packages must be installed from local files. Using this dataset eliminates pip install failures and ensures a consistent, reproducible environment for your submission notebook.
Content The dataset contains a single directory, vllm_wheels, which includes the Python wheel file for vllm==0.9.2 and all of its required dependencies. These files were downloaded and packaged in a standard Kaggle environment to ensure maximum compatibility with the competition's execution environment (Python 3.10, CUDA 12.x).
Usage To use this dataset in your Kaggle notebook (with internet turned OFF):
import os
# --- vLLM Offline Installation ---
# Path to the directory containing the wheel files
WHEELS_PATH = "/kaggle/input/vllm-0-9-2-offline-installer/vllm_wheels"
print("Starting offline installation of vLLM...")
!pip install --no-index --find-links={WHEELS_PATH} vllm
print("Installation complete.")
# Verify the installation
import vllm
print(f"vLLM version {vllm._version_} successfully installed.")
Use this dataset when submitting code offline for competitions; otherwise, just use !pip install tabpfn for online use. Usage for offline code submissions within Kaggle notebooks is as follows:
1. First, add the dataset by selecting "Add Data", searching for this dataset, and adding it to your input.
2. Next, add the following code to a code block in your notebook:
!pip install tabpfn --no-index --find-links=file:///kaggle/input/tabpfn
!mkdir -p /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff
!cp /kaggle/input/tabpfn/prior_diff_real_checkpoint_n_0_epoch_100.cpkt /opt/conda/lib/python3.10/site-packages/tabpfn/models_diff/
3. Import:
from tabpfn import TabPFNClassifier
4. Now you are all set: you can create a classifier and run it offline for submission in offline Kaggle code competitions:
classifier = TabPFNClassifier(device='cpu',N_ensemble_configurations=64)
classifier.fit(X_train, Y_train)
y_eval, p_eval = classifier.predict(X_cv, return_winning_probability=True)
If you want to use TabPFN with GPU use the following code when you make the model:
classifier = TabPFNClassifier(device='cuda',N_ensemble_configurations=32)
You can find documentation for this package on GitHub: https://github.com/automl/TabPFN.git
The original paper on TabPFN can be found at: https://arxiv.org/abs/2207.01848
License: Copyright 2022 Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Note: This is a work in progress, and not all the Kaggle forums are included in this dataset. The remaining forums will be added once I finish resolving some issues with the data generators related to these forums.
Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping with the selenium library, and the text data is converted to Markdown using the markdownify package.
This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.
Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), I extracted the data in this dataset through web scraping with the selenium library.
The functions and classes used to scrape the data on Kaggle are stored in a utility script publicly available here. As JS-generated pages like Kaggle are unstable when scraped, the script implements retrying connections and waiting for elements to appear.
Each forum was scraped with its own notebook; those notebooks were then connected to a central notebook that generates this dataset. The discussions are also scraped in parallel to improve speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent to the oldest.
If you need more control on the data you want to research, feel free to import all you need from the utility script mentioned before.
This dataset contains several folders, each named after the discussion forum it contains data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder, you'll find two files: a CSV file and a JSON file.
The JSON file (in Python, represented as a dictionary) is indexed by the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, represented as a nested dictionary (the discussion dict) with the following fields:
- title: The title of the main topic.
- content: Content of the main topic.
- tags: List containing the discussion's tags.
- datetime: Date and time at which the discussion was published (in ISO 8601 format).
- votes: Number of votes received by the discussion.
- medal: Medal awarded to the main topic (if any).
- user: User that published the main topic.
- expertise: Publisher's expertise, measured by the Kaggle progression system.
- n_comments: Total number of comments in the current discussion.
- n_appreciation_comments: Total number of appreciation comments in the current discussion.
- comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
  - content: Comment's content.
  - is_appreciation: Whether the comment is an appreciation comment.
  - is_deleted: Whether the comment was deleted.
  - n_replies: Number of replies to the comment.
  - datetime: Date and time at which the comment was published (in ISO 8601 format).
  - votes: Number of votes received by the current comment.
  - medal: Medal awarded to the comment (if any).
  - user: User that published the comment.
  - expertise: Publisher's expertise, measured by the Kaggle progression system.
  - n_deleted: Total number of deleted replies (including self).
  - replies: A dict following this same format.
The CSV file, on the other hand, serves as a summary of the JSON file, with comment information limited to the hottest and most voted comments.
Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
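A minimal sketch for loading one forum's files and walking the structure described above (the dataset slug and exact file names are assumptions; check the folder contents first):
import json
import pandas as pd
from pathlib import Path

# The dataset slug below is an assumption; adjust it to the actual mount path
forum_dir = Path("/kaggle/input/kaggle-forum-discussions/competition-hosting")

# Each folder contains one CSV and one JSON file; pick them up without hard-coding names
csv_path = next(forum_dir.glob("*.csv"))
json_path = next(forum_dir.glob("*.json"))

summary = pd.read_csv(csv_path)
with open(json_path, "r", encoding="utf-8") as f:
    discussions = json.load(f)

# Each top-level key is a Kaggle discussion ID; 'content' is the only guaranteed field
for disc_id, disc in list(discussions.items())[:3]:
    print(disc_id, disc.get("title"), disc.get("votes"), len(disc.get("comments", {})))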