This dataset was created by Alexis Cook
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://www.kaggle.com/datasets/muzammilaliveltech/farm-harmful-animals-dataset
This dataset is not mine; it was uploaded to Kaggle by MUZAMMIL ALI VELTECH under CC0: Public Domain. This Roboflow project was made as an attempt to use the dataset after having issues importing it from Kaggle into a Jupyter Notebook.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The official Meta-Kaggle dataset includes a Users.csv file with Username, DisplayName, RegisterDate, and PerformanceTier fields, but it does not contain location data for Kaggle users. This dataset augments that data with additional country and region information.
I haven't included the username and displayname values on purpose, just the userid to be joined back to the official Meta-Kaggle Users.csv file.
It is possible that some users hadn't entered their details when the scraper went through their accounts and thus have missing data. Another possibility is that users may have updated their info after the scraper went through their accounts, resulting in inconsistencies.
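A minimal join sketch: the Meta-Kaggle mount path and the "Id" column come from the official Users.csv, while the location file name and its "UserId" column are hypothetical placeholders for this dataset; check the actual files.
import pandas as pd
# Meta-Kaggle's Users.csv identifies users by the "Id" column.
users = pd.read_csv("/kaggle/input/meta-kaggle/Users.csv")
# Hypothetical file and column names; adjust to the real ones in this dataset.
locations = pd.read_csv("/kaggle/input/kaggle-user-locations/user_locations.csv")
# Join the country/region info back onto the official Users table.
users_geo = users.merge(locations, left_on="Id", right_on="UserId", how="left")
print(users_geo.head())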
Clean the data, preprocess it, and then make predictions. Try and learn.
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/Videos_L22_a.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.
from IPython.display import Markdown, display
# Render the Roboflow README; pass the path via filename so the file's contents are displayed
display(Markdown(filename="../input/Car-Person-v2-Roboflow/README.roboflow.txt"))
In this notebook, I have processed the images with Roboflow because the COCO-formatted dataset had images of varying dimensions and was not split into separate subsets. To train a custom YOLOv7 model, we need annotated examples of the objects in the dataset. To do so, I have taken the following steps:
Image Credit - jinfagang
!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
%cd yolov7
!pip install -qr requirements.txt
!pip install -q roboflow
!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
import os
import glob
import wandb
import torch
from roboflow import Roboflow
from kaggle_secrets import UserSecretsClient
from IPython.display import Image, clear_output, display # to display images
print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
Image: https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67
I will be integrating W&B for visualizations, logging artifacts, and comparing different models!
try:
    user_secrets = UserSecretsClient()
    wandb_api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=wandb_api_key)
    anonymous = None
except Exception:
    wandb.login(anonymous='must')
    print('To use your W&B account, go to Add-ons -> Secrets and provide your W&B access token '
          'under the label wandb_api. Get your W&B access token from here: https://wandb.ai/authorize')
wandb.init(project="YOLOvR", name="7. YOLOv7-Car-Person-Custom-Run-7")
Image: https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png
In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.
In Roboflow, we can choose between two paths:
Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG
user_secrets = UserSecretsClient()
roboflow_api_key = user_secrets.get_secret("roboflow_api")
rf = Roboflow(api_key=roboflow_api_key)
project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
dataset = project.version(2).download("yolov7")
Here, I am able to pass a number of arguments:
- img: define input image size
- batch: determine batch size
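As an illustration only, a training cell might look like the sketch below. The flags follow the standard YOLOv7 train.py interface, the data.yaml path assumes the Roboflow export provides one, and the values (image size, batch size, epochs, run name) are assumptions rather than the author's exact settings.
# Illustrative values; adjust image size, batch size and epochs to your GPU and dataset.
!python train.py --img-size 640 640 --batch-size 16 --epochs 30 --data {dataset.location}/data.yaml --weights yolov7.pt --device 0 --name yolov7-car-person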
Dataset Creation
The notebook uploaded here is used to convert the audio and images into tensor files, which will be used by the training script to train the Image Decoder Module. The dataset used was https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3. This dataset must be imported into the environment (e.g. Kaggle), and the path BASE_DATA_PATH must be updated in the notebook.
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This Kaggle dataset provides a unique opportunity to explore the ongoing conversations and discussions of the popular Pokémon franchise across Reddit communities. It contains over a thousand entries compiled from posts and comments made by avid Pokémon fans, providing valuable insights into post popularity, user engagement, and topic discussion. With these comprehensive data points, including post title, score, post ID, link URL, number of comments, and date & time created, along with body text and timestamp, powerful analysis can be conducted to assess how trends in Pokémon-related activities are evolving over time. So why not dive deep into this fascinating world of Poké-interactions? Follow us as we navigate through the wide range of interesting topics being discussed on Reddit about this legendary franchise!
This dataset contains over a thousand entries of user conversations related to the Pokémon community posted and commented on Reddit. By using this dataset, you can explore the popularity of Pokémon-related topics, the level of user engagement, and how user interactions shape the discussion around each topic. To do so, you'll want to focus on columns such as title, score, url, comms_num (number of comments on a post), created (date and time when the post was created), and timestamp.
For starters, you can look at how many posts have been made about certain topics by using the "title" column as a keyword search, e.g. 'Magikarp' or 'Team Rocket', to see just how many posts have been about them in total. With this data in mind, you could consider what makes popular posts popular and look at the number of upvotes from users (stored in "score"): which posts caught people's attention? Beyond upvotes, there are also downvotes: can these be taken into account when gauging popularity? One could also consider user engagement by looking at comms_num, which records the number of comments left on each post: does an increase in comments lead to an increase in upvotes?
Additionally, one could examine how users responded to posts by reading the body texts stored under 'body'. From this information you can build insights into the overall discussion per topic: are the threads conversational or argumentative? Are there underlying regional trends among commenters who emphasize different elements of their Pokémon-related discussions?
This opens up possibilities for further investigations into Pokémon-related phenomena through Reddit discussion: finding out what makes certain topics prevalent while others stay obscure, seeing where different world regions sit within certain conversations, and understanding specific nuances within conversation trees between commenters.
- Analyzing the influence of post upvotes in user engagement and conversation outcomes
- Investigating the frequency of topics discussed in Pokémon related conversations
- Examining the correlation between post score and number of comments on each post
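For instance, a minimal pandas sketch along these lines; the input path is an assumption, and the column names follow the file description below.
import pandas as pd
# Hypothetical mount path; adjust to where pokemon.csv actually lives.
df = pd.read_csv("/kaggle/input/pokemon-reddit/pokemon.csv")
# Keyword search over titles, e.g. posts mentioning Magikarp.
magikarp = df[df["title"].str.contains("Magikarp", case=False, na=False)]
print(len(magikarp), "posts mention Magikarp")
# Does more discussion (comms_num) go hand in hand with higher scores?
print(df[["score", "comms_num"]].corr())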
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: pokemon.csv

| Column name | Description |
|:------------|:------------|
| title       | The title of the post. (String) |
| score       | The number of upvotes the post has received. (Integer) |
| url         | The URL of the post. (String) |
| comms_num   | The number of comments the post has received. (Integer) |
| created     | The date and time the post was created. (DateTime) |
| body        | The body text of the post. (String) |
| timestamp   | The timestamp of the post. (Integer) |
If you use this dataset in your research, please credit the original authors and Reddit.
This dataset contains different variants of the RoBERTa and XLM-RoBERTa models by Meta AI available on Hugging Face's model repository.
By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".
For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining
MODEL_DIR = "/kaggle/input/huggingface-roberta/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
Acknowledgements
All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
This dataset has been processed from the Smart Meters in London dataset available on Kaggle. The original dataset contains information on households, and the meter readings can be aggregated at various levels: household, block, and the entire city. However, due to long processing times and high memory usage during aggregation, this dataset provides processed results that can be used directly for analysis or modeling.
To explore and model the dataset, an example notebook is available at the GitHub link.
https://creativecommons.org/publicdomain/zero/1.0/
Normally, you need to download another notebook's results and then upload them if you want to use them within your own notebook. So I created this dataset for anyone who wants to use these notebook results directly, without downloading and re-uploading. Please upvote if it helps you.
This dataset contains 5 results used as input for the hybrid approach in these notebooks:
* https://www.kaggle.com/titericz/h-m-ensembling-how-to/notebook
* https://www.kaggle.com/code/atulverma/h-m-ensembling-with-lstm
If you want to use these notebooks but can't access their private datasets, please add my dataset to your notebook and then change the file paths.
It has 5 files:
* submissio_byfone_chris.csv: Submission result from: https://www.kaggle.com/lichtlab/0-0226-byfone-chris-combination-approach
* submission_exponential_decay.csv: Submission result from: https://www.kaggle.com/tarique7/hnm-exponential-decay-with-alternate-items/notebook
* submission_trending.csv: Submission result from: https://www.kaggle.com/lunapandachan/h-m-trending-products-weekly-add-test/notebook
* submission_sequential_model.csv: Submission result from: https://www.kaggle.com/code/astrung/sequential-model-fixed-missing-last-item/notebook
* submission_sequential_with_item_feature.csv: Submission result from: https://www.kaggle.com/code/astrung/lstm-model-with-item-infor-fix-missing-last-item/notebook
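A minimal loading sketch, assuming the dataset is attached under a mount path like /kaggle/input/hm-notebook-results; the actual slug depends on this dataset's name, so check your notebook's input panel.
import glob
import pandas as pd
# Hypothetical mount path; adjust to the real dataset slug.
BASE = "/kaggle/input/hm-notebook-results"
# Load every submission file into a dict keyed by file name.
submissions = {path.split("/")[-1]: pd.read_csv(path) for path in glob.glob(f"{BASE}/*.csv")}
for name, sub in submissions.items():
    print(name, sub.shape)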
This dataset is generated for the purpose of analyzing furniture sales data using multiple regression techniques. It contains 2,500 rows and 15 columns, including 7 numerical columns and 7 categorical columns, along with a target variable (revenue) which represents the total revenue generated from furniture sales. The dataset captures various aspects of furniture sales, such as pricing, cost, sales volume, discount percentage, inventory levels, delivery time, and different categorical attributes like furniture type, material, color, and store location.
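A minimal multiple-regression sketch under stated assumptions: the CSV file name and mount path are hypothetical, and only the target column name (revenue) is taken from the description above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical file name/path; adjust to the actual CSV in this dataset.
df = pd.read_csv("/kaggle/input/furniture-sales/furniture_sales.csv")
X = df.drop(columns=["revenue"])  # "revenue" is the stated target variable
y = df["revenue"]

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Scale numeric features, one-hot encode categorical ones, then fit a linear model.
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])),
    ("reg", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))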
Please upload your notebooks for this dataset so that others can also learn from your work.
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/media-info-aic25-b1.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/clip-features-32-aic25-b1.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
✅ Step 1: Mount the dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths
Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory
To make them importable, copy the .py files to your notebook's working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
✅ Step 4: Import your modules
Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models
You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
https://creativecommons.org/publicdomain/zero/1.0/
Amazon Scraping Dataset:
1. Import libraries
2. Connect to the website
3. Import csv and datetime
4. Import pandas
5. Append the dataset to a CSV file
6. Automate dataset updates
7. Set up timers
8. Email notification
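A minimal sketch of steps 1-7, assuming requests and BeautifulSoup; the product URL and HTML element ids are placeholders, and the pandas read-back and email-notification steps are omitted.
import csv
import datetime
import time
import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.com/dp/EXAMPLE"   # placeholder product URL
HEADERS = {"User-Agent": "Mozilla/5.0"}     # Amazon usually requires a realistic User-Agent

def check_price():
    # Connect to the website and parse the page.
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, "html.parser")
    # Placeholder element ids; inspect the page to find the real ones.
    title = soup.find(id="productTitle").get_text(strip=True)
    price = soup.find(id="priceblock_ourprice").get_text(strip=True)
    # Append one row per run to the CSV dataset.
    with open("amazon_dataset.csv", "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([title, price, datetime.date.today()])

# Timer: re-run the scrape once a day to keep the dataset updated.
while True:
    check_price()
    time.sleep(86400)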
MIntRec stands for multimodal intent recognition. The benchmark dataset was first introduced at ACM MM 2022: https://dl.acm.org/doi/10.1145/3503161.3547906. The uploaded version is the latest, second version, which was introduced at ICLR 2024: https://openreview.net/forum?id=nY9nITZQjc.
More details can be found here: https://github.com/thuiar/MIntRec2.0. Among the 3 versions, I uploaded the feature data and the raw data (this upload is the raw data) because they seem to be the best way to practice in the Kaggle Notebook environment.
The purpose of uploading this dataset is to practice multi-class classification with LLMs, i.e. to test whether large models can recognize and classify human intent properly.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Source repo is google/flan-t5-large.
from transformers import AutoTokenizer, AutoModel
model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
This dataset is inspired by the work of @akensert in retrieving the tiles from each image. I decided to upload my version of the dataset in order to also train in a Kaggle notebook. I cropped "level 1" of the original images, and each image comes with dimensions (n_crops*256, 256, 3). Therefore, it is possible to retrieve the individual tiles by reshaping it to (-1, 256, 256, 3).
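A minimal reshape sketch, assuming one of the stored images loads as a (n_crops*256, 256, 3) array; the file path is hypothetical.
import numpy as np
from PIL import Image

# Hypothetical path to one of the stacked-crop images in this dataset.
stacked = np.asarray(Image.open("/kaggle/input/panda-tiles/example_image.png"))
print(stacked.shape)              # (n_crops * 256, 256, 3)

# Recover the individual 256x256 tiles as a batch.
tiles = stacked.reshape(-1, 256, 256, 3)
print(tiles.shape)                # (n_crops, 256, 256, 3)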
https://creativecommons.org/publicdomain/zero/1.0/
By SocialGrep [source]
This dataset is a collection of posts and comments made on Reddit's /r/datasets board, covering everything from the subreddit's inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.
In order to use this dataset, you will need to have a text editor such as Microsoft Word or LibreOffice installed on your computer. You will also need a web browser such as Google Chrome or Mozilla Firefox.
Once you have the necessary software installed, open the The Reddit Dataset folder and double-click on the the-reddit-dataset-dataset-posts.csv file to open it in your preferred text editor.
In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.
You can use this information to analyze trends in the datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes.
- Finding correlations between different types of datasets
- Determining which datasets are most popular on Reddit
- Analyzing the sentiments of post and comments on Reddit's /r/datasets board
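For example, a minimal pandas sketch along these lines; the mount path is an assumption, while the file and column names follow the file descriptions below.
import pandas as pd

# Hypothetical mount path; adjust to the actual dataset slug.
BASE = "/kaggle/input/the-reddit-dataset-dataset"
posts = pd.read_csv(f"{BASE}/the-reddit-dataset-dataset-posts.csv")
comments = pd.read_csv(f"{BASE}/the-reddit-dataset-dataset-comments.csv")

# Average post score and the most commonly linked domains.
print("Average post score:", posts["score"].mean())
print(posts["domain"].value_counts().head(10))

# Distribution of comment sentiment.
print(comments["sentiment"].value_counts())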
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: the-reddit-dataset-dataset-comments.csv

| Column name | Description |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc    | The time the post was created, in UTC. (Timestamp) |
| permalink      | The permalink for the post. (String) |
| body           | The body of the post. (String) |
| sentiment      | The sentiment of the post. (String) |
| score          | The score of the post. (Integer) |
File: the-reddit-dataset-dataset-posts.csv

| Column name | Description |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc    | The time the post was created, in UTC. (Timestamp) |
| permalink      | The permalink for the post. (String) |
| score          | The score of the post. (Integer) |
| domain         | The domain of the post. (String) |
| url            | The URL of the post. (String) |
| selftext       | The self-text of the post. (String) |
| title          | The title of the post. (String) |
If you use this dataset in your research, please credit the original authors and SocialGrep.
This dataset was created by Alexis Cook