This dataset was created by Alexis Cook
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://www.kaggle.com/datasets/muzammilaliveltech/farm-harmful-animals-dataset
This dataset is not mine; it was uploaded to Kaggle by MUZAMMIL ALI VELTECH under CC0: Public Domain. This Roboflow project was made as an attempt to use the dataset after having issues importing it from Kaggle into a Jupyter Notebook.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The official Meta-Kaggle dataset includes a Users.csv file with Username, DisplayName, RegisterDate, and PerformanceTier fields, but it does not contain location data for Kaggle users. This dataset augments that data with additional country and region information.
I haven't included the username and displayname values on purpose, just the userid to be joined back to the official Meta-Kaggle Users.csv file.
It is possible that some users hadn't entered their details when the scraper went through their accounts and thus have missing data. Another possibility is that users may have updated their info after the scraper went through their accounts, resulting in inconsistencies.
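A minimal join sketch: the Meta-Kaggle mount path and the "Id" column come from the official Users.csv, while the location file name and its "UserId" column are hypothetical placeholders for this dataset; check the actual files.
import pandas as pd
# Meta-Kaggle's Users.csv identifies users by the "Id" column.
users = pd.read_csv("/kaggle/input/meta-kaggle/Users.csv")
# Hypothetical file and column names; adjust to the real ones in this dataset.
locations = pd.read_csv("/kaggle/input/kaggle-user-locations/user_locations.csv")
# Join the country/region info back onto the official Users table.
users_geo = users.merge(locations, left_on="Id", right_on="UserId", how="left")
print(users_geo.head())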
Clean the data, preprocess it, and then make predictions. Try and learn.
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/Videos_L22_a.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.
from IPython.display import Markdown, display
# Render the Roboflow README; pass the path via filename so the file's contents are displayed
display(Markdown(filename="../input/Car-Person-v2-Roboflow/README.roboflow.txt"))
In this notebook, I have processed the images with Roboflow because the COCO-formatted dataset had images of varying dimensions and was not split into separate subsets. To train a custom YOLOv7 model, we need annotated examples of the objects in the dataset. To do so, I have taken the following steps:
Image Credit - jinfagang
!git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
%cd yolov7
!pip install -qr requirements.txt
!pip install -q roboflow
!wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
import os
import glob
import wandb
import torch
from roboflow import Roboflow
from kaggle_secrets import UserSecretsClient
from IPython.display import Image, clear_output, display # to display images
print(f"Setup complete. Using torch {torch._version_} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
Image: https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67
I will be integrating W&B for visualizations, logging artifacts, and comparing different models!
try:
    user_secrets = UserSecretsClient()
    wandb_api_key = user_secrets.get_secret("wandb_api")
    wandb.login(key=wandb_api_key)
    anonymous = None
except Exception:
    wandb.login(anonymous='must')
    print('To use your W&B account, go to Add-ons -> Secrets and provide your W&B access token '
          'under the label wandb_api. Get your W&B access token from here: https://wandb.ai/authorize')
wandb.init(project="YOLOvR", name="7. YOLOv7-Car-Person-Custom-Run-7")
Image: https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png
In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.
In Roboflow, we can choose between two paths:
Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG
user_secrets = UserSecretsClient()
roboflow_api_key = user_secrets.get_secret("roboflow_api")
rf = Roboflow(api_key=roboflow_api_key)
project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
dataset = project.version(2).download("yolov7")
Here, I am able to pass a number of arguments:
- img: define input image size
- batch: determine batch size
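As an illustration only, a training cell might look like the sketch below. The flags follow the standard YOLOv7 train.py interface, the data.yaml path assumes the Roboflow export provides one, and the values (image size, batch size, epochs, run name) are assumptions rather than the author's exact settings.
# Illustrative values; adjust image size, batch size and epochs to your GPU and dataset.
!python train.py --img-size 640 640 --batch-size 16 --epochs 30 --data {dataset.location}/data.yaml --weights yolov7.pt --device 0 --name yolov7-car-person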
Dataset Creation
The notebook uploaded here is used to convert the audio and images into tensor files, which will be used by the training script to train the Image Decoder Module. The dataset used was https://www.kaggle.com/datasets/jorvan/image-audio-pairs-1-of-3. This dataset must be imported into the environment (e.g. Kaggle), and the path BASE_DATA_PATH must be updated in the notebook.
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This Kaggle dataset provides a unique opportunity to explore the ongoing conversations and discussions of the popular Pokémon franchise across Reddit communities. It contains over a thousand entries compiled from posts and comments made by avid Pokémon fans, providing valuable insights into post popularity, user engagement, and topic discussion. With these comprehensive data points, including post title, score, post ID, link URL, number of comments, and date & time created, along with body text and timestamp, powerful analysis can be conducted to assess how trends in Pokémon-related activities are evolving over time. So why not dive deep into this fascinating world of Poké-interactions? Follow us as we navigate through the wide range of interesting topics being discussed on Reddit about this legendary franchise!
This dataset contains over a thousand entries of user conversations related to the Pokémon community posted and commented on Reddit. By using this dataset, you can explore the popularity of Pokémon-related topics, the level of user engagement, and how user interactions shape the discussion around each topic. To do so, you'll want to focus on columns such as title, score, url, comms_num (number of comments on a post), created (date and time when the post was created), and timestamp.
For starters, you can look at how many posts have been made about certain topics by using the "title" column as a keyword search, e.g. 'Magikarp' or 'Team Rocket', to see just how many posts have been about them in total. With this data in mind, you could consider what makes popular posts popular and look at the number of upvotes from users (stored in "score"): which posts caught people's attention? Beyond upvotes, there are also downvotes: can these be taken into account when gauging popularity? One could also consider user engagement by looking at comms_num, which records the number of comments left on each post: does an increase in comments lead to an increase in upvotes?
Additionally, one could examine how users responded to posts by reading the body texts stored under 'body'. From this information you can build insights into the overall discussion per topic: are the threads conversational or argumentative? Are there underlying regional trends among commenters who emphasize different elements of their Pokémon-related discussions?
This opens up possibilities for further investigations into Pokémon-related phenomena through Reddit discussion: finding out what makes certain topics prevalent while others stay obscure, seeing where different world regions sit within certain conversations, and understanding specific nuances within conversation trees between commenters.
- Analyzing the influence of post upvotes in user engagement and conversation outcomes
- Investigating the frequency of topics discussed in Pokémon related conversations
- Examining the correlation between post score and number of comments on each post
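For instance, a minimal pandas sketch along these lines; the input path is an assumption, and the column names follow the file description below.
import pandas as pd
# Hypothetical mount path; adjust to where pokemon.csv actually lives.
df = pd.read_csv("/kaggle/input/pokemon-reddit/pokemon.csv")
# Keyword search over titles, e.g. posts mentioning Magikarp.
magikarp = df[df["title"].str.contains("Magikarp", case=False, na=False)]
print(len(magikarp), "posts mention Magikarp")
# Does more discussion (comms_num) go hand in hand with higher scores?
print(df[["score", "comms_num"]].corr())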
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: pokemon.csv

| Column name | Description |
|:------------|:------------|
| title       | The title of the post. (String) |
| score       | The number of upvotes the post has received. (Integer) |
| url         | The URL of the post. (String) |
| comms_num   | The number of comments the post has received. (Integer) |
| created     | The date and time the post was created. (DateTime) |
| body        | The body text of the post. (String) |
| timestamp   | The timestamp of the post. (Integer) |
If you use this dataset in your research, please credit the original authors and Reddit.
This dataset contains different variants of the RoBERTa and XLM-RoBERTa models by Meta AI available on Hugging Face's model repository.
By making it a dataset, it is significantly faster to load the weights since you can directly attach a Kaggle dataset to the notebook rather than downloading the data every time. See the speed comparison notebook. Another benefit of loading models as a dataset is that it can be used in competitions that require internet access to be "off".
For more information on usage visit the roberta hugging face docs and the xlm-roberta hugging face docs.
Usage
To use this dataset, attach it to your notebook and specify the path to the dataset. For example:
from transformers import AutoTokenizer, AutoModelForPreTraining
MODEL_DIR = "/kaggle/input/huggingface-roberta/"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR + "roberta-base")
model = AutoModelForPreTraining.from_pretrained(MODEL_DIR + "roberta-base")
Acknowledgements
All the copyrights and IP relating to RoBERTa and XLM-RoBERTa belong to the original authors (Liu et al. and Conneau et al.) and Meta AI. All copyrights relating to the transformers library belong to Hugging Face. Please reach out directly to the authors if you have questions regarding licenses and usage.
This dataset has been processed from the Smart Meters in London dataset available on Kaggle. The original dataset contains information on households, and the meter readings can be aggregated at various levels: household, block, and the entire city. However, due to long processing times and high memory usage during aggregation, this dataset provides processed results that can be used directly for analysis or modeling.
To explore and model the dataset, an example notebook is available at the GitHub link.
https://creativecommons.org/publicdomain/zero/1.0/
Normally, you need to download another notebook's results and then upload them if you want to use them within your own notebook. So I created this dataset for anyone who wants to use these notebook results directly, without downloading and re-uploading. Please upvote if it helps you.
This dataset contains 5 results used as input for the hybrid approach in these notebooks:
* https://www.kaggle.com/titericz/h-m-ensembling-how-to/notebook
* https://www.kaggle.com/code/atulverma/h-m-ensembling-with-lstm
If you want to use these notebooks but can't access their private datasets, please add my dataset to your notebook and then change the file paths.
It has 5 files:
* submissio_byfone_chris.csv: Submission result from: https://www.kaggle.com/lichtlab/0-0226-byfone-chris-combination-approach
* submission_exponential_decay.csv: Submission result from: https://www.kaggle.com/tarique7/hnm-exponential-decay-with-alternate-items/notebook
* submission_trending.csv: Submission result from: https://www.kaggle.com/lunapandachan/h-m-trending-products-weekly-add-test/notebook
* submission_sequential_model.csv: Submission result from: https://www.kaggle.com/code/astrung/sequential-model-fixed-missing-last-item/notebook
* submission_sequential_with_item_feature.csv: Submission result from: https://www.kaggle.com/code/astrung/lstm-model-with-item-infor-fix-missing-last-item/notebook
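A minimal loading sketch, assuming the dataset is attached under a mount path like /kaggle/input/hm-notebook-results; the actual slug depends on this dataset's name, so check your notebook's input panel.
import glob
import pandas as pd
# Hypothetical mount path; adjust to the real dataset slug.
BASE = "/kaggle/input/hm-notebook-results"
# Load every submission file into a dict keyed by file name.
submissions = {path.split("/")[-1]: pd.read_csv(path) for path in glob.glob(f"{BASE}/*.csv")}
for name, sub in submissions.items():
    print(name, sub.shape)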
This dataset is generated for the purpose of analyzing furniture sales data using multiple regression techniques. It contains 2,500 rows and 15 columns, including 7 numerical columns and 7 categorical columns, along with a target variable (revenue) which represents the total revenue generated from furniture sales. The dataset captures various aspects of furniture sales, such as pricing, cost, sales volume, discount percentage, inventory levels, delivery time, and different categorical attributes like furniture type, material, color, and store location.
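A minimal multiple-regression sketch under stated assumptions: the CSV file name and mount path are hypothetical, and only the target column name (revenue) is taken from the description above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical file name/path; adjust to the actual CSV in this dataset.
df = pd.read_csv("/kaggle/input/furniture-sales/furniture_sales.csv")
X = df.drop(columns=["revenue"])  # "revenue" is the stated target variable
y = df["revenue"]

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Scale numeric features, one-hot encode categorical ones, then fit a linear model.
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])),
    ("reg", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))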
Please upload your notebooks for this dataset so that others can also learn from your work.
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/media-info-aic25-b1.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
https://creativecommons.org/publicdomain/zero/1.0/
Mirror of external ZIP: https://aic-data.ledo.io.vn/clip-features-32-aic25-b1.zip
Uploaded via Kaggle Notebook (one-file-per-dataset).
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
✅ Step 1: Mount the dataset
Search for my dataset pytorch-models and add it — this will mount it at:
/kaggle/input/pytorch-models/
✅ Step 2: Check file paths
Once mounted, the four files will be available at:
/kaggle/input/pytorch-models/base_models.py
/kaggle/input/pytorch-models/ext_base_models.py
/kaggle/input/pytorch-models/ext_hybrid_models.py
/kaggle/input/pytorch-models/hybrid_models.py
✅ Step 3: Copy files to working directory
To make them importable, copy the .py files to your notebook's working directory (/kaggle/working/):
import shutil
shutil.copy('/kaggle/input/pytorch-models/base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_base_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/ext_hybrid_models.py', '/kaggle/working/')
shutil.copy('/kaggle/input/pytorch-models/hybrid_models.py', '/kaggle/working/')
✅ Step 4: Import your modules
Now that they are in the working directory, you can import them like normal:
import base_models
import ext_base_models
import ext_hybrid_models
import hybrid_models
Or, if you only want to import specific classes or functions:
from base_models import YourModelClass
from ext_base_models import AnotherModelClass
✅ Step 5: Use the models
You can now initialize and use the models/classes/functions defined inside each file:
model = base_models.YourModelClass()
output = model(input_data)
https://creativecommons.org/publicdomain/zero/1.0/
Amazon Scraping Dataset:
1. Import libraries
2. Connect to the website
3. Import csv and datetime
4. Import pandas
5. Append the dataset to a CSV file
6. Automate dataset updates
7. Set up timers
8. Email notification
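A minimal sketch of steps 1-7, assuming requests and BeautifulSoup; the product URL and HTML element ids are placeholders, and the pandas read-back and email-notification steps are omitted.
import csv
import datetime
import time
import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.com/dp/EXAMPLE"   # placeholder product URL
HEADERS = {"User-Agent": "Mozilla/5.0"}     # Amazon usually requires a realistic User-Agent

def check_price():
    # Connect to the website and parse the page.
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, "html.parser")
    # Placeholder element ids; inspect the page to find the real ones.
    title = soup.find(id="productTitle").get_text(strip=True)
    price = soup.find(id="priceblock_ourprice").get_text(strip=True)
    # Append one row per run to the CSV dataset.
    with open("amazon_dataset.csv", "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([title, price, datetime.date.today()])

# Timer: re-run the scrape once a day to keep the dataset updated.
while True:
    check_price()
    time.sleep(86400)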
MIntRec stands for multimodal intent recognition. The benchmark dataset was first introduced at ACM MM 2022: https://dl.acm.org/doi/10.1145/3503161.3547906. The uploaded version is the latest, second version, which was introduced at ICLR 2024: https://openreview.net/forum?id=nY9nITZQjc.
More details can be found here: https://github.com/thuiar/MIntRec2.0. Among the 3 versions, I uploaded the feature data and the raw data (this upload is the raw data) because they seem to be the best way to practice in the Kaggle Notebook environment.
The purpose of uploading this dataset is to practice multi-class classification with LLMs, i.e. to test whether large models can recognize and classify human intent properly.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Source repo is google/flan-t5-large.
from transformers import AutoTokenizer, AutoModel
model = AutoModel.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
tokenizer = AutoTokenizer.from_pretrained('/kaggle/input/googleflan-t5-large/flan-t5-large')
This dataset is inspired by the work of @akensert in retrieving the tiles from each image. I decided to upload my version of the dataset in order to also train in a Kaggle notebook. I cropped "level 1" of the original images, and each image comes with dimensions (n_crops*256, 256, 3). Therefore, it is possible to retrieve the individual tiles by reshaping it to (-1, 256, 256, 3).
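A minimal reshape sketch, assuming one of the stored images loads as a (n_crops*256, 256, 3) array; the file path is hypothetical.
import numpy as np
from PIL import Image

# Hypothetical path to one of the stacked-crop images in this dataset.
stacked = np.asarray(Image.open("/kaggle/input/panda-tiles/example_image.png"))
print(stacked.shape)              # (n_crops * 256, 256, 3)

# Recover the individual 256x256 tiles as a batch.
tiles = stacked.reshape(-1, 256, 256, 3)
print(tiles.shape)                # (n_crops, 256, 256, 3)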
https://creativecommons.org/publicdomain/zero/1.0/
By SocialGrep [source]
This dataset is a collection of posts and comments made on Reddit's /r/datasets board, covering everything from the subreddit's inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.
In order to use this dataset, you will need to have a text editor such as Microsoft Word or LibreOffice installed on your computer. You will also need a web browser such as Google Chrome or Mozilla Firefox.
Once you have the necessary software installed, open the The Reddit Dataset folder and double-click on the the-reddit-dataset-dataset-posts.csv file to open it in your preferred text editor.
In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.
You can use this information to analyze trends in the datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes.
- Finding correlations between different types of datasets
- Determining which datasets are most popular on Reddit
- Analyzing the sentiments of post and comments on Reddit's /r/datasets board
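For example, a minimal pandas sketch along these lines; the mount path is an assumption, while the file and column names follow the file descriptions below.
import pandas as pd

# Hypothetical mount path; adjust to the actual dataset slug.
BASE = "/kaggle/input/the-reddit-dataset-dataset"
posts = pd.read_csv(f"{BASE}/the-reddit-dataset-dataset-posts.csv")
comments = pd.read_csv(f"{BASE}/the-reddit-dataset-dataset-comments.csv")

# Average post score and the most commonly linked domains.
print("Average post score:", posts["score"].mean())
print(posts["domain"].value_counts().head(10))

# Distribution of comment sentiment.
print(comments["sentiment"].value_counts())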
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: the-reddit-dataset-dataset-comments.csv

| Column name | Description |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc    | The time the post was created, in UTC. (Timestamp) |
| permalink      | The permalink for the post. (String) |
| body           | The body of the post. (String) |
| sentiment      | The sentiment of the post. (String) |
| score          | The score of the post. (Integer) |
File: the-reddit-dataset-dataset-posts.csv

| Column name | Description |
|:---------------|:----------------------------------------------------|
| type           | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc    | The time the post was created, in UTC. (Timestamp) |
| permalink      | The permalink for the post. (String) |
| score          | The score of the post. (Integer) |
| domain         | The domain of the post. (String) |
| url            | The URL of the post. (String) |
| selftext       | The self-text of the post. (String) |
| title          | The title of the post. (String) |
If you use this dataset in your research, please credit the original authors and SocialGrep.
This dataset was created by Alexis Cook