http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Miracle Smith
Released under Database: Open Database, Contents: Database Contents
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Krishna_raj@84
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This comprehensive collection serves as a valuable resource for data enthusiasts, researchers, and analysts seeking to explore a wide range of topics and uncover unique insights.
Context and Sources
Context: Our dataset is curated from user contributions on the Kaggle platform.
Sources: https://www.kaggle.com/datasets?topic=musicDataset
| Column Name | Definition |
|---|---|
| Dataset Title | The title of the dataset |
| URL | The web address |
| Author | The individual or organization responsible for uploading the dataset. |
| Last Updated | The date when the dataset was last modified or updated. |
| Usability Score | An indicator of the dataset's quality, usefulness, and ease of use, as rated by Kaggle. |
| File Size | The size of the dataset file, helping users estimate the storage requirements. |
| Upvote Count | The number of upvotes received by the dataset, reflecting its popularity and relevance among users. |
| Medal Type | The dataset's medal type under Kaggle's progression system. |
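As a minimal sketch of how a file with these columns might be read, the snippet below parses a small in-memory CSV and ranks rows by upvotes. The sample rows, their value formats (for example a plain number for Usability Score), and the helper name `load_rows` are assumptions made for illustration; the real file may format these fields differently.

```python
import csv
import io

# Hypothetical sample rows; the real file's values and formats may differ.
SAMPLE_CSV = """Dataset Title,URL,Author,Last Updated,Usability Score,File Size,Upvote Count,Medal Type
Example Songs,https://www.kaggle.com/datasets/example/songs,Example Author,2023-01-15,9.4,12 MB,120,Gold
Example Lyrics,https://www.kaggle.com/datasets/example/lyrics,Another Author,2022-11-02,7.1,3 MB,8,Bronze
"""


def load_rows(text):
    """Parse CSV text and convert the numeric columns (assumed to be plain numbers)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Usability Score"] = float(row["Usability Score"])
        row["Upvote Count"] = int(row["Upvote Count"])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)

# Rank datasets by upvotes, most upvoted first.
ranked = sorted(rows, key=lambda r: r["Upvote Count"], reverse=True)
for r in ranked:
    print(r["Dataset Title"], r["Upvote Count"], r["Medal Type"])
```

Keeping the original column names as dictionary keys avoids a renaming step, at the cost of keys that contain spaces.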
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Context
Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or which datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.
Column description
- Dataset_name: Name of the dataset
- Author_name: Name of the author
- Author_id: Kaggle id of the author
- No_of_files: Number of files the author has uploaded
- size: Size of all the files
- Type_of_file: Type of the files, such as csv, json, etc.
- Upvotes: Total upvotes of the dataset
- Medals: Medal of the dataset
- Usability: Usability of the dataset
- Date: Date on which the dataset was uploaded
- Day: Day on which the dataset was uploaded
- Time: Time at which the dataset was uploaded
- Dataset_link: Kaggle link of the dataset
Acknowledgements
The data has been scraped from the official Kaggle website and is available under a Creative Commons license.
Enjoy & Keep Learning !!!
This dataset was created by chilli_sawze
This dataset was created by Haidy Ashraf21
https://creativecommons.org/publicdomain/zero/1.0/
I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. Also, I tend to write queries faster when using SQL Server Management Studio, like 100x faster. So, I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, then uploaded it here.
Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks) and more. Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle's community and activity.
This dataset was created by Ahmad Basher
https://creativecommons.org/publicdomain/zero/1.0/
Under Kaggle's dataset medal rules, a dataset earns a bronze medal when it receives 5 or more upvotes from users with a rank of novice or higher, a silver medal with 20 or more, and a gold medal with 50 or more. I recently uploaded a lot of datasets to Kaggle. However, although I have won many bronze medals, I have never won more than a silver medal. So I created this dataset to study the characteristics of datasets that receive a silver medal. It records the metadata of every Kaggle dataset that received at least one upvote, together with each dataset's MedalVoteCount.
This dataset can be used to create strategies for receiving silver and gold medals.
Metadata for 42,955 datasets, from 2015-12 to 2021-11.
Source: https://www.kaggle.com/kaggle/meta-kaggle, with the "MedalVoteCount" value obtained by scraping.
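The medal thresholds stated in this description (5, 20, and 50 upvotes) can be sketched as a small function. This is only an illustration of the rule as the author describes it: the function names `medal_for` and `votes_to_next_medal` are hypothetical, and the sketch assumes the input count already contains only votes that qualify for medals.

```python
# Thresholds as stated in the dataset description, highest first.
MEDAL_THRESHOLDS = [("gold", 50), ("silver", 20), ("bronze", 5)]


def medal_for(medal_vote_count):
    """Return the medal name for a qualifying vote count, or None if below bronze."""
    for name, threshold in MEDAL_THRESHOLDS:
        if medal_vote_count >= threshold:
            return name
    return None


def votes_to_next_medal(medal_vote_count):
    """Return how many more qualifying votes are needed for the next medal (0 at gold)."""
    # Walk the thresholds from lowest to highest and stop at the first one not yet reached.
    for _, threshold in reversed(MEDAL_THRESHOLDS):
        if medal_vote_count < threshold:
            return threshold - medal_vote_count
    return 0


for count in (3, 5, 19, 20, 50):
    print(count, medal_for(count), votes_to_next_medal(count))
```

Applied to the MedalVoteCount column, a function like this would let each dataset be labeled with its medal and its distance from the next one.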
Companies and individuals are storing increasingly more data digitally; however, much of the data is unused because it is unclassified. How many times have you opened your downloads folder, found a file you downloaded a year ago, and had no idea what it contains? You can read through those files individually, but imagine doing that for thousands of files. All that raw data in storage facilities creates data lakes. As the amount of data grows and the complexity rises, data lakes become data swamps. The potentially valuable and interesting datasets will likely remain unused. Our tool addresses the need to classify these large pools of data in a visually effective and succinct manner by identifying keywords in datasets and classifying datasets into a consistent taxonomy.
The files listed within kaggleDatasetSummaryTopicsClassification.csv have been processed with our tool to generate the keywords and taxonomic classification as seen below. The summaries are not generated from our system. Instead they were retrieved from user input as they uploaded the files on Kaggle. We planned to utilize these summaries to create an NLG model to generate summaries from any input file. Unfortunately we were not able to collect enough data to build a good model. Hopefully the data within this set might help future users achieve that goal.
Developed with Senior Design Center at NC State in collaboration with SAS.
Senior Design Team: Tanya Chu, Katherine Marsh, Nikhil Milind, Anna Owens
SAS Representatives: Nancy Rausch, Marty Warner, Brant Kay, Tyler Wendell, JP Trawinski
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides comprehensive metadata on various Kaggle datasets, offering detailed information about the dataset owners, creators, usage statistics, licensing, and more. It can help researchers, data scientists, and Kaggle enthusiasts quickly analyze the key attributes of different datasets on Kaggle.
datasetUrl: The URL of the Kaggle dataset page. This directs you to the specific dataset's page on Kaggle.
ownerAvatarUrl: The URL of the dataset owner's profile avatar on Kaggle.
ownerName: The name of the dataset owner. This can be the individual or organization that created and maintains the dataset.
ownerUrl: A link to the Kaggle profile page of the dataset owner.
ownerUserId: The unique user ID of the dataset owner on Kaggle.
ownerTier: The ownership tier, such as "Tier 1" or "Tier 2," indicating the owner's status or level on Kaggle.
creatorName: The name of the dataset creator, which could be different from the owner.
creatorUrl: A link to the Kaggle profile page of the dataset creator.
creatorUserId: The unique user ID of the dataset creator.
scriptCount: The number of scripts (kernels) associated with this dataset.
scriptsUrl: A link to the scripts (kernels) page for the dataset, where you can explore related code.
forumUrl: The URL to the discussion forum for this dataset, where users can ask questions and share insights.
viewCount: The number of views the dataset page has received on Kaggle.
downloadCount: The number of times the dataset has been downloaded by users.
dateCreated: The date when the dataset was first created and uploaded to Kaggle.
dateUpdated: The date when the dataset was last updated or modified.
voteButton: The metadata for the dataset's vote button, showing how users interact with the dataset's quality ratings.
categories: The categories or tags associated with the dataset, helping users filter datasets based on topics of interest (e.g., "Healthcare," "Finance").
licenseName: The name of the license under which the dataset is shared (e.g., "CC0," "MIT License").
licenseShortName: A short form or abbreviation of the dataset's license name (e.g., "CC0" for Creative Commons Zero).
datasetSize: The size of the dataset in terms of storage, typically measured in MB or GB.
commonFileTypes: A list of common file types included in the dataset (e.g., .csv, .json, .xlsx).
downloadUrl: A direct link to download the dataset files.
newKernelNotebookUrl: A link to a new kernel or notebook related to this dataset, for those who wish to explore it programmatically.
newKernelScriptUrl: A link to a new script for running computations or processing data related to the dataset.
usabilityRating: A rating or score representing how usable the dataset is, based on user feedback.
firestorePath: A reference to the path in Firestore where this dataset's metadata is stored.
datasetSlug: A URL-friendly version of the dataset name, typically used for URLs.
rank: The dataset's rank based on certain metrics (e.g., downloads, votes, views).
datasource: The source or origin of the dataset (e.g., government data, private organizations).
medalUrl: A URL pointing to the dataset's medal or badge, indicating the dataset's quality or relevance.
hasHashLink: Indicates whether the dataset has a hash link for verifying data integrity.
ownerOrganizationId: The unique organization ID of the dataset's owner if the owner is an organization rather than an individual.
totalVotes: The total number of votes the dataset has received from users, reflecting its popularity or quality.
category_names: A comma-separated string of category names that represent the dataset's classification.
This dataset is a valuable resource for those who want to analyze Kaggle's ecosystem, discover high-quality datasets, and explore metadata in a structured way.
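Because category_names is described as a comma-separated string, a common first step is to split it into a list before counting or filtering. The sketch below shows one way to do that. The sample records, their values, and the helper name `split_categories` are hypothetical, and only three of the fields listed above are used.

```python
from collections import Counter

# Hypothetical records using three of the fields described above.
records = [
    {"datasetSlug": "example-health-data", "totalVotes": 42, "category_names": "Healthcare, Finance"},
    {"datasetSlug": "example-bank-data", "totalVotes": 7, "category_names": "Finance"},
    {"datasetSlug": "example-untagged", "totalVotes": 0, "category_names": ""},
]


def split_categories(category_names):
    """Split a comma-separated category string into a clean list of names."""
    return [name.strip() for name in category_names.split(",") if name.strip()]


# Count how many datasets carry each category.
category_counts = Counter()
for record in records:
    category_counts.update(split_categories(record["category_names"]))

print(category_counts.most_common())
```

Stripping whitespace and dropping empty pieces keeps an empty category_names value from producing a blank category.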
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Kaggle is one of the largest communities of data scientists and machine learning practitioners in the world, and its platform hosts thousands of datasets covering a wide range of topics and industries. With so many options to choose from, it can be difficult to know where to start or which datasets are worth exploring. That's where this dataset comes in. By scraping information about the top 10,000 datasets on Kaggle, we have created a single source of truth for the most popular and useful datasets on the platform. This dataset is not just a list of names and numbers, but a valuable tool for data enthusiasts and professionals alike, providing insights into the latest trends and techniques in data science and machine learning.
The data has been scraped from the official Kaggle website and is available under a Creative Commons license.
Keep Learning !!!
This dataset was created by vaibhav panvalkar
This dataset was created by kyle-cloud
This dataset was created by jinmuyan7
https://creativecommons.org/publicdomain/zero/1.0/
By SocialGrep
A subreddit dataset is a collection of posts and comments made on Reddit's /r/datasets board. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.
To use this dataset, you will need a program that can open CSV files, such as a spreadsheet application (for example, LibreOffice Calc or Microsoft Excel) or a plain-text editor. You will also need a web browser such as Google Chrome or Mozilla Firefox.
Once you have the necessary software installed, open The Reddit Dataset folder and double-click the the-reddit-dataset-dataset-posts.csv file to open it in your preferred program.
In the document, you will see a list of posts with the following information for each one: title, score, URL, domain, self-text, creation time (UTC), permalink, subreddit NSFW status, and subreddit name.
You can use this information to analyze trends in datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes.
- Finding correlations between different types of datasets
- Determining which datasets are most popular on Reddit
- Analyzing the sentiment of posts and comments on Reddit's /r/datasets board
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: the-reddit-dataset-dataset-comments.csv
| Column name | Description |
|:---|:---|
| type | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc | The time the post was created, in UTC. (Timestamp) |
| permalink | The permalink for the post. (String) |
| body | The body of the post. (String) |
| sentiment | The sentiment of the post. (String) |
| score | The score of the post. (Integer) |
File: the-reddit-dataset-dataset-posts.csv
| Column name | Description |
|:---|:---|
| type | The type of post. (String) |
| subreddit.name | The name of the subreddit. (String) |
| subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
| created_utc | The time the post was created, in UTC. (Timestamp) |
| permalink | The permalink for the post. (String) |
| score | The score of the post. (Integer) |
| domain | The domain of the post. (String) |
| url | The URL of the post. (String) |
| selftext | The self-text of the post. (String) |
| title | The title of the post. (String) |
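The column layout of the posts file lends itself to simple aggregate checks, such as the average post score mentioned earlier. The following is a rough sketch using only the standard library. The three sample rows are invented for illustration (including the created_utc values, whose real format is not specified here), so only the column names come from the description.

```python
import csv
import io
from collections import Counter

# Invented sample rows that follow the posts file's column names.
SAMPLE_POSTS = """type,subreddit.name,subreddit.nsfw,created_utc,permalink,score,domain,url,selftext,title
post,datasets,false,1646000000,https://example.com/p1,10,example.com,https://example.com/a,,First example title
post,datasets,false,1646000100,https://example.com/p2,4,example.org,https://example.org/b,,Second example title
post,datasets,false,1646000200,https://example.com/p3,1,example.com,https://example.com/c,,Third example title
"""

posts = list(csv.DictReader(io.StringIO(SAMPLE_POSTS)))

# Average score across all posts (score is documented as an integer).
scores = [int(post["score"]) for post in posts]
average_score = sum(scores) / len(scores)

# Number of posts per linked domain.
domain_counts = Counter(post["domain"] for post in posts)

print(average_score)
print(domain_counts.most_common())
```

To run this against the real file, replace the in-memory sample with `open("the-reddit-dataset-dataset-posts.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.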
If you use this dataset in your research, please credit SocialGrep.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Robin
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
The idea to scrape this data came to me while I was working on an e-commerce project, Fashion Product Recommendation (an end-to-end project). In this project, you upload any fashion image and it shows the 10 closest recommendations.
I completed my project on this image dataset, but I ran into a problem when deploying it to Heroku: because of the large project size, I was unable to deploy, as Heroku offers limited memory for free accounts.
At the moment I am only familiar with Heroku, and I am learning AWS for bigger projects. So I decided to scrape my own image dataset with much more information, which could help me take this project to the next level. I scraped the data from flipkart.com (an e-commerce website) in two formats: images and textual data in tabular format.
This dataset contains 65k images (400x450 pixels) of fashion/style products such as clothing, footwear, accessories, and many more. A CSV file maps each image name to the id column of the tabular data. Image names use a unique numerical format, such as 1.png or 62299.png, and the image name matches the id column. So, to find the details of an image, look up its name in the id column of the CSV file; that row contains the details of the image. The notebook I used to scrape this data is available in the code section.
Columns of CSV Dataset:
1. id: Unique id, same as the image name
2. brand: Brand name of the product
3. title: Title of the product
4. sold_price: Selling price of the product
5. actual_price: Actual price of the product
6. url: Unique URL of every product
7. img: Image URL
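The lookup from an image file name to its CSV row, as described for this dataset, can be sketched as follows. The sample rows, the price formats, the example.com URLs, and the helper names `index_by_id` and `details_for_image` are all assumptions for illustration; only the column names and the `<id>.png` naming scheme come from the description.

```python
import csv
import io

# Invented sample rows that follow the listed column names.
SAMPLE_CSV = """id,brand,title,sold_price,actual_price,url,img
1,ExampleBrand,Example T-Shirt,499,999,https://example.com/product/1,https://example.com/img/1.jpg
62299,OtherBrand,Example Sneakers,1299,1999,https://example.com/product/62299,https://example.com/img/62299.jpg
"""


def index_by_id(csv_text):
    """Build a dictionary that maps each id (as a string) to its CSV row."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def details_for_image(image_name, index):
    """Return the row for an image such as '62299.png', or None if the id is unknown."""
    image_id = image_name.rsplit(".", 1)[0]  # drop the file extension
    return index.get(image_id)


index = index_by_id(SAMPLE_CSV)
print(details_for_image("62299.png", index))
print(details_for_image("7.png", index))
```

Building the dictionary once makes each lookup constant-time, which matters more with 65k rows than with this two-row sample.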
How this dataset helped me:
1. I trained my CNN model using the image data; that is the only use of the image dataset.
2. On the front-end results page of the project, I used the image URL and displayed the image after fetching it from the web. This meant I did not have to upload the image dataset with the project to the server, which saved a lot of memory.
3. Using the url, I display the live price and ratings from the Flipkart website.
4. A Buy button is mapped to the url, so you are redirected to the original product page and can buy the product there.
After using this dataset, I changed my project name from Fashion Product Recommender to Flipkart Fashion Product Recommender.
Still, the memory problem was not resolved, as the trained model file was above 500 MB on the complete dataset. So I tried multiple subsets and finally deployed after training on only 1000 images. In the future, I will try to deploy the complete project on another platform. I learned many new things while working on this dataset.
A smaller version of this dataset (less than 500 MB) is also available; everything is the same as in this dataset, except that the image resolution is reduced from 400x450 pixels to 65x80 pixels.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The uploaded dataset contains detailed information about employees, training programs, and other HR-related metrics. Here's an overview:
General Details:
Rows: 3,150
Columns: 39
Column Names:
Unnamed: 0
FirstName
LastName
StartDate
ExitDate
Title
Supervisor
ADEmail
BusinessUnit
EmployeeStatus
EmployeeType
PayZone
EmployeeClassificationType
TerminationType
TerminationDescription
DepartmentType
Division
DOB
State
JobFunctionDescription
GenderCode
LocationCode
RaceDesc
MaritalDesc
Performance Score
Current Employee Rating
Employee ID
Survey Date
Engagement Score
Satisfaction Score
Work-Life Balance Score
Training Date
Training Program Name
Training Type
Training Outcome
Location
Trainer
Training Duration (Days)
Training Cost
Summary:
Employee Data: Contains details such as names, start and exit dates, job titles, and supervisors.
Performance and Survey Metrics: Includes engagement, satisfaction, and work-life balance scores.
Training Information: Covers program names, training types, outcomes, durations, costs, and trainer details.
Diversity Details: Includes gender, race, and marital status.
Status & Classification: Indicates employee status (active/terminated), type, and termination reasons.
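As one illustration of how the training columns listed above might be used, the sketch below averages Training Cost per Training Type. The sample rows and the category values "Internal" and "External" are hypothetical; only the column names are taken from the list, and the real file's value formats may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using four of the listed column names.
SAMPLE_HR = """Employee ID,Training Type,Training Duration (Days),Training Cost
1001,Internal,2,300.00
1002,External,4,800.00
1003,Internal,1,100.00
"""

# Collect the training costs for each training type.
costs_by_type = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE_HR)):
    costs_by_type[row["Training Type"]].append(float(row["Training Cost"]))

# Average cost per training type.
average_cost = {
    training_type: sum(costs) / len(costs)
    for training_type, costs in costs_by_type.items()
}

print(average_cost)
```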
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about the datasets trending on Kaggle, such as dataset author, dataset title, file size, number of files, upload date, upvotes, medals, and usability score.