100+ datasets found

Meta Kaggle
kaggle.com
zip
Updated Mar 7, 2026
Cite
Kaggle (2026). Meta Kaggle [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle
Available download formats: zip (10349076623 bytes)
Dataset updated: Mar 7, 2026
Dataset authored and provided by: Kaggle (http://kaggle.com/)
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Meta Kaggle

Explore our public data on competitions, datasets, kernels (code / notebooks) and more

Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.

Image: Kaggle Leaderboard Performance (https://imgur.com/2Egeb8R.png)

This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.

Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.

August 2023 update

In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here: https://www.kaggle.com/datasets/kaggle/meta-kaggle-code

We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.
Predictive Maintenance Dataset
kaggle.com
zip
Updated Nov 7, 2022
Cite
Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
Available download formats: zip (1798425 bytes)
Dataset updated: Nov 7, 2022
Authors: Himanshu Agarwal
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

The task is to build a machine learning model that predicts the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
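
As a rough illustration of what minimizing false positives and false negatives means for the binary failure column, the sketch below counts both error types and derives precision and recall from them. The function names and the sample labels are hypothetical and are not part of the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


def precision_recall(counts):
    """Precision drops with false positives; recall drops with false negatives."""
    predicted_pos = counts["tp"] + counts["fp"]
    actual_pos = counts["tp"] + counts["fn"]
    precision = counts["tp"] / predicted_pos if predicted_pos else 0.0
    recall = counts["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up labels for illustration only (1 = failure, 0 = non-failure).
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
```

Tracking precision and recall separately makes it visible which of the two error types a model is trading away.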
Online Courses
kaggle.com
zip
Updated Jun 28, 2023
Cite
Khaled Shata (2023). Online Courses [Dataset]. https://www.kaggle.com/datasets/khaledatef1/online-courses
Available download formats: zip (1314629 bytes)
Dataset updated: Jun 28, 2023
Authors: Khaled Shata
Description
The dataset contains information on around 10,000 online courses from popular online learning platforms such as Coursera, Udacity, Simplilearn, and FutureLearn. The data was scraped and compiled, with updates through 2023. It provides valuable information for analyzing and understanding the online learning landscape as of that year.

The dataset is typically available in a structured format, such as a CSV (Comma-Separated Values) file or a spreadsheet, with each row representing a course and each column representing a specific attribute or feature of the course.

Potential Applications:

1- Course Recommendations: Analyzing the dataset can provide insights for recommending courses to individuals based on their interests, skill level, and career goals.

2- Market Analysis: Researchers or analysts can use the dataset to study the market share and popularity of different online learning platforms and subject areas.

3- Skill Demand Analysis: The dataset can help identify the most in-demand skills and subject areas among online learners.

4- Educational Research: Researchers can leverage the dataset to investigate trends and patterns in online learning, instructional design, and course delivery.
NYC Open Data
kaggle.com
zip
Updated Mar 20, 2019
Cite
NYC Open Data (2019). NYC Open Data [Dataset]. https://www.kaggle.com/datasets/nycopendata/new-york
Available download formats: zip (0 bytes)
Dataset updated: Mar 20, 2019
Dataset authored and provided by: NYC Open Data
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

NYC Open Data is an opportunity to engage New Yorkers in the information that is produced and used by City government. We believe that every New Yorker can benefit from Open Data, and Open Data can benefit from every New Yorker. Source: https://opendata.cityofnewyork.us/overview/

Content

Thanks to NYC Open Data, which makes public data generated by city agencies available for public use, and Citi Bike, we've incorporated over 150 GB of data in 5 open datasets into Google BigQuery Public Datasets, including:

Over 8 million 311 service requests from 2012-2016

More than 1 million motor vehicle collisions 2012-present

Citi Bike stations and 30 million Citi Bike trips 2013-present

Over 1 billion Yellow and Green Taxi rides from 2009-present

Over 500,000 sidewalk trees surveyed decennially in 1995, 2005, and 2015

This dataset is deprecated and not being updated.

Fork this kernel to get started with this dataset.

Acknowledgements

https://opendata.cityofnewyork.us/

https://cloud.google.com/blog/big-data/2017/01/new-york-city-public-datasets-now-available-on-google-bigquery

This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - https://data.cityofnewyork.us/ - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

By accessing datasets and feeds available through NYC Open Data, the user agrees to all of the Terms of Use of NYC.gov as well as the Privacy Policy for NYC.gov. The user also agrees to any additional terms of use defined by the agencies, bureaus, and offices providing data. Public data sets made available on NYC Open Data are provided for informational purposes. The City does not warranty the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set made available on NYC Open Data, nor are any such warranties to be implied or inferred with respect to the public data sets furnished therein.

The City is not liable for any deficiencies in the completeness, accuracy, content, or fitness for any particular purpose or use of any public data set, or application utilizing such data set, provided by any third party.

Banner Photo by @bicadmedia from Unsplash.

Inspiration

On which New York City streets are you most likely to find a loud party?

Can you find the Virginia Pines in New York City?

Where was the only collision caused by an animal that injured a cyclist?

What’s the Citi Bike record for the Longest Distance in the Shortest Time (on a route with at least 100 rides)?

Image: https://cloud.google.com/blog/big-data/2017/01/images/148467900588042/nyc-dataset-6.png
COVID-19 Dataset
kaggle.com
zip
Updated Nov 13, 2022
Cite
Meir Nizri (2022). COVID-19 Dataset [Dataset]. https://www.kaggle.com/datasets/meirnizri/covid19-dataset
Available download formats: zip (4890659 bytes)
Dataset updated: Nov 13, 2022
Authors: Meir Nizri
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer, are more likely to develop serious illness. Throughout the pandemic, one of the main problems healthcare providers have faced is the shortage of medical resources and the lack of a proper plan to distribute them efficiently. In these tough times, being able to predict what kind of resource an individual might require at the time of testing positive, or even before that, will be of immense help to the authorities, as they would be able to procure and arrange the resources necessary to save the life of that patient.

The main goal of this project is to build a machine learning model that, given a COVID-19 patient's current symptoms, status, and medical history, will predict whether the patient is at high risk or not.

Content

The dataset was provided by the Mexican government. It contains a large amount of anonymized patient-related information, including pre-existing conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 indicate missing data.

sex: 1 for female and 2 for male.

age: of the patient.

classification: covid test findings. Values 1-3 mean that the patient was diagnosed with covid in different degrees. 4 or higher means that the patient is not a carrier of covid or that the test is inconclusive.

patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.

pneumonia: whether the patient already has air sac inflammation or not.

pregnancy: whether the patient is pregnant or not.

diabetes: whether the patient has diabetes or not.

copd: Indicates whether the patient has Chronic obstructive pulmonary disease or not.

asthma: whether the patient has asthma or not.

inmsupr: whether the patient is immunosuppressed or not.

hypertension: whether the patient has hypertension or not.

cardiovascular: whether the patient has heart or blood vessels related disease.

renal chronic: whether the patient has chronic renal disease or not.

other disease: whether the patient has other disease or not.

obesity: whether the patient is obese or not.

tobacco: whether the patient is a tobacco user.

usmr: Indicates whether the patient was treated in medical units of the first, second or third level.

medical unit: type of institution of the National Health System that provided the care.

intubed: whether the patient was connected to the ventilator.

icu: Indicates whether the patient had been admitted to an Intensive Care Unit.

date died: the date of death if the patient died, and 9999-99-99 otherwise.
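
The encoding rules above (1 for "yes", 2 for "no", 97 and 99 for missing data, classification values 1-3 for a covid diagnosis, and 9999-99-99 for patients who did not die) can be sketched as small decoding helpers. This is a minimal sketch: the dictionary keys and the sample record are hypothetical, and treating any code other than 1 or 2 as missing is an assumption beyond the documented 97 and 99.

```python
DOCUMENTED_MISSING = {97, 99}  # codes the description lists as missing data
NOT_DECEASED = "9999-99-99"    # placeholder date for patients who did not die


def decode_boolean(value):
    """Map 1 -> True, 2 -> False, anything else -> None (missing)."""
    if value == 1:
        return True
    if value == 2:
        return False
    # Covers 97 and 99 and, as an assumption, any other unexpected code.
    return None


def has_covid(classification):
    """Values 1-3 mean the patient was diagnosed with covid."""
    return 1 <= classification <= 3


def died(date_died):
    """A real date means the patient died; the placeholder means they did not."""
    return date_died != NOT_DECEASED


# Hypothetical record; key names are illustrative, not confirmed column names.
record = {"diabetes": 2, "icu": 97, "classification": 3, "date_died": "9999-99-99"}

decoded = {
    "diabetes": decode_boolean(record["diabetes"]),
    "icu": decode_boolean(record["icu"]),
    "has_covid": has_covid(record["classification"]),
    "died": died(record["date_died"]),
}
```

Keeping missing values as None rather than False avoids silently counting "unknown" as "no" when computing feature frequencies.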
🚨 Fake Reviews Dataset
kaggle.com
zip
Updated Sep 17, 2023
Cite
mexwell (2023). 🚨 Fake Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/fake-reviews-dataset
Available download formats: zip (5016888 bytes)
Dataset updated: Sep 17, 2023
Authors: mexwell
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The generated fake reviews dataset contains 20k fake reviews and 20k real product reviews. OR = original reviews (presumably human-created and authentic); CG = computer-generated fake reviews.

Citation

Salminen, J., Kandpal, C., Kamel, A. M., Jung, S., & Jansen, B. J. (2022). Creating and detecting fake reviews of online products. Journal of Retailing and Consumer Services, 64, 102771. https://doi.org/10.1016/j.jretconser.2021.102771

Acknowledgement

Photo by Brett Jordan on Unsplash
Detailed Products Datasets
kaggle.com
zip
Updated Nov 24, 2023
Cite
Sujay Kapadnis (2023). Detailed Products Datasets [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/products-datasets
Available download formats: zip (102115 bytes)
Dataset updated: Nov 24, 2023
Authors: Sujay Kapadnis
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
List of products with the attributes

S.No

BrandName

Product ID

Product Name

Brand Desc

Product Size

Currency

MRP

SellPrice

Discount

Category

Kari, Venkatram (2023), “Product Dataset”, Mendeley Data, V1, doi: 10.17632/v8yt3r8th2.1
MSRVTT
kaggle.com
opendatalab.com
zip
Updated Nov 7, 2022
Cite
Vishnutheep B (2022). MSRVTT [Dataset]. https://www.kaggle.com/datasets/vishnutheepb/msrvtt
Available download formats: zip (4574604594 bytes)
Dataset updated: Nov 7, 2022
Authors: Vishnutheep B
Description
MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset for open-domain video captioning. It consists of 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences by Amazon Mechanical Turk workers. There are about 29,000 unique words across all captions. The standard split uses 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing.
Mental Health Dataset
kaggle.com
zip
Updated Mar 18, 2024
Cite
Bhavik Jikadara (2024). Mental Health Dataset [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/mental-health-dataset
Available download formats: zip (2048887 bytes)
Dataset updated: Mar 18, 2024
Authors: Bhavik Jikadara
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset appears to contain a variety of features related to text analysis, sentiment analysis, and psychological indicators, likely derived from posts or text data. Some features include readability indices such as Automated Readability Index (ARI), Coleman Liau Index, and Flesch-Kincaid Grade Level, as well as sentiment analysis scores like sentiment compound, negative, neutral, and positive scores. Additionally, there are features related to psychological aspects such as economic stress, isolation, substance use, and domestic stress. The dataset seems to cover a wide range of linguistic, psychological, and behavioural attributes, potentially suitable for analyzing mental health-related topics in online communities or text data.

Benefits of using this dataset:

Insight into Mental Health: The dataset provides valuable insights into mental health by analyzing linguistic patterns, sentiment, and psychological indicators in text data. Researchers and data scientists can gain a better understanding of how mental health issues manifest in online communication.

Predictive Modeling: With a wide range of features, including sentiment analysis scores and psychological indicators, the dataset offers opportunities for developing predictive models to identify or predict mental health outcomes based on textual data. This can be useful for early intervention and support.

Community Engagement: Mental health is a topic of increasing importance, and this dataset can foster community engagement on platforms like Kaggle. Data enthusiasts, researchers, and mental health professionals can collaborate to analyze the data and develop solutions to address mental health challenges.

Data-driven Insights: By analyzing the dataset, users can uncover correlations and patterns between linguistic features, sentiment, and mental health indicators. These insights can inform interventions, policies, and support systems aimed at promoting mental well-being.

Educational Resource: The dataset can serve as a valuable educational resource for teaching and learning about mental health analytics, sentiment analysis, and text mining techniques. It provides a real-world dataset for students and practitioners to apply data science skills in a meaningful context.
Student Mental health
kaggle.com
zip
Updated Feb 17, 2023
Cite
MD Shariful Islam (2023). Student Mental health [Dataset]. https://www.kaggle.com/datasets/shariful07/student-mental-health
Available download formats: zip (1664 bytes)
Dataset updated: Feb 17, 2023
Authors: MD Shariful Islam
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset for "A Statistical Research on the Effects of Mental Health on Students’ CGPA". This dataset was collected through a Google Forms survey of university students in order to examine their current academic situation and mental health.

All the data was collected in Malaysia, from IIUM (International Islamic University Malaysia).

Loan Approval Classification Dataset

kaggle.com

zip

Updated Oct 29, 2024

Cite

Ta-wei Lo (2024). Loan Approval Classification Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data

Available download formats: zip (768769 bytes)

Dataset updated: Oct 29, 2024

Authors: Ta-wei Lo

License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

1. Data Source

This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to simulate new data points to enlarge the instances. The dataset is structured for both categorical and continuous features.

2. Metadata

The dataset contains 45,000 records and 14 variables, each described below:

Column	Description	Type
`person_age`	Age of the person	Float
`person_gender`	Gender of the person	Categorical
`person_education`	Highest education level	Categorical
`person_income`	Annual income	Float
`person_emp_exp`	Years of employment experience	Integer
`person_home_ownership`	Home ownership status (e.g., rent, own, mortgage)	Categorical
`loan_amnt`	Loan amount requested	Float
`loan_intent`	Purpose of the loan	Categorical
`loan_int_rate`	Loan interest rate	Float
`loan_percent_income`	Loan amount as a percentage of annual income	Float
`cb_person_cred_hist_length`	Length of credit history in years	Float
`credit_score`	Credit score of the person	Integer
`previous_loan_defaults_on_file`	Indicator of previous loan defaults	Categorical
`loan_status` (target variable)	Loan approval status: 1 = approved; 0 = rejected	Integer

3. Data Usage

The dataset can be used for multiple purposes:

Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.

Be aware of data issues inherited from the original data, such as instances with an age above 100 years.
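
One way to handle the age issue noted above is to separate implausible records before modeling. The following is a minimal sketch, assuming rows are available as dictionaries keyed by the column names from the metadata table; the helper name `split_by_age` and the sample rows are hypothetical.

```python
MAX_PLAUSIBLE_AGE = 100  # threshold taken from the note about ages above 100 years


def split_by_age(records, max_age=MAX_PLAUSIBLE_AGE):
    """Return (kept, flagged) lists based on the person_age column."""
    kept, flagged = [], []
    for row in records:
        if row["person_age"] > max_age:
            flagged.append(row)  # implausible age, set aside for review
        else:
            kept.append(row)
    return kept, flagged


# Made-up rows for illustration only.
sample = [
    {"person_age": 22.0, "loan_status": 1},
    {"person_age": 144.0, "loan_status": 0},
    {"person_age": 35.0, "loan_status": 0},
]

kept, flagged = split_by_age(sample)
```

Returning the flagged rows rather than discarding them keeps the choice between dropping, capping, or correcting them open.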

This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.

Feel free to leave comments on the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

🖼️ Famous Paintings
kaggle.com
zip
Updated Oct 5, 2023
Cite
mexwell (2023). 🖼️ Famous Paintings [Dataset]. https://www.kaggle.com/datasets/mexwell/famous-paintings
Available download formats: zip (6681482 bytes)
Dataset updated: Oct 5, 2023
Authors: mexwell
Description
Famous paintings and their artists. This dataset is published to give students interesting data for practicing SQL.

Original Data

Acknowledgement

Photo by Steve Johnson on Unsplash
Kaggle Dataset Metadata Repository
kaggle.com
zip
Updated Nov 16, 2024
Cite
Ijaj Ahmed (2024). Kaggle Dataset Metadata Repository [Dataset]. https://www.kaggle.com/datasets/ijajdatanerd/kaggle-dataset-metadata-repository
Available download formats: zip (5122110 bytes)
Dataset updated: Nov 16, 2024
Authors: Ijaj Ahmed
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Kaggle Dataset Metadata Collection 📊

This dataset provides comprehensive metadata on various Kaggle datasets, offering detailed information about the dataset owners, creators, usage statistics, licensing, and more. It can help researchers, data scientists, and Kaggle enthusiasts quickly analyze the key attributes of different datasets on Kaggle. 📚

Dataset Overview:

Purpose: To provide detailed insights into Kaggle dataset metadata.

Content: Information related to the dataset's owner, creator, usage metrics, licensing, and more.

Target Audience: Data scientists, Kaggle competitors, and dataset curators.

Columns Description 📋

datasetUrl 🌐: The URL of the Kaggle dataset page. This directs you to the specific dataset's page on Kaggle.

ownerAvatarUrl 🖼️: The URL of the dataset owner's profile avatar on Kaggle.

ownerName 👤: The name of the dataset owner. This can be the individual or organization that created and maintains the dataset.

ownerUrl 🌍: A link to the Kaggle profile page of the dataset owner.

ownerUserId 💼: The unique user ID of the dataset owner on Kaggle.

ownerTier 🎖️: The ownership tier, such as "Tier 1" or "Tier 2," indicating the owner's status or level on Kaggle.

creatorName 👩‍💻: The name of the dataset creator, which could be different from the owner.

creatorUrl 🌍: A link to the Kaggle profile page of the dataset creator.

creatorUserId 💼: The unique user ID of the dataset creator.

scriptCount 📜: The number of scripts (kernels) associated with this dataset.

scriptsUrl 🔗: A link to the scripts (kernels) page for the dataset, where you can explore related code.

forumUrl 💬: The URL to the discussion forum for this dataset, where users can ask questions and share insights.

viewCount 👀: The number of views the dataset page has received on Kaggle.

downloadCount ⬇️: The number of times the dataset has been downloaded by users.

dateCreated 📅: The date when the dataset was first created and uploaded to Kaggle.

dateUpdated 🔄: The date when the dataset was last updated or modified.

voteButton 👍: The metadata for the dataset's vote button, showing how users interact with the dataset's quality ratings.

categories 🏷️: The categories or tags associated with the dataset, helping users filter datasets based on topics of interest (e.g., "Healthcare," "Finance").

licenseName 🛡️: The name of the license under which the dataset is shared (e.g., "CC0," "MIT License").

licenseShortName 🔑: A short form or abbreviation of the dataset's license name (e.g., "CC0" for Creative Commons Zero).

datasetSize 📦: The size of the dataset in terms of storage, typically measured in MB or GB.

commonFileTypes 📂: A list of common file types included in the dataset (e.g., .csv, .json, .xlsx).

downloadUrl ⬇️: A direct link to download the dataset files.

newKernelNotebookUrl 📝: A link to a new kernel or notebook related to this dataset, for those who wish to explore it programmatically.

newKernelScriptUrl 💻: A link to a new script for running computations or processing data related to the dataset.

usabilityRating 🌟: A rating or score representing how usable the dataset is, based on user feedback.

firestorePath 🔍: A reference to the path in Firestore where this dataset’s metadata is stored.

datasetSlug 🏷️: A URL-friendly version of the dataset name, typically used for URLs.

rank 📈: The dataset's rank based on certain metrics (e.g., downloads, votes, views).

datasource 🌐: The source or origin of the dataset (e.g., government data, private organizations).

medalUrl 🏅: A URL pointing to the dataset's medal or badge, indicating the dataset's quality or relevance.

hasHashLink 🔗: Indicates whether the dataset has a hash link for verifying data integrity.

ownerOrganizationId 🏢: The unique organization ID of the dataset's owner if the owner is an organization rather than an individual.

totalVotes 🗳️: The total number of votes the dataset has received from users, reflecting its popularity or quality.

category_names 📑: A comma-separated string of category names that represent the dataset’s classification.
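
Because category_names is described as a comma-separated string, it usually needs splitting before categories can be counted or filtered. The sketch below shows one way to do that; the helper name and the sample rows are hypothetical, and it assumes individual category names do not themselves contain commas.

```python
from collections import Counter


def split_categories(category_names):
    """Split a comma-separated category string into a clean list of names."""
    if not category_names:
        return []
    return [name.strip() for name in category_names.split(",") if name.strip()]


# Made-up rows for illustration only.
rows = [
    {"category_names": "Healthcare, Finance"},
    {"category_names": "Finance"},
    {"category_names": ""},
]

# Count how often each category appears across all rows.
category_counts = Counter(
    name for row in rows for name in split_categories(row["category_names"])
)
```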

This dataset is a valuable resource for those who want to analyze Kaggle's ecosystem, discover high-quality datasets, and explore metadata in a structured way. 🌍📊
Ecommerce Text Classification
kaggle.com
zip
Updated Oct 9, 2023
Cite
Saurabh Shahane (2023). Ecommerce Text Classification [Dataset]. https://www.kaggle.com/datasets/saurabhshahane/ecommerce-text-classification
Available download formats: zip (8236809 bytes)
Dataset updated: Oct 9, 2023
Authors: Saurabh Shahane
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This is a classification-based e-commerce text dataset for 4 categories ("Electronics", "Household", "Books" and "Clothing & Accessories"), which cover almost 80% of any e-commerce website.

The dataset is in ".csv" format with two columns - the first column is the class name and the second one is the datapoint of that class. The data point is the product and description from the e-commerce website.
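
The two-column layout described above (class name first, product text second) can be read with Python's standard csv module. This is a minimal sketch that assumes the file has no header row, which the description does not state; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_classes(csv_file):
    """Count rows per class in a two-column (class, text) CSV stream."""
    counts = Counter()
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip empty or malformed lines
        label, _text = row
        counts[label] += 1
    return counts


# In-memory stand-in for the real file; the rows are invented examples.
sample = io.StringIO(
    'Books,"A paperback novel, 300 pages"\n'
    "Electronics,USB-C charging cable\n"
    "Books,Hardcover cookbook\n"
)

class_counts = count_classes(sample)
```

Using csv.reader instead of splitting on commas matters here because product descriptions often contain commas inside quoted fields.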

The dataset has the following features :

Data Set Characteristics: Multivariate

Number of Instances: 50425

Number of classes: 4

Area: Computer science

Attribute Characteristics: Real

Number of Attributes: 1

Associated Tasks: Classification

Missing Values? No

Gautam. (2019). E commerce text dataset (version - 2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3355823
Kaggle Dataset Medals
kaggle.com
zip
Updated Dec 19, 2021
Cite
Niek van der Zwaag (2021). Kaggle Dataset Medals [Dataset]. https://www.kaggle.com/datasets/niekvanderzwaag/kaggle-dataset-medals
Available download formats: zip (4426597 bytes)
Dataset updated: Dec 19, 2021
Authors: Niek van der Zwaag
License: https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Medals

Dataset Medals are awarded to popular public datasets published to the site, as measured by number of upvotes. Not all upvotes count towards medals: votes by novices are excluded from medal calculation.

Content

Metadata of 42,955 datasets on Kaggle from 2015-12 to 2021-11

Medal: color of received medal

Created: time of creation

URL: URL to dataset on kaggle.com

Views: total view count

Votes: total vote count

Votes_Advanced: total vote count excluding votes from 'Novice' rank

Downloads: total download count

Kernels: total kernel count

Title: title of dataset

Description: description of dataset

Tags: tags of dataset

License: licence under which dataset is published

Acknowledgements

Tidied up version of dataset provided by @kukuroo3

Source: https://www.kaggle.com/kukuroo3/dataset-of-kaggle-dataset-include-medalvotecount
IT_incident_log_Dataset
kaggle.com
zip
Updated Jul 4, 2020
Cite
shamiul islam shifat (2020). IT_incident_log_Dataset [Dataset]. https://www.kaggle.com/datasets/shamiulislamshifat/it-incident-log-dataset
Available download formats: zip (2571433 bytes)
Dataset updated: Jul 4, 2020
Authors: shamiul islam shifat
Description
Data Set Information:

This is an event log of an incident management process extracted from data gathered from the audit system of an instance of the ServiceNow™ platform used by an IT company. The event log is enriched with data loaded from a relational database underlying a corresponding process-aware information system. Information was anonymized for privacy.

Number of instances: 141,712 events (24,918 incidents)

Number of attributes: 36 attributes (1 case identifier, 1 state identifier, 32 descriptive attributes, 2 dependent variables)

Attribute Information:

number: incident identifier (24,918 different values);

incident state: eight levels controlling the incident management process transitions from opening until closing the case;

active: boolean attribute that shows whether the record is active or closed/canceled;

reassignment_count: number of times the incident has the group or the support analysts changed;

reopen_count: number of times the incident resolution was rejected by the caller;

sys_mod_count: number of incident updates until that moment;

made_sla: boolean attribute that shows whether the incident exceeded the target SLA;

caller_id: identifier of the user affected;

opened_by: identifier of the user who reported the incident;

opened_at: incident user opening date and time;

sys_created_by: identifier of the user who registered the incident;

sys_created_at: incident system creation date and time;

sys_updated_by: identifier of the user who updated the incident and generated the current log record;

sys_updated_at: incident system update date and time;

contact_type: categorical attribute that shows by what means the incident was reported;

location: identifier of the location of the place affected;

category: first-level description of the affected service;

subcategory: second-level description of the affected service (related to the first level description, i.e., to category);

u_symptom: description of the user perception about service availability;

cmdb_ci: (configuration item) identifier used to report the affected item (not mandatory);

impact: description of the impact caused by the incident (values: 1–High; 2–Medium; 3–Low);

urgency: description of the urgency informed by the user for the incident resolution (values: 1–High; 2–Medium; 3–Low);

priority: calculated by the system based on 'impact' and 'urgency';

assignment_group: identifier of the support group in charge of the incident;

assigned_to: identifier of the user in charge of the incident;

knowledge: boolean attribute that shows whether a knowledge base document was used to resolve the incident;

u_priority_confirmation: boolean attribute that shows whether the priority field has been double-checked;

notify: categorical attribute that shows whether notifications were generated for the incident;

problem_id: identifier of the problem associated with the incident;

rfc: (request for change) identifier of the change request associated with the incident;

vendor: identifier of the vendor in charge of the incident;

caused_by: identifier of the RFC responsible for the incident;

close_code: identifier of the resolution of the incident;

resolved_by: identifier of the user who resolved the incident;

resolved_at: incident user resolution date and time (dependent variable);

closed_at: incident user close date and time (dependent variable).
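Because resolved_at and closed_at are marked as dependent variables, one natural derived target is the time elapsed since opened_at. The following is a minimal sketch of that calculation, not the dataset authors' method. It assumes timestamps are written as day/month/year hour:minute and that missing values appear as "?"; both are assumptions to adjust against the actual file, and the two records are invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed timestamp layout (day/month/year hour:minute); change it to match the file.
TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"
# Assumed placeholder for missing values.
MISSING = "?"


def parse_timestamp(value: str) -> Optional[datetime]:
    """Return a datetime, or None when the field is empty or missing."""
    value = value.strip()
    if not value or value == MISSING:
        return None
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def completion_time(record: dict, end_field: str = "closed_at") -> Optional[timedelta]:
    """Elapsed time from opened_at to end_field, or None if either is missing."""
    opened = parse_timestamp(record.get("opened_at", ""))
    ended = parse_timestamp(record.get(end_field, ""))
    if opened is None or ended is None:
        return None
    return ended - opened


# Invented log records, for illustration only.
records = [
    {"opened_at": "29/02/2016 01:16", "resolved_at": "29/02/2016 11:29",
     "closed_at": "05/03/2016 12:00"},
    {"opened_at": "01/03/2016 09:00", "resolved_at": "?", "closed_at": "?"},
]

for rec in records:
    print(completion_time(rec, "resolved_at"), completion_time(rec, "closed_at"))
```

Returning None for incomplete records keeps still-open incidents out of the target rather than guessing an end time for them.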

Relevant Papers:

Amaral, C. A. L., Fantinato, M., Reijers, H. A., Peres, S. M., Enhancing Completion Time Prediction Through Attribute Selection. Proceedings of the 15th International Conference on Advanced Information Technologies for Management (AITM 2018) and 13th International Conference on Information Systems Management (ISM 2018), Revised Selected Papers – Lecture Notes in Business Information Processing, v. 346, pp. 3-23, 2019. [Web Link]

Amaral, C. A. L., Fantinato, M., Peres, S. M., Attribute Selection with Filter and Wrapper: An Application on Incident Management Process. Proceedings of the 14th Federated Conference on Computer Science and Information Systems (FedCSIS 2018), pp. 679-682, 2018. [Web Link]

Maita, A. R. C., Martins, L. C., Paz, C. R. L., Rafferty, L., Hung, P., Peres, S. M., Fantinato, M. A systematic mapping study of process mining. Enterprise Information Systems, v. 12, n. 5, pp. 505-549, 2018. [Web Link]

Citation Request:

Please cite this paper if you use this dataset: Amaral, C. A. L., Fantinato, M., Reijers, H. A., Peres, S. M., Enhancing Completion Time Prediction Through Attribute Selection. Proceedings of the 15th International Conference on Advanced Information Technologies for Management (AITM 2018) and 13th International Conference on Information Systems Management (ISM 2018), Revised Selected Papers – Lecture Notes in Business Information Processing, v. 346, pp. 3-23, 2019. [Web Link]
Stellar Classification Dataset - SDSS17
kaggle.com
zip
Updated Jan 15, 2022
fedesoriano (2022). Stellar Classification Dataset - SDSS17 [Dataset]. https://www.kaggle.com/datasets/fedesoriano/stellar-classification-dataset-sdss17
Explore at:
zip(7223444 bytes)Available download formats
Dataset updated
Jan 15, 2022
Authors
fedesoriano
Description
Similar Datasets

CERN Proton Collision Dataset: LINK

Airfoil Self-Noise Dataset: LINK

CERN Electron Collision Data: LINK

Wind Speed Prediction Dataset: LINK

Spanish Wine Quality Dataset: LINK

Context

In astronomy, stellar classification is the classification of stars based on their spectral characteristics. The classification scheme of galaxies, quasars, and stars is one of the most fundamental in astronomy. The early cataloguing of stars and their distribution in the sky has led to the understanding that they make up our own galaxy and, following the distinction that Andromeda was a separate galaxy to our own, numerous galaxies began to be surveyed as more powerful telescopes were built. This dataset aims to classify stars, galaxies, and quasars based on their spectral characteristics.

Content

The data consists of 100,000 observations of space taken by the SDSS (Sloan Digital Sky Survey). Every observation is described by 17 feature columns and 1 class column, which identifies it as a star, galaxy, or quasar.

1. obj_ID = Object Identifier, the unique value that identifies the object in the image catalog used by the CAS
2. alpha = Right Ascension angle (at J2000 epoch)
3. delta = Declination angle (at J2000 epoch)
4. u = Ultraviolet filter in the photometric system
5. g = Green filter in the photometric system
6. r = Red filter in the photometric system
7. i = Near Infrared filter in the photometric system
8. z = Infrared filter in the photometric system
9. run_ID = Run Number used to identify the specific scan
10. rerun_ID = Rerun Number to specify how the image was processed
11. cam_col = Camera column to identify the scanline within the run
12. field_ID = Field number to identify each field
13. spec_obj_ID = Unique ID used for optical spectroscopic objects (this means that 2 different observations with the same spec_obj_ID must share the output class)
14. class = object class (galaxy, star or quasar object)
15. redshift = redshift value based on the increase in wavelength
16. plate = plate ID, identifies each plate in SDSS
17. MJD = Modified Julian Date, used to indicate when a given piece of SDSS data was taken
18. fiber_ID = fiber ID that identifies the fiber that pointed the light at the focal plane in each observation
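Two of the column descriptions above lend themselves to a quick check in code. The sketch below converts an MJD value to a calendar date (MJD 0 falls on 1858-11-17) and flags any spec_obj_ID that appears with more than one class, which the description says should not happen. The sample rows and the class labels in them are invented for illustration; the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# MJD 0 corresponds to 1858-11-17 (MJD = JD - 2400000.5).
MJD_EPOCH = date(1858, 11, 17)


def mjd_to_date(mjd: int) -> date:
    """Convert an integer Modified Julian Date to a calendar date."""
    return MJD_EPOCH + timedelta(days=int(mjd))


def conflicting_spec_ids(rows):
    """Return spec_obj_ID values that appear with more than one class."""
    classes_by_id = defaultdict(set)
    for row in rows:
        classes_by_id[row["spec_obj_ID"]].add(row["class"])
    return sorted(sid for sid, classes in classes_by_id.items() if len(classes) > 1)


# Invented rows, for illustration only.
rows = [
    {"spec_obj_ID": "A1", "class": "GALAXY", "MJD": 51544},
    {"spec_obj_ID": "A1", "class": "GALAXY", "MJD": 51545},
    {"spec_obj_ID": "B2", "class": "STAR", "MJD": 40587},
    {"spec_obj_ID": "B2", "class": "QSO", "MJD": 40587},
]

print(mjd_to_date(rows[0]["MJD"]))
print(conflicting_spec_ids(rows))
```

An empty result from conflicting_spec_ids is consistent with the stated rule; any IDs it returns would be worth inspecting before training a classifier.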

Citation

fedesoriano. (January 2022). Stellar Classification Dataset - SDSS17. Retrieved [Date Retrieved] from https://www.kaggle.com/fedesoriano/stellar-classification-dataset-sdss17.

Acknowledgements

The data released by the SDSS is in the public domain. It is taken from the current data release, DR17. More information about the license: http://www.sdss.org/science/image-gallery/

SDSS Publications: - Abdurro’uf et al., The Seventeenth data release of the Sloan Digital Sky Surveys: Complete Release of MaNGA, MaStar and APOGEE-2 DATA (Abdurro’uf et al. submitted to ApJS) [arXiv:2112.02026]
Structural Protein Sequences
kaggle.com
zip
Updated Feb 3, 2018
SHAHIR (2018). Structural Protein Sequences [Dataset]. https://www.kaggle.com/datasets/shahir/protein-data-set
Explore at:
zip(28782775 bytes)Available download formats
Dataset updated
Feb 3, 2018
Authors
SHAHIR
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

This is a protein data set retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).

The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to each other in the molecule. They then deposit this information, which is then annotated and publicly released into the archive by the wwPDB.

The constantly-growing PDB is a reflection of the research that is happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the information that you need, since the PDB archives so many different structures. You will often find multiple structures for a given molecule, or partial structures, or structures that have been modified or inactivated from their native form.

Content

There are two data files, both keyed on the protein's "structureId":

pdb_data_no_dups.csv contains protein metadata, which includes details on protein classification, extraction methods, etc.

data_seq.csv contains >400,000 protein structure sequences.
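Since both files share "structureId", they can be combined with a simple key lookup. The following is a minimal sketch using only the standard library. It assumes the metadata file has one row per structureId (suggested by the "no_dups" file name, but not confirmed here). Apart from "structureId", the column names and all sample values are invented stand-ins, and the in-memory strings take the place of the real files.

```python
import csv
import io

# Invented stand-ins for the two files; only "structureId" is named in the description.
METADATA_CSV = """structureId,classification
100D,DNA-RNA HYBRID
101M,OXYGEN TRANSPORT
"""

SEQUENCES_CSV = """structureId,chainId,sequence
100D,A,CCGGCGCCGG
100D,B,CCGGCGCCGG
101M,A,MVLSEGEWQLVLHVWAKVEAD
999Z,A,ACDEFGHIK
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_structure_id(metadata_rows, sequence_rows):
    """Attach each structure's metadata to every matching sequence row (inner join)."""
    metadata_by_id = {row["structureId"]: row for row in metadata_rows}
    joined = []
    for seq in sequence_rows:
        meta = metadata_by_id.get(seq["structureId"])
        if meta is None:
            continue  # skip sequences that have no metadata entry
        joined.append({**meta, **seq})
    return joined


joined = join_on_structure_id(load_rows(METADATA_CSV), load_rows(SEQUENCES_CSV))
for row in joined:
    print(row["structureId"], row["chainId"], row["classification"])
```

To run this against the downloaded files, replace io.StringIO(text) with an opened file handle, for example open("data_seq.csv", newline="").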

Acknowledgements

Original data set downloaded from http://www.rcsb.org/pdb/

Inspiration

The Protein Data Bank has helped the life science community study different diseases and develop new drugs and solutions that support human survival.
Framingham heart study dataset
kaggle.com
zip
Updated Apr 19, 2022
+ more versions
Ashish Bhardwaj (2022). Framingham heart study dataset [Dataset]. https://www.kaggle.com/datasets/aasheesh200/framingham-heart-study-dataset
Explore at:
zip(59440 bytes)Available download formats
Dataset updated
Apr 19, 2022
Authors
Ashish Bhardwaj
Area covered
Framingham
Description
The "Framingham" heart disease dataset includes over 4,240 records, 16 columns, and 15 attributes. The goal of the dataset is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD).
Financial_Risk
kaggle.com
zip
Updated Jul 23, 2024
Preetham Gouda (2024). Financial_Risk [Dataset]. https://www.kaggle.com/datasets/preethamgouda/financial-risk
Explore at:
zip(709463 bytes)Available download formats
Dataset updated
Jul 23, 2024
Authors
Preetham Gouda
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The Financial Risk Assessment Dataset provides detailed information on individual financial profiles. It includes demographic, financial, and behavioral data to assess financial risk. The dataset features various columns such as income, credit score, and risk rating, with intentional imbalances and missing values to simulate real-world scenarios.
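Because the description mentions intentional imbalances and missing values, a first pass over the data would usually measure both. The sketch below does that with the standard library only. The records and exact column names are invented for illustration and are not taken from the file; it also assumes missing values show up as empty strings or None.

```python
from collections import Counter

# Invented records; the column names are illustrative, not taken from the file.
records = [
    {"income": "52000", "credit_score": "710", "risk_rating": "Low"},
    {"income": "", "credit_score": "640", "risk_rating": "Low"},
    {"income": "31000", "credit_score": "", "risk_rating": "Medium"},
    {"income": "", "credit_score": "580", "risk_rating": "High"},
    {"income": "78000", "credit_score": "755", "risk_rating": "Low"},
]


def missing_counts(rows):
    """Count empty or None values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                counts[column] += 1
    return dict(counts)


def class_shares(rows, target):
    """Fraction of rows that fall into each class of the target column."""
    totals = Counter(row[target] for row in rows)
    n = sum(totals.values())
    return {label: count / n for label, count in totals.items()}


print(missing_counts(records))
print(class_shares(records, "risk_rating"))
```

Knowing the share of each class up front helps decide whether stratified splitting or resampling is worth considering before modeling.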

Meta Kaggle

Kaggle's public data on competitions, users, submission scores, code, and more

22 scholarly articles cite this dataset (View in Google Scholar)
