100+ datasets found

Top 1000 Kaggle Datasets
kaggle.com
zip
Updated Jan 3, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
Explore at:
zip(34269 bytes)Available download formats
Dataset updated
Jan 3, 2022
Authors
Trrishan
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
From wiki

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

Source: Kaggle
Meta Kaggle Code
kaggle.com
zip
Updated Nov 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
zip(167219625372 bytes)Available download formats
Dataset updated
Nov 27, 2025
Dataset authored and provided by
Kagglehttp://kaggle.com/
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
h
kaggle-entity-annotated-corpus-ner-dataset
huggingface.co
Updated Jul 10, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rafael Arias Calles (2022). kaggle-entity-annotated-corpus-ner-dataset [Dataset]. https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 10, 2022
Authors
Rafael Arias Calles
License
https://choosealicense.com/licenses/odbl/https://choosealicense.com/licenses/odbl/
Description
Date: 2022-07-10 Files: ner_dataset.csv Source: Kaggle entity annotated corpus notes: The dataset only contains the tokens and ner tag labels. Labels are uppercase.

About Dataset

from Kaggle Datasets

Context

Annotated Corpus for Named Entity Recognition using GMB(Groningen Meaning Bank) corpus for entity classification with enhanced and popular features by Natural Language Processing applied to the data set. Tip: Use Pandas Dataframe to load dataset if using Python for… See the full description on the dataset page: https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset.
YouTube Trending Videos Dataset
kaggle.com
zip
Updated Dec 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). YouTube Trending Videos Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-trending-videos-dataset
Explore at:
zip(29769637 bytes)Available download formats
Dataset updated
Dec 19, 2023
Authors
The Devastator
Area covered
YouTube
Description
YouTube Trending Videos Dataset

Exploring YouTube Trending Videos

By dskl [source]

About this dataset

Moreover it also reveals various engagement metrics such as the number of views the video has received, likes and dislikes it has garnered from viewership. Additionally information related to comment count on particular videos enables analysis regarding viewer interaction and response. Furthermore this dataset describes whether comments or ratings are disabled for a particular video allowing examination into how these factors impact engagement.

By exploring this dataset in-depth marketers can gain valuable insights into identifying trends in content popularity across different countries while taking into account timing considerations based on published day of week. It also opens up avenues for analyzing public sentiment towards specific videos based on likes vs dislikes ratios and comment count which further aids in devising suitable marketing strategies.

Overall,this informative dataset serves as an invaluable asset for researchers,data analysts,and marketers alike who strive to gain deeper understanding about trending video patterns,relevant metrics influencing content virality,factors dictating viewer sentiments,and exploring new possibilities within digital marketing space leveraging YouTube's wide reach

How to use the dataset

How to Use This Dataset: A Guide

In this guide, we will walk you through the different columns in the dataset and provide insights on how you can explore the popularity and engagement of these trending videos. Let's dive in!

Column Descriptions:

title: The title of the video.

channel_title: The title of the YouTube channel that published the video.

publish_date: The date when the video was published on YouTube.

time_frame: The duration of time (e.g., 1 day, 6 hours) that the video has been trending on YouTube.

published_day_of_week: The day of week (e.g., Monday) when the video was published.

publish_country: The country where the video was published.

tags: The tags or keywords associated with the video.

views: The number of views received by a particular video

likes: Number o likes received per each videos

dislike: Number dislikes receives per an individual vidoe 11.comment_count: number of comments

Popular Video Insights:

To gain insights into popular videos based on this dataset, you can focus your analysis using these columns:

title, channel_title, publish_date, time_frame, and** publish_country**.

By analyzing these attributes together with other engagement metrics such as views ,likes,**dislikes,**comments),comment_count you can identify trends in what type content is most popular both globally or within specific countries.

For instance: - You could analyze which channels are consistently publishing trending videos - Explore whether certain types of titles or tags are more likely to attract views and engagement. - Determine if certain days of the week or time frames have a higher likelihood of trending videos being published.

Engagement Insights:

To explore user engagement with the trending videos, you can focus your analysis on these columns:

likes, dislikes, comment_count

By analyzing these attributes you can get insights into how users are interacting with the content. For example: - You could compare the like and dislike ratios to identify positively received videos versus those that are more controversial. - Analyze comment counts to understand how users are engaging with the content and whether comments being disabled affects overall

Research Ideas

Analyzing the popularity and engagement of trending videos: By analyzing the number of views, likes, dislikes, and comments, we can understand which types of videos are popular among YouTube users. We can also examine factors such as comment count and ratings disabled to see how viewers engage with trending videos.

Understanding video trends across different countries: By examining the publish country column, we can compare the popularity of trending videos in different countries. This can help content creators or marketers understand regional preferences and tailor their content strategy accordingly.

Studying the impact of video attributes on engagement: By exploring the relationship between video attributes (such as title, tags, publish day) and engagement metrics (views, likes), we can identify patterns or trends that influence a video's success on YouTube. This information can be...
Student Performance Dataset
kaggle.com
Updated Aug 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 27, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ghulam Muhammad Nabeel
Description
📊 Student Performance Dataset (Synthetic, Realistic)

Overview

This dataset contains 1000000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

Each row represents one student with features like study hours, attendance, class participation, and final score.
The dataset is small, clean, and structured to be beginner-friendly.

🔑 Columns Description

student_id → Unique identifier for each student.

weekly_self_study_hours → Average weekly self-study hours (0–40). Generated using a normal distribution centered around 15 hours.

attendance_percentage → Attendance percentage (50–100). Simulated with a normal distribution around 85%.

class_participation → Score between 0–10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.

total_score → Final performance score (0–100). Calculated as a function of study hours + random noise, then clipped between 0–100. Stronger correlation with study hours.

grade → Categorical label (A, B, C, D, F) derived from total_score.

📐 Data Generation Logic

Weekly Study Hours: Modeled using a normal distribution (mean ≈ 15, std ≈ 7), capped between 0 and 40 hours.

Scores: More study hours → higher score. Formula:

Random noise simulates differences in learning ability, motivation, etc.

Attendance & Participation: Independent but realistic variations added.

Grades: Assigned from scores using thresholds:

A: ≥ 85

B: ≥ 70

C: ≥ 55

D: ≥ 40

F: < 40

🎯 How to Use This Dataset

Regression Tasks

Predict total_score from weekly_self_study_hours.

Train and evaluate Linear Regression models.

Extend to multiple regression using attendance_percentage and class_participation.

Classification Tasks

Predict grade (A–F) using study hours, attendance, and participation.

Model Evaluation Practice

Apply train-test split and cross-validation.

Evaluate with MAE, RMSE, R².

Compare simple vs. multiple regression.

✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).
Kaggle Dataset Metadata Repository
kaggle.com
zip
Updated Nov 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ijaj Ahmed (2024). Kaggle Dataset Metadata Repository [Dataset]. https://www.kaggle.com/datasets/ijajdatanerd/kaggle-dataset-metadata-repository
Explore at:
zip(5122110 bytes)Available download formats
Dataset updated
Nov 16, 2024
Authors
Ijaj Ahmed
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13367141%2F444a868e669671faf9007822d6f2d348%2FAdd%20a%20heading.png?generation=1731775788329917&alt=media" alt="">

Kaggle Dataset Metadata Collection 📊

This dataset provides comprehensive metadata on various Kaggle datasets, offering detailed information about the dataset owners, creators, usage statistics, licensing, and more. It can help researchers, data scientists, and Kaggle enthusiasts quickly analyze the key attributes of different datasets on Kaggle. 📚

Dataset Overview:

Purpose: To provide detailed insights into Kaggle dataset metadata.

Content: Information related to the dataset's owner, creator, usage metrics, licensing, and more.

Target Audience: Data scientists, Kaggle competitors, and dataset curators.

Columns Description 📋

datasetUrl 🌐: The URL of the Kaggle dataset page. This directs you to the specific dataset's page on Kaggle.

ownerAvatarUrl 🖼️: The URL of the dataset owner's profile avatar on Kaggle.

ownerName 👤: The name of the dataset owner. This can be the individual or organization that created and maintains the dataset.

ownerUrl 🌍: A link to the Kaggle profile page of the dataset owner.

ownerUserId 💼: The unique user ID of the dataset owner on Kaggle.

ownerTier 🎖️: The ownership tier, such as "Tier 1" or "Tier 2," indicating the owner's status or level on Kaggle.

creatorName 👩‍💻: The name of the dataset creator, which could be different from the owner.

creatorUrl 🌍: A link to the Kaggle profile page of the dataset creator.

creatorUserId 💼: The unique user ID of the dataset creator.

scriptCount 📜: The number of scripts (kernels) associated with this dataset.

scriptsUrl 🔗: A link to the scripts (kernels) page for the dataset, where you can explore related code.

forumUrl 💬: The URL to the discussion forum for this dataset, where users can ask questions and share insights.

viewCount 👀: The number of views the dataset page has received on Kaggle.

downloadCount ⬇️: The number of times the dataset has been downloaded by users.

dateCreated 📅: The date when the dataset was first created and uploaded to Kaggle.

dateUpdated 🔄: The date when the dataset was last updated or modified.

voteButton 👍: The metadata for the dataset's vote button, showing how users interact with the dataset's quality ratings.

categories 🏷️: The categories or tags associated with the dataset, helping users filter datasets based on topics of interest (e.g., "Healthcare," "Finance").

licenseName 🛡️: The name of the license under which the dataset is shared (e.g., "CC0," "MIT License").

licenseShortName 🔑: A short form or abbreviation of the dataset's license name (e.g., "CC0" for Creative Commons Zero).

datasetSize 📦: The size of the dataset in terms of storage, typically measured in MB or GB.

commonFileTypes 📂: A list of common file types included in the dataset (e.g., .csv, .json, .xlsx).

downloadUrl ⬇️: A direct link to download the dataset files.

newKernelNotebookUrl 📝: A link to a new kernel or notebook related to this dataset, for those who wish to explore it programmatically.

newKernelScriptUrl 💻: A link to a new script for running computations or processing data related to the dataset.

usabilityRating 🌟: A rating or score representing how usable the dataset is, based on user feedback.

firestorePath 🔍: A reference to the path in Firestore where this dataset’s metadata is stored.

datasetSlug 🏷️: A URL-friendly version of the dataset name, typically used for URLs.

rank 📈: The dataset's rank based on certain metrics (e.g., downloads, votes, views).

datasource 🌐: The source or origin of the dataset (e.g., government data, private organizations).

medalUrl 🏅: A URL pointing to the dataset's medal or badge, indicating the dataset's quality or relevance.

hasHashLink 🔗: Indicates whether the dataset has a hash link for verifying data integrity.

ownerOrganizationId 🏢: The unique organization ID of the dataset's owner if the owner is an organization rather than an individual.

totalVotes 🗳️: The total number of votes the dataset has received from users, reflecting its popularity or quality.

category_names 📑: A comma-separated string of category names that represent the dataset’s classification.

This dataset is a valuable resource for those who want to analyze Kaggle's ecosystem, discover high-quality datasets, and explore metadata in a structured way. 🌍📊
Risk Factors for Cardiovascular Heart Disease
kaggle.com
zip
Updated Jan 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Risk Factors for Cardiovascular Heart Disease [Dataset]. https://www.kaggle.com/datasets/thedevastator/exploring-risk-factors-for-cardiovascular-diseas
Explore at:
zip(944471 bytes)Available download formats
Dataset updated
Jan 12, 2023
Authors
The Devastator
Description
Exploring Risk Factors for Cardiovascular Disease in Adults

Examining Age, Gender, Height, Weight and Health Metrics

By Kuzak Dempsy [source]

About this dataset

This dataset contains detailed information on the risk factors for cardiovascular disease. It includes information on age, gender, height, weight, blood pressure values, cholesterol levels, glucose levels, smoking habits and alcohol consumption of over 70 thousand individuals. Additionally it outlines if the person is active or not and if he or she has any cardiovascular diseases. This dataset provides a great resource for researchers to apply modern machine learning techniques to explore the potential relations between risk factors and cardiovascular disease that can ultimately lead to improved understanding of this serious health issue and design better preventive measures

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset can be used to explore the risk factors of cardiovascular disease in adults. The aim is to understand how certain demographic factors, health behaviors and biological markers affect the development of heart disease.

To start, look through the columns of data and familiarize yourself with each one. Understand what each field means and how it relates to heart health: - Age: Age of participant (integer) - Gender: Gender of participant (male/female). - Height: Height measured in centimeters (integer) - Weight: Weight measured in kilograms (integer) - Ap_hi: Systolic blood pressure reading taken from patient (integer) - Ap_lo : Diastolic blood pressure reading taken from patient (integer) - Cholesterol : Total cholesterol level read as mg/dl on a scale 0 - 5+ units( integer). Each unit denoting increase/decrease by 20 mg/dL respectively.
‐ Gluc : Glucose level read as mmol/l on a scale 0 - 16+ units( integer). Each unit denoting increase Decreaseby 1 mmol/L respectively. ‐ Smoke : Whether person smokes or not(binary; 0= No , 1=Yes). ‐ Alco : Whether person drinks alcohol or not(binary; 0 =No ,1 =Yes ). • Active : whether person physically active or not( Binary ;0 =No,1 = Yes ). . Cardio : whether person suffers from cardiovascular diseases or not(Binary ;0 – no , 1 ‑yes ).Identify any trends between the different values for each attribute and the developmetn for cardiovascular disease among individuals represented by this dataset . Age, gender, weight, lifestyle practices like smoking & drinking alcohol are all key influences when analyzing this problem set. You can always modify pieces of your analysis until you're able to find patterns that will enable you make conclusions based on your understanding & exploration. You can further enrich your understanding using couple mopdeling technique like Regressions & Classification models over this dataset alongwith latest Deep Learning approach! Have Fun!

Research Ideas

Analyzing the effect of lifestyle and environmental factors on the risk of cardiovascular disease.

Predicting the risks of different age groups based on their demographic characteristics such as gender, height, weight and smoking status.

Detecting patterns between levels of physical activity, blood pressure and cholesterol levels with likelihood of developing cardiovascular disease among individuals

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: heart_data.csv | Column name | Description | |:----------------|:---------------------------------------------------------| | age | Age of the individual. (Integer) | | gender | Gender of the individual. (String) | | height | Height of the individual in centimeters. (Integer) | | weight | Weight of the individual in kilograms. (Integer) | | ap_hi | Systolic blood pressure reading. (Integer) | | ap_lo | Diastolic blood pressure reading. (Integer) | | cholesterol | Cholesterol level of the individual. (Integer) | | gluc |...
Social Media and Mental Health
kaggle.com
zip
Updated Jul 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
Explore at:
zip(10944 bytes)Available download formats
Dataset updated
Jul 18, 2023
Authors
SouvikAhmed071
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

The following is the Google Colab link to the project, done on Jupyter Notebook -

https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

The following is the GitHub Repository of the project -

https://github.com/daerkns/social-media-and-mental-health

Libraries used for the Project -

Pandas Numpy Matplotlib Seaborn Sci-kit Learn
Comprehensive Medical Q&A Dataset
kaggle.com
zip
Updated Nov 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
Explore at:
zip(5126941 bytes)Available download formats
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

By Huggingface Hub [source]

About this dataset

The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. With over 43,000 patient inquiries from real-life situations categorized into 31 distinct types of questions, the dataset offers an invaluable opportunity to research correlations between treatments, chronic diseases, medical protocols and more. Answers provided in this database come not only from doctors but also other healthcare professionals such as nurses and pharmacists, providing a more complete array of responses to help researchers unlock deeper insights within the realm of healthcare. This incredible trove of knowledge is just waiting to be mined - so grab your data mining equipment and get exploring!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

In order to make the most out of this dataset, start by having a look at the column names and understanding what information they offer: qtype (the type of medical question), Question (the question in itself), and Answer (the expert response). The qtype column will help you categorize the dataset according to your desired question topics. Once you have filtered down your criteria as much as possible using qtype, it is time to analyze the data. Start by asking yourself questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering those questions.

Once you have obtained new insights about healthcare based on the answers provided in this dynmaic data set - now it’s time for action! Use all that newfound understanding about patient needs in order develop educational materials and implement any suggested changes necessary. If more criteria are needed for querying this data set see if MedQuad offers additional columns; sometimes extra columns may be added periodically that could further enhance analysis capabilities; look out for notifications if these happen.

Finally once making an impact with the use case(s) - don't forget proper citation etiquette; give credit where credit is due!

Research Ideas

Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.

Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.

Leveraging the dataset to build chatbots and virtual assistants that are able to answer a broad range of questions about healthcare with expert-level accuracy

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: train.csv | Column name | Description | |:--------------|:------------------------------------------------------| | qtype | The type of medical question. (String) | | Question | The medical question posed by the patient. (String) | | Answer | The expert response to the medical question. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Huggingface Hub.
📱💻 Mental Health & Technology Usage Dataset 🌱
kaggle.com
zip
Updated Sep 5, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Waqar Ali (2024). 📱💻 Mental Health & Technology Usage Dataset 🌱 [Dataset]. https://www.kaggle.com/datasets/waqi786/mental-health-and-technology-usage-dataset
Explore at:
zip(201342 bytes)Available download formats
Dataset updated
Sep 5, 2024
Authors
Waqar Ali
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset offers insights into how daily technology usage, including social media and screen time, impacts mental health. 📊 It captures various behavioral patterns and their correlations with mental health indicators like stress levels, sleep quality, and productivity. Dive in to analyze the relationship between our digital lives and mental wellness! 🌟

The data is useful for research, academic projects, or building predictive models to understand trends in mental health influenced by screen time and technology habits. 🔍📉
(Sunset)📒 Meta Kaggle ported to MS SQL SERVER
kaggle.com
zip
Updated Mar 20, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BwandoWando (2024). (Sunset)📒 Meta Kaggle ported to MS SQL SERVER [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-ported-to-sql-server-2022-database
Explore at:
zip(8635902534 bytes)Available download formats
Dataset updated
Mar 20, 2024
Authors
BwandoWando
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I've always wanted to explore Kaggle's Meta Kaggle dataset but I am more comfortable on using TSQL when it comes to writing (very) complex queries. Also, I tend to write queries faster when using SQL MANAGEMENT STUDIO, like 100x faster. So, I ported Kaggle's Meta Kaggle dataset into MS SQL SERVER 2022 database format, created a backup file, then uploaded it here.

MSSQL VERSION: SQL Server 2022

Collation: SQL_Latin1_General_CP1_CI_AS

Recovery model: simple

Requirements

Download and install the SQL SERVER 2022 Developer edition here

Download the backup file

Restore the backup file into your local. If you havent done this before, it's easy and straightforward. Here is a guide.

(QUOTED FROM THE ORIGINAL DATASET)

Meta Kaggle

Explore Kaggle's public data on competitions, datasets, kernels (code/ notebooks) and more Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F1842206%2F2ad97bce7839d6e57674e7a82981ed23%2F2Egeb8R.png?generation=1688912953875842&alt=media" alt="">

Notes

I repeat, I just ported the dataset. All credits to Kaggle for the amazing source dataset.

Cover image from https://picryl.com/media/space-earth-bug-ce3ca6
Kaggle: Forum Discussions
kaggle.com
zip
Updated Nov 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nicolás Ariel González Muñoz (2025). Kaggle: Forum Discussions [Dataset]. https://www.kaggle.com/datasets/nicolasgonzalezmunoz/kaggle-forum-discussions
Explore at:
zip(542099 bytes)Available download formats
Dataset updated
Nov 8, 2025
Authors
Nicolás Ariel González Muñoz
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Note: This is a work in progress, and not all the Kaggle forums are included in this dataset. The remaining forums will be added when I end solving some issues with the data generators related to these forums.

Summary

Welcome to the Kaggle Forum Discussions dataset!. This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping techniques, using the selenium libraries, and converting text data into markdown style using the markdownify package.

This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.

Extraction Technique

As a dynamic website that relies heavily in JavaScript (JS), I extracted the data in this dataset through web scraping techniques using the selenium library.

The functions and classes used to scrape the data on Kaggle where stored on a utility script publicly available here. As JS-generated pages like Kaggle are unstable where trying to scrape them, the mentioned script implements capabilities for retrying connections and to await for elements to appear.

Each Forum was scrapped using a one notebook for each, then the mentioned notebooks were connected to a central notebook that generates this dataset. Also the discussions are scrapped in parallel so to enhance speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent to the most old.

If you need more control on the data you want to research, feel free to import all you need from the utility script mentioned before.

Structure

This dataset contains several folders, each named as the discussion forum they contain data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder, you'll find two files: one is a csv file and the other a json file.

The json file (in Python, represented as a dictionary) is indexed with the ID that Kaggle assigns to the mentioned discussion. Each ID is paired with its corresponding discussion, which is represented as a nested dictionary (the discussion dict), which contains the following fields: - title: The title of the main topic. - content: Content of the main topic. - tags: List containing the discussion's tags. - datetime: Date and time at which the discussion was published (in ISO 8601 format). - votes: Number of votes gotten by the discussion. - medal: Medal awarded by the main topic (if any). - user: User that published the main topic. - expertise: Publisher's expertise, measured by the Kaggle progression system. - n_comments: Total number of comments in the current discussion. - n_appreciation_comments: Total number of appreciation comments in the current discussion. - comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle, containing the following fields: - content: Comment's content. - is_appreciation: Wether the comment is of appreciation. - is_deleted: Wether the comment was deleted. - n_replies: Number of replies to the comment. - datetime: Date and time at which the comment was published (in ISO 8601 format). - votes: Number of votes gotten by the current comment. - medal: Medal awarded by the comment (if any). - user: User that published the comment. - expertise: Publisher's expertise, measured by the Kaggle progression system. - n_deleted: Total number of deleted replies (including self). - replies: A dict following this same format.

By other side, the csv file serves as a summary of the json file, containing information about the comments limited to the hottest and most voted comments.

Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
GSM8K - Grade School Math 8K Q&A
kaggle.com
zip
Updated Nov 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
Explore at:
zip(3418660 bytes)Available download formats
Dataset updated
Nov 24, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

By Huggingface Hub [source]

About this dataset

This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

The dataset consists of two files: main_train.csv and main_test.csv; the former contains questions and answers specifically related to grade school math while the latter includes multi-step reasoning tests for each category of the Ontario Math Curriculum (OMC). In addition, it has three columns - Question (Question), Answer ([Answer]) – meaning that each row contains 3 sequential question/answer pairs making it possible to take a single path from the start of any given answer or branch out from there according to the logic construction required by each respective problem scenario; these columns can be used in combination with text analysis algorithms like ELMo or BERT to explore different formats of representation for responding accurately during natural language processing tasks such as Q&A or building predictive models for numerical data applications like measuring classifying resource efficiency initiatives or forecasting sales volumes in retail platforms..

To use this dataset efficiently you should first get familiar with its structure by reading through its documentation so you are aware all available info regarding items content definition & format requirements then study examples that best suits your specific purpose whether is performing an experiment inspired by education research needs, generate insights related marketing analytics reports making predictions over artificial intelligence project capacity improvements optimization gains etcetera having full access knowledge about available source keeps you up & running from preliminary background work toward knowledge mining endeavor completion success Support User success qualitative exploration sessions make sure learn all variables definitions employed heterogeneous tools before continue Research journey starts experienced Researchers come prepared valuable resource items employed go beyond discovery false alarm halt advancement flow focus unprocessed raw values instead ensure clear cutting vision behind objectives support UserHelp plans going mean project meaningful campaign deliverables production planning safety milestones dovetail short deliveries enable design interfaces session workforce making everything automated fun entry functioning final transformation awaited offshoot Goals outcome parameters monitor life cycle management ensures ongoing projects feedbacks monitored video enactment resources tapped Proficiently balanced activity sheets tracking activities progress deliberation points evaluation radius highlights outputs primary phase visit egress collaboration agendas Client cumulative returns records capture performance illustrated collectively diarized successive setup sweetens conditions researched environments overview debriefing arcane matters turn acquaintances esteemed directives social

Research Ideas

Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.

Generating new grade school math questions and answers using g...
Sales and Satisfaction
kaggle.com
zip
Updated May 22, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matin Mahmoudi ✨ (2024). Sales and Satisfaction [Dataset]. https://www.kaggle.com/datasets/matinmahmoudi/sales-and-satisfaction
Explore at:
zip(687693 bytes)Available download formats
Dataset updated
May 22, 2024
Authors
Matin Mahmoudi ✨
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
One dataset contains missing values (NaNs) and the other does not. These datasets contain information on sales and customer satisfaction before and after an intervention, as well as purchase data for control and treatment groups. The dataset is synthetic and was created for use in statistical analysis.

This is an original dataset.

Features

Group - Description: Indicates whether the data point belongs to the Control or Treatment group. - Categories: Control, Treatment

Customer_Segment - Description: Categorizes customers based on their value. - Categories: High Value, Medium Value, Low Value

Sales_Before - Description: Sales figures before the intervention. - Data Type: Numerical

Sales_After - Description: Sales figures after the intervention. - Data Type: Numerical

Customer_Satisfaction_Before - Description: Customer satisfaction scores before the intervention. - Data Type: Numerical

Customer_Satisfaction_After - Description: Customer satisfaction scores after the intervention. - Data Type: Numerical

Purchase_Made - Description: Indicates whether a purchase was made after the intervention. - Categories: Yes, No
Social Media Usage and User Behavior
kaggle.com
zip
Updated Jan 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
SIMRAN DESAI (2025). Social Media Usage and User Behavior [Dataset]. https://www.kaggle.com/datasets/simrandesai1616/social-media-behavior
Explore at:
zip(9184 bytes)Available download formats
Dataset updated
Jan 8, 2025
Authors
SIMRAN DESAI
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The dataset comes from a Social Media Analysis survey that aims to analyse user behavior on social media, focusing on attention monetization and engagement based on 110+ self-reported responses. It was conducted using Google Forms, with diverse participants to capture varying user profiles and the variance in levels of awareness about social media's impact on daily routines.
60k-data-with-context-v2
kaggle.com
Updated Sep 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chris Deotte (2023). 60k-data-with-context-v2 [Dataset]. https://www.kaggle.com/datasets/cdeotte/60k-data-with-context-v2
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 2, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Chris Deotte
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset can be used to train an Open Book model for Kaggle's LLM Science Exam competition. This dataset was generated by searching and concatenating all publicly shared datasets on Sept 1 2023.

The context column was generated using Mgoksu's notebook here with NUM_TITLES=5 and NUM_SENTENCES=20

The source column indicates where the dataset originated. Below are the sources:

source = 1 & 2 * Radek's 6.5k dataset. Discussion here annd here, dataset here.

source = 3 & 4 * Radek's 15k + 5.9k. Discussion here and here, dataset here

source = 5 & 6 * Radek's 6k + 6k. Discussion here and here, dataset here

source = 7 * Leonid's 1k. Discussion here, dataset here

source = 8 * Gigkpeaeums 3k. Discussion here, dataset here

source = 9 * Anil 3.4k. Discussion here, dataset here

source = 10, 11, 12 * Mgoksu 13k. Discussion here, dataset here
LMS Tracking Dataset
kaggle.com
zip
Updated May 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Prasad Patil (2024). LMS Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/lms-tracking-dataset
Explore at:
zip(5419 bytes)Available download formats
Dataset updated
May 6, 2024
Authors
Prasad Patil
Description
This dataset was collected by a edtech startup. The startup is into teaching entrepreneurial life-skills in animated-gamified format through its video series to kids between the age group of 6-14 years. Through its learning management system the company tracks the progress made by all of its subscribers on the platform. Company records platform content usage activity data and tries to follow up with parents if there is any inactiveness on the platform by their child. Here's more information about the dataset

Dataset Information:

Child Name: Name of the subscriber kid

Email Address: Email address created by parent

Contact: Contact details of the parent

follow up: Responses received by the company employee after progress follow-up over the phone.

response: segregating the follow-up responses in to categories

Introduction: Tutorial 1

Activity:- Know your personality, a fun way:Tutorial 2

A Simple Quiz on the previous Video: Quiz on the Tutorial 2

Lets see what ‘Product’ is…:Tutorial 3

A Simple Quiz on the previous Video:Quiz on the Tutorial 3

Product that represents me: Tutorial 4

Let's see what 'Service' means: Tutorial 5

A Simple Quiz on the previous Video:Quiz on the Tutorial 5

Instruction for 'Product & Service' worksheet:Tutorial 6

Activity:- Product and Service Worksheet: Exercise on Tutorial 6

Instructions for Product Word Association:Tutorial 7

Activity:- Product Word Association:Exercise on Tutorial 7

Life without products??.... Impossible !:Tutorial 8

What Is a Need?:Tutorial 9

A Simple Quiz on the previous Video:Quiz on the Tutorial 9

Summary of Session 1: Summarizing all the learnings from the Tutorials 1-9

Your Feedback on Session 1: Feedback page

There is some missing data as well. I hope it would be good dataset for beginners practicing their NLP skills.

Image by Steven Weirather from Pixabay

Note: This dataset is partially synthetic meaning names, email and contact details mentioned are not of the actual customers. Kindly use it for educational and research purposes.
Mental Health
kaggle.com
zip
Updated May 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mahdi Mashayekhi (2025). Mental Health [Dataset]. https://www.kaggle.com/datasets/mahdimashayekhi/mental-health
Explore at:
zip(137847 bytes)Available download formats
Dataset updated
May 7, 2025
Authors
Mahdi Mashayekhi
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
📘 Dataset Description

This dataset provides a realistic, synthetic simulation of global mental health survey responses from 10,000 individuals. It was created to reflect actual patterns seen in workplace mental health data while ensuring full anonymity and privacy.

🧠 Context & Purpose

Mental health issues affect people across all ages, countries, and industries. Understanding patterns in mental health at work, access to treatment, and stigma around disclosure is essential for shaping better workplace policies and interventions.

This dataset is ideal for:

Training and evaluating machine learning models

Practicing classification or clustering techniques

Performing exploratory data analysis (EDA)

Studying fairness and bias in mental health predictions

Creating realistic dashboards for HR analytics or healthcare systems

📊 Dataset Highlights

10,000 rows representing anonymized individuals

Diverse global coverage with country/state info

Demographic attributes like age, gender, employment type

Information about work environment and company support

Responses about mental health history, treatment, and workplace stigma

💡 Example Use Cases

Predicting the likelihood of an employee seeking mental health treatment

Identifying factors most correlated with workplace stress

Segmenting users by mental health risk using clustering

Building fairness-aware models to reduce bias in mental health predictions

⚠️ Notes

This dataset is entirely synthetic. No personally identifiable information (PII) or real user data is included.

It was generated based on patterns observed in public mental health datasets and surveys.
Reddit /r/datasets Dataset
kaggle.com
zip
Updated Nov 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Reddit /r/datasets Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-meta-corpus-of-datasets-the-reddit-dataset
Explore at:
zip(9619636 bytes)Available download formats
Dataset updated
Nov 28, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The Meta-Corpus of Datasets: The Reddit Dataset

The Complete Collection of Datasets Posted on Reddit

By SocialGrep [source]

About this dataset

A subreddit dataset is a collection of posts and comments made on Reddit's /r/datasets board. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames to preserve users' anonymity and to prevent targeted harassment

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

In order to use this dataset, you will need to have a text editor such as Microsoft Word or LibreOffice installed on your computer. You will also need a web browser such as Google Chrome or Mozilla Firefox.

Once you have the necessary software installed, open the The Reddit Dataset folder and double-click on the the-reddit-dataset-dataset-posts.csv file to open it in your preferred text editor.

In the document, you will see a list of posts with the following information for each one: title, sentiment, score, URL, created UTC, permalink, subreddit NSFW status, and subreddit name.

You can use this information to analyze trends in data sets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subReddits. Additionally, sentiment analysis could be performed on the titles of posts to see if there is a correlation between positive/negative sentiment and upvotes/downvotes

Research Ideas

Finding correlations between different types of datasets

Determining which datasets are most popular on Reddit

Analyzing the sentiments of post and comments on Reddit's /r/datasets board

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: the-reddit-dataset-dataset-comments.csv | Column name | Description | |:-------------------|:---------------------------------------------------| | type | The type of post. (String) | | subreddit.name | The name of the subreddit. (String) | | subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) | | created_utc | The time the post was created, in UTC. (Timestamp) | | permalink | The permalink for the post. (String) | | body | The body of the post. (String) | | sentiment | The sentiment of the post. (String) | | score | The score of the post. (Integer) |

File: the-reddit-dataset-dataset-posts.csv | Column name | Description | |:-------------------|:---------------------------------------------------| | type | The type of post. (String) | | subreddit.name | The name of the subreddit. (String) | | subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) | | created_utc | The time the post was created, in UTC. (Timestamp) | | permalink | The permalink for the post. (String) | | score | The score of the post. (Integer) | | domain | The domain of the post. (String) | | url | The URL of the post. (String) | | selftext | The self-text of the post. (String) | | title | The title of the post. (String) |

Acknowledgements

If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit SocialGrep.
University FAQ Dataset
kaggle.com
zip
Updated Jul 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andre Avindra (2023). University FAQ Dataset [Dataset]. https://www.kaggle.com/datasets/andreavindra/university-faq-dataset
Explore at:
zip(8402 bytes)Available download formats
Dataset updated
Jul 4, 2023
Authors
Andre Avindra
Description
The dataset is created for a chatbot using deep learning and NLP. This dataset can be used as an input to train deep learning models with NLP techniques, such as natural language processing and deep learning algorithms like neural networks, to develop a chatbot that can understand user conversation patterns and provide appropriate responses.

Facebook

Twitter

Click to copy link

Link copied

Cite

Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets

Top 1000 Kaggle Datasets

Kaggle's most popular datasets

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

zip(34269 bytes)Available download formats

Dataset updated

Jan 3, 2022

Authors

Trrishan

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

From wiki

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was founding chair succeeded by Max Levchin. Equity was raised in 2011 valuing the company at $25 million. On 8 March 2017, Google announced that they were acquiring Kaggle.[1][2]

Source: Kaggle

Clear search

Close search

Google apps

Main menu

Top 1000 Kaggle Datasets

From wiki

Meta Kaggle Code

Explore our public notebook content!

Why we’re releasing this dataset

Sensitive data

Joining with Meta Kaggle

File organization

Questions / Comments

kaggle-entity-annotated-corpus-ner-dataset

YouTube Trending Videos Dataset

YouTube Trending Videos Dataset

Exploring YouTube Trending Videos

About this dataset

How to use the dataset

How to Use This Dataset: A Guide

Column Descriptions:

Popular Video Insights:

Engagement Insights:

Research Ideas

Student Performance Dataset

📊 Student Performance Dataset (Synthetic, Realistic)

Overview

🔑 Columns Description

📐 Data Generation Logic

🎯 How to Use This Dataset

Kaggle Dataset Metadata Repository

Kaggle Dataset Metadata Collection 📊

Dataset Overview:

Columns Description 📋

Risk Factors for Cardiovascular Heart Disease

Exploring Risk Factors for Cardiovascular Disease in Adults

Examining Age, Gender, Height, Weight and Health Metrics

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Social Media and Mental Health

Comprehensive Medical Q&A Dataset

Comprehensive Medical Q&A Dataset

Unlocking Healthcare Data with Natural Language Processing

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Acknowledgements

📱💻 Mental Health & Technology Usage Dataset 🌱

(Sunset)📒 Meta Kaggle ported to MS SQL SERVER

Context

Requirements

(QUOTED FROM THE ORIGINAL DATASET)

Meta Kaggle

Notes

Kaggle: Forum Discussions

Summary

Extraction Technique

Structure

GSM8K - Grade School Math 8K Q&A

GSM8K - Grade School Math 8K Q&A

A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

About this dataset

More Datasets

Featured Notebooks

How to use the dataset

Research Ideas

Sales and Satisfaction

This is an original dataset.

Features

Social Media Usage and User Behavior

60k-data-with-context-v2

LMS Tracking Dataset

Dataset Information: