100+ datasets found
  1. Top 1000 Kaggle Datasets

    • kaggle.com
    zip
    Updated Jan 3, 2022
    Cite
    Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets
    Explore at:
    Available download formats: zip (34269 bytes)
    Dataset updated
    Jan 3, 2022
    Authors
    Trrishan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    From wiki

    Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

    Kaggle got its start in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and Artificial Intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25 million. On 8 March 2017, Google announced that it was acquiring Kaggle.[1][2]

    Source: Kaggle

  2. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Nov 27, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (167219625372 bytes)
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files due to private and interactive sessions.
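
As a rough illustration of the layout described above, the sketch below maps a KernelVersions id to its two-level folder. The function name `version_folder` is hypothetical, and the description does not say whether folder names are zero-padded, so the unpadded form used here is an assumption.

```python
def version_folder(version_id: int) -> str:
    """Return the 'top/sub' folder for a KernelVersions id.

    Assumption: folder names are plain integers without zero padding.
    """
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top = version_id // 1_000_000        # up to 1 million versions per top-level folder
    sub = (version_id // 1_000) % 1_000  # up to 1 thousand versions per subfolder
    return f"{top}/{sub}"


print(version_folder(123_456_789))  # 123/456
```

The integer division mirrors the ranges quoted in the description: every id from 123,456,000 to 123,456,999 resolves to the same folder.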

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  3. kaggle-entity-annotated-corpus-ner-dataset

    • huggingface.co
    Updated Jul 10, 2022
    Cite
    Rafael Arias Calles (2022). kaggle-entity-annotated-corpus-ner-dataset [Dataset]. https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 10, 2022
    Authors
    Rafael Arias Calles
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Date: 2022-07-10
    Files: ner_dataset.csv
    Source: Kaggle entity annotated corpus
    Notes: The dataset only contains the tokens and ner tag labels. Labels are uppercase.

    About Dataset

    from Kaggle Datasets

    Context

    Annotated Corpus for Named Entity Recognition using the GMB (Groningen Meaning Bank) corpus for entity classification, with enhanced and popular features by Natural Language Processing applied to the data set. Tip: Use Pandas Dataframe to load dataset if using Python for… See the full description on the dataset page: https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset.

  4. YouTube Trending Videos Dataset

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    The Devastator (2023). YouTube Trending Videos Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/youtube-trending-videos-dataset
    Explore at:
    Available download formats: zip (29769637 bytes)
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Area covered
    YouTube
    Description

    YouTube Trending Videos Dataset

    Exploring YouTube Trending Videos

    By dskl [source]

    About this dataset

    Moreover, it reveals engagement metrics such as the number of views, likes, and dislikes each video has received. Comment counts enable analysis of viewer interaction and response. The dataset also records whether comments or ratings are disabled for a video, allowing examination of how these settings affect engagement.

    By exploring this dataset in depth, marketers can identify trends in content popularity across countries while accounting for timing, such as the day of the week a video was published. It also supports analysis of public sentiment towards specific videos based on like-to-dislike ratios and comment counts, which can inform marketing strategies.

    Overall, this dataset serves researchers, data analysts, and marketers who want a deeper understanding of trending video patterns, the metrics that influence content virality, and the factors behind viewer sentiment, and who want to explore digital marketing possibilities that leverage YouTube's wide reach.

    How to use the dataset

    How to Use This Dataset: A Guide

    In this guide, we will walk you through the different columns in the dataset and provide insights on how you can explore the popularity and engagement of these trending videos. Let's dive in!

    Column Descriptions:

    • title: The title of the video.
    • channel_title: The title of the YouTube channel that published the video.
    • publish_date: The date when the video was published on YouTube.
    • time_frame: The duration of time (e.g., 1 day, 6 hours) that the video has been trending on YouTube.
    • published_day_of_week: The day of week (e.g., Monday) when the video was published.
    • publish_country: The country where the video was published.
    • tags: The tags or keywords associated with the video.
    • views: The number of views received by the video.
    • likes: The number of likes received by the video.
    • dislike: The number of dislikes received by the video.
    • comment_count: The number of comments on the video.

    Popular Video Insights:

    To gain insights into popular videos based on this dataset, you can focus your analysis using these columns:

    title, channel_title, publish_date, time_frame, and publish_country.

    By analyzing these attributes together with engagement metrics such as views, likes, dislikes, and comment_count, you can identify trends in what type of content is most popular, both globally and within specific countries.

    For instance:

    • Analyze which channels are consistently publishing trending videos.
    • Explore whether certain types of titles or tags are more likely to attract views and engagement.
    • Determine whether certain days of the week or time frames have a higher likelihood of trending videos being published.

    Engagement Insights:

    To explore user engagement with the trending videos, you can focus your analysis on these columns:

    likes, dislikes, comment_count

    By analyzing these attributes you can get insights into how users are interacting with the content. For example:

    • Compare the like and dislike ratios to identify positively received videos versus those that are more controversial.
    • Analyze comment counts to understand how users are engaging with the content and whether comments being disabled affects overall
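
The like-versus-dislike comparison mentioned above can be sketched as follows. This is a minimal illustration, not the dataset author's method: the function name `like_ratio` and the three sample rows are hypothetical, and the column names `likes`, `dislikes`, and `comment_count` follow the Engagement Insights list (the column descriptions spell the second one `dislike`, so check the actual header).

```python
from typing import Optional


def like_ratio(likes: int, dislikes: int) -> Optional[float]:
    """Share of ratings that are likes, or None when a video has no ratings."""
    total = likes + dislikes
    return likes / total if total else None


# Hypothetical rows standing in for records from the dataset.
videos = [
    {"title": "Video A", "likes": 900, "dislikes": 100, "comment_count": 50},
    {"title": "Video B", "likes": 300, "dislikes": 300, "comment_count": 400},
    {"title": "Video C", "likes": 0, "dislikes": 0, "comment_count": 0},
]

for video in videos:
    ratio = like_ratio(video["likes"], video["dislikes"])
    label = "no ratings" if ratio is None else f"{ratio:.2f}"
    print(video["title"], label, video["comment_count"])
```

A ratio near 1.0 suggests a positively received video, while a ratio near 0.5 suggests a more divisive one. Returning `None` for unrated videos avoids a division by zero when ratings are disabled.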

    Research Ideas

    • Analyzing the popularity and engagement of trending videos: By analyzing the number of views, likes, dislikes, and comments, we can understand which types of videos are popular among YouTube users. We can also examine factors such as comment count and ratings disabled to see how viewers engage with trending videos.
    • Understanding video trends across different countries: By examining the publish country column, we can compare the popularity of trending videos in different countries. This can help content creators or marketers understand regional preferences and tailor their content strategy accordingly.
    • Studying the impact of video attributes on engagement: By exploring the relationship between video attributes (such as title, tags, publish day) and engagement metrics (views, likes), we can identify patterns or trends that influence a video's success on YouTube. This information can be...
  5. Student Performance Dataset

    • kaggle.com
    Updated Aug 27, 2025
    Cite
    Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ghulam Muhammad Nabeel
    Description

    📊 Student Performance Dataset (Synthetic, Realistic)

    Overview

    This dataset contains 1,000,000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

    Each row represents one student with features like study hours, attendance, class participation, and final score.
    The dataset is small, clean, and structured to be beginner-friendly.

    🔑 Columns Description

    • student_id → Unique identifier for each student.
    • weekly_self_study_hours → Average weekly self-study hours (0–40). Generated using a normal distribution centered around 15 hours.
    • attendance_percentage → Attendance percentage (50–100). Simulated with a normal distribution around 85%.
    • class_participation → Score between 0–10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.
    • total_score → Final performance score (0–100). Calculated as a function of study hours + random noise, then clipped between 0–100. Stronger correlation with study hours.
    • grade → Categorical label (A, B, C, D, F) derived from total_score.

    📐 Data Generation Logic

    1. Weekly Study Hours: Modeled using a normal distribution (mean ≈ 15, std ≈ 7), capped between 0 and 40 hours.
    2. Scores: More study hours → higher score. Formula:

    Random noise simulates differences in learning ability, motivation, etc.

    3. Attendance & Participation: Independent but realistic variations added.
    4. Grades: Assigned from scores using thresholds:
    • A: ≥ 85
    • B: ≥ 70
    • C: ≥ 55
    • D: ≥ 40
    • F: < 40
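
The grade thresholds listed above can be expressed as a small lookup. This is a sketch only; the function name `assign_grade` is hypothetical and not part of the dataset.

```python
def assign_grade(total_score: float) -> str:
    """Map a total_score (0-100) to a letter grade using the listed thresholds."""
    thresholds = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]
    for minimum, grade in thresholds:
        if total_score >= minimum:
            return grade
    return "F"  # anything below 40


for score in (92, 70, 54.9, 12):
    print(score, assign_grade(score))
```

Checking the thresholds from highest to lowest means each score receives the best grade it qualifies for.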

    🎯 How to Use This Dataset

    Regression Tasks

    • Predict total_score from weekly_self_study_hours.
    • Train and evaluate Linear Regression models.
    • Extend to multiple regression using attendance_percentage and class_participation.

    Classification Tasks

    • Predict grade (A–F) using study hours, attendance, and participation.

    Model Evaluation Practice

    • Apply train-test split and cross-validation.
    • Evaluate with MAE, RMSE, R².
    • Compare simple vs. multiple regression.
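
The evaluation practice listed above (train-test split, then MAE, RMSE, and R²) can be sketched with the standard library alone. The data below is a hypothetical stand-in: the description does not show the dataset's scoring formula, so the slope, intercept, and noise level are assumptions chosen only for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in data; slope, intercept, and noise are assumptions,
# not the formula used to generate the real dataset.
hours = [min(max(random.gauss(15, 7), 0), 40) for _ in range(1000)]
scores = [min(max(2.0 * h + 30 + random.gauss(0, 5), 0), 100) for h in hours]

# Train-test split (80/20) on shuffled row indices.
indices = list(range(len(hours)))
random.shuffle(indices)
cut = int(0.8 * len(indices))
train, test = indices[:cut], indices[cut:]

# Ordinary least squares with one feature: slope = cov(x, y) / var(x).
mean_x = sum(hours[i] for i in train) / len(train)
mean_y = sum(scores[i] for i in train) / len(train)
slope = (sum((hours[i] - mean_x) * (scores[i] - mean_y) for i in train)
         / sum((hours[i] - mean_x) ** 2 for i in train))
intercept = mean_y - slope * mean_x

# Evaluate on the held-out rows only.
errors = [scores[i] - (intercept + slope * hours[i]) for i in test]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
test_mean = sum(scores[i] for i in test) / len(test)
r2 = 1 - sum(e * e for e in errors) / sum((scores[i] - test_mean) ** 2 for i in test)

print(f"MAE={mae:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```

With the real file, the two lists would be read from the weekly_self_study_hours and total_score columns instead of being simulated, and a library such as scikit-learn could replace the hand-written fit.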

    ✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).

  6. Kaggle Dataset Metadata Repository

    • kaggle.com
    zip
    Updated Nov 16, 2024
    Cite
    Ijaj Ahmed (2024). Kaggle Dataset Metadata Repository [Dataset]. https://www.kaggle.com/datasets/ijajdatanerd/kaggle-dataset-metadata-repository
    Explore at:
    Available download formats: zip (5122110 bytes)
    Dataset updated
    Nov 16, 2024
    Authors
    Ijaj Ahmed
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Kaggle Dataset Metadata Collection 📊

    This dataset provides comprehensive metadata on various Kaggle datasets, offering detailed information about the dataset owners, creators, usage statistics, licensing, and more. It can help researchers, data scientists, and Kaggle enthusiasts quickly analyze the key attributes of different datasets on Kaggle. 📚

    Dataset Overview:

    • Purpose: To provide detailed insights into Kaggle dataset metadata.
    • Content: Information related to the dataset's owner, creator, usage metrics, licensing, and more.
    • Target Audience: Data scientists, Kaggle competitors, and dataset curators.

    Columns Description 📋

    • datasetUrl 🌐: The URL of the Kaggle dataset page. This directs you to the specific dataset's page on Kaggle.

    • ownerAvatarUrl 🖼️: The URL of the dataset owner's profile avatar on Kaggle.

    • ownerName 👤: The name of the dataset owner. This can be the individual or organization that created and maintains the dataset.

    • ownerUrl 🌍: A link to the Kaggle profile page of the dataset owner.

    • ownerUserId 💼: The unique user ID of the dataset owner on Kaggle.

    • ownerTier 🎖️: The ownership tier, such as "Tier 1" or "Tier 2," indicating the owner's status or level on Kaggle.

    • creatorName 👩‍💻: The name of the dataset creator, which could be different from the owner.

    • creatorUrl 🌍: A link to the Kaggle profile page of the dataset creator.

    • creatorUserId 💼: The unique user ID of the dataset creator.

    • scriptCount 📜: The number of scripts (kernels) associated with this dataset.

    • scriptsUrl 🔗: A link to the scripts (kernels) page for the dataset, where you can explore related code.

    • forumUrl 💬: The URL to the discussion forum for this dataset, where users can ask questions and share insights.

    • viewCount 👀: The number of views the dataset page has received on Kaggle.

    • downloadCount ⬇️: The number of times the dataset has been downloaded by users.

    • dateCreated 📅: The date when the dataset was first created and uploaded to Kaggle.

    • dateUpdated 🔄: The date when the dataset was last updated or modified.

    • voteButton 👍: The metadata for the dataset's vote button, showing how users interact with the dataset's quality ratings.

    • categories 🏷️: The categories or tags associated with the dataset, helping users filter datasets based on topics of interest (e.g., "Healthcare," "Finance").

    • licenseName 🛡️: The name of the license under which the dataset is shared (e.g., "CC0," "MIT License").

    • licenseShortName 🔑: A short form or abbreviation of the dataset's license name (e.g., "CC0" for Creative Commons Zero).

    • datasetSize 📦: The size of the dataset in terms of storage, typically measured in MB or GB.

    • commonFileTypes 📂: A list of common file types included in the dataset (e.g., .csv, .json, .xlsx).

    • downloadUrl ⬇️: A direct link to download the dataset files.

    • newKernelNotebookUrl 📝: A link to a new kernel or notebook related to this dataset, for those who wish to explore it programmatically.

    • newKernelScriptUrl 💻: A link to a new script for running computations or processing data related to the dataset.

    • usabilityRating 🌟: A rating or score representing how usable the dataset is, based on user feedback.

    • firestorePath 🔍: A reference to the path in Firestore where this dataset’s metadata is stored.

    • datasetSlug 🏷️: A URL-friendly version of the dataset name, typically used for URLs.

    • rank 📈: The dataset's rank based on certain metrics (e.g., downloads, votes, views).

    • datasource 🌐: The source or origin of the dataset (e.g., government data, private organizations).

    • medalUrl 🏅: A URL pointing to the dataset's medal or badge, indicating the dataset's quality or relevance.

    • hasHashLink 🔗: Indicates whether the dataset has a hash link for verifying data integrity.

    • ownerOrganizationId 🏢: The unique organization ID of the dataset's owner if the owner is an organization rather than an individual.

    • totalVotes 🗳️: The total number of votes the dataset has received from users, reflecting its popularity or quality.

    • category_names 📑: A comma-separated string of category names that represent the dataset’s classification.

    This dataset is a valuable resource for those who want to analyze Kaggle's ecosystem, discover high-quality datasets, and explore metadata in a structured way. 🌍📊

  7. Risk Factors for Cardiovascular Heart Disease

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    The Devastator (2023). Risk Factors for Cardiovascular Heart Disease [Dataset]. https://www.kaggle.com/datasets/thedevastator/exploring-risk-factors-for-cardiovascular-diseas
    Explore at:
    Available download formats: zip (944471 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    The Devastator
    Description

    Exploring Risk Factors for Cardiovascular Disease in Adults

    Examining Age, Gender, Height, Weight and Health Metrics

    By Kuzak Dempsy [source]

    About this dataset

    This dataset contains detailed information on the risk factors for cardiovascular disease. It includes information on age, gender, height, weight, blood pressure values, cholesterol levels, glucose levels, smoking habits and alcohol consumption of over 70 thousand individuals. Additionally, it indicates whether each person is physically active and whether he or she has any cardiovascular disease. This dataset is a useful resource for researchers applying modern machine learning techniques to explore the relationships between risk factors and cardiovascular disease, which can ultimately lead to improved understanding of this serious health issue and better preventive measures.

    How to use the dataset

    This dataset can be used to explore the risk factors of cardiovascular disease in adults. The aim is to understand how certain demographic factors, health behaviors and biological markers affect the development of heart disease.

    To start, look through the columns of data and familiarize yourself with each one. Understand what each field means and how it relates to heart health:

    • Age: Age of participant (integer).
    • Gender: Gender of participant (male/female).
    • Height: Height measured in centimeters (integer).
    • Weight: Weight measured in kilograms (integer).
    • Ap_hi: Systolic blood pressure reading taken from patient (integer).
    • Ap_lo: Diastolic blood pressure reading taken from patient (integer).
    • Cholesterol: Total cholesterol level, read as mg/dL on a scale of 0 to 5+ units (integer). Each unit denotes an increase or decrease of 20 mg/dL.
    • Gluc: Glucose level, read as mmol/L on a scale of 0 to 16+ units (integer). Each unit denotes an increase or decrease of 1 mmol/L.
    • Smoke: Whether the person smokes (binary; 0 = No, 1 = Yes).
    • Alco: Whether the person drinks alcohol (binary; 0 = No, 1 = Yes).
    • Active: Whether the person is physically active (binary; 0 = No, 1 = Yes).
    • Cardio: Whether the person suffers from cardiovascular disease (binary; 0 = No, 1 = Yes).

    Identify any trends between the values of each attribute and the development of cardiovascular disease among the individuals represented in this dataset. Age, gender, weight, and lifestyle practices such as smoking and drinking alcohol are all key influences when analyzing this problem. You can keep modifying your analysis until you find patterns that let you draw conclusions from your exploration. You can deepen your understanding with modeling techniques such as regression and classification models, along with recent deep learning approaches. Have fun!

    Research Ideas

    • Analyzing the effect of lifestyle and environmental factors on the risk of cardiovascular disease.
    • Predicting the risks of different age groups based on their demographic characteristics such as gender, height, weight and smoking status.
    • Detecting patterns between levels of physical activity, blood pressure and cholesterol levels with likelihood of developing cardiovascular disease among individuals

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: heart_data.csv

    | Column name | Description |
    |:------------|:------------|
    | age | Age of the individual. (Integer) |
    | gender | Gender of the individual. (String) |
    | height | Height of the individual in centimeters. (Integer) |
    | weight | Weight of the individual in kilograms. (Integer) |
    | ap_hi | Systolic blood pressure reading. (Integer) |
    | ap_lo | Diastolic blood pressure reading. (Integer) |
    | cholesterol | Cholesterol level of the individual. (Integer) |
    | gluc |...

  8. Social Media and Mental Health

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Cite
    SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
    Explore at:
    Available download formats: zip (10944 bytes)
    Dataset updated
    Jul 18, 2023
    Authors
    SouvikAhmed071
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

    The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

    This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

    The following is the Google Colab link to the project, done on Jupyter Notebook -

    https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

    The following is the GitHub Repository of the project -

    https://github.com/daerkns/social-media-and-mental-health

    Libraries used for the Project -

    Pandas
    NumPy
    Matplotlib
    Seaborn
    scikit-learn
    
  9. Comprehensive Medical Q&A Dataset

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). Comprehensive Medical Q&A Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset
    Explore at:
    Available download formats: zip (5126941 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Comprehensive Medical Q&A Dataset

    Unlocking Healthcare Data with Natural Language Processing

    By Huggingface Hub [source]

    About this dataset

    The MedQuad dataset provides a comprehensive source of medical questions and answers for natural language processing. With over 43,000 patient inquiries from real-life situations categorized into 31 distinct types of questions, the dataset offers an invaluable opportunity to research correlations between treatments, chronic diseases, medical protocols and more. Answers provided in this database come not only from doctors but also other healthcare professionals such as nurses and pharmacists, providing a more complete array of responses to help researchers unlock deeper insights within the realm of healthcare. This incredible trove of knowledge is just waiting to be mined - so grab your data mining equipment and get exploring!

    How to use the dataset

    In order to make the most out of this dataset, start by having a look at the column names and understanding what information they offer: qtype (the type of medical question), Question (the question in itself), and Answer (the expert response). The qtype column will help you categorize the dataset according to your desired question topics. Once you have filtered down your criteria as much as possible using qtype, it is time to analyze the data. Start by asking yourself questions such as “What treatments do most patients search for?” or “Are there any correlations between chronic conditions and protocols?” Then use simple queries such as SELECT Answer FROM MedQuad WHERE qtype='Treatment' AND Question LIKE '%pain%' to get closer to answering those questions.
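
One way to try the query quoted above is against an in-memory SQLite table. This is a sketch under assumptions: the table name `MedQuad` comes from the query in the text, the three rows are hypothetical placeholders rather than real records, and the exact `qtype` values in the real file may be spelled differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedQuad (qtype TEXT, Question TEXT, Answer TEXT)")

# Hypothetical placeholder rows; in practice these would be loaded from train.csv.
rows = [
    ("Treatment", "What are the treatments for chronic pain?", "Placeholder answer 1"),
    ("Treatment", "What are the treatments for asthma?", "Placeholder answer 2"),
    ("Symptoms", "What are the symptoms of chest pain?", "Placeholder answer 3"),
]
conn.executemany("INSERT INTO MedQuad VALUES (?, ?, ?)", rows)

# Same filter as the query in the text, with parameters instead of inline literals.
query = "SELECT Answer FROM MedQuad WHERE qtype = ? AND Question LIKE ?"
answers = [row[0] for row in conn.execute(query, ("Treatment", "%pain%"))]
print(answers)
conn.close()
```

Only the first row matches both conditions, since the second has no "pain" in the question and the third has a different `qtype`. Parameter placeholders keep the filter values separate from the SQL text, which makes it easier to swap in other question types or keywords.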

    Once you have obtained new insights about healthcare from the answers in this dynamic dataset, it is time for action. Use that understanding of patient needs to develop educational materials and implement any suggested changes. If you need more criteria for querying, check whether MedQuad offers additional columns; extra columns may be added periodically that could further enhance analysis, so look out for notifications.

    Finally, once your use case has made an impact, don't forget proper citation etiquette: give credit where credit is due!

    Research Ideas

    • Developing medical diagnostic tools that use natural language processing (NLP) to better identify and diagnose health conditions in patients.
    • Creating predictive models to anticipate treatment options for different medical conditions using machine learning techniques.
    • Leveraging the dataset to build chatbots and virtual assistants that are able to answer a broad range of questions about healthcare with expert-level accuracy

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | qtype | The type of medical question. (String) |
    | Question | The medical question posed by the patient. (String) |
    | Answer | The expert response to the medical question. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  10. 📱💻 Mental Health & Technology Usage Dataset 🌱

    • kaggle.com
    zip
    Updated Sep 5, 2024
    Cite
    Waqar Ali (2024). 📱💻 Mental Health & Technology Usage Dataset 🌱 [Dataset]. https://www.kaggle.com/datasets/waqi786/mental-health-and-technology-usage-dataset
    Explore at:
    Available download formats: zip (201342 bytes)
    Dataset updated
    Sep 5, 2024
    Authors
    Waqar Ali
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset offers insights into how daily technology usage, including social media and screen time, impacts mental health. 📊 It captures various behavioral patterns and their correlations with mental health indicators like stress levels, sleep quality, and productivity. Dive in to analyze the relationship between our digital lives and mental wellness! 🌟

    The data is useful for research, academic projects, or building predictive models to understand trends in mental health influenced by screen time and technology habits. 🔍📉

  11. (Sunset)📒 Meta Kaggle ported to MS SQL SERVER

    • kaggle.com
    zip
    Updated Mar 20, 2024
    Cite
    BwandoWando (2024). (Sunset)📒 Meta Kaggle ported to MS SQL SERVER [Dataset]. https://www.kaggle.com/datasets/bwandowando/meta-kaggle-ported-to-sql-server-2022-database
    Explore at:
    Available download formats: zip (8635902534 bytes)
    Dataset updated
    Mar 20, 2024
    Authors
    BwandoWando
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I've always wanted to explore Kaggle's Meta Kaggle dataset, but I am more comfortable using T-SQL when it comes to writing (very) complex queries. I also tend to write queries much faster in SQL Server Management Studio, like 100x faster. So I ported Kaggle's Meta Kaggle dataset into MS SQL Server 2022 database format, created a backup file, and uploaded it here.

    • MSSQL VERSION: SQL Server 2022
    • Collation: SQL_Latin1_General_CP1_CI_AS
    • Recovery model: simple

    Requirements

    • Download and install the SQL SERVER 2022 Developer edition here
    • Download the backup file
    • Restore the backup file into your local instance. If you haven't done this before, it's easy and straightforward. Here is a guide.

    (QUOTED FROM THE ORIGINAL DATASET)

    Meta Kaggle

    Explore Kaggle's public data on competitions, datasets, kernels (code/notebooks) and more.

    Meta Kaggle may not be the Rosetta Stone of data science, but they think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.

    Notes

  12. Kaggle: Forum Discussions

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Nicolás Ariel González Muñoz (2025). Kaggle: Forum Discussions [Dataset]. https://www.kaggle.com/datasets/nicolasgonzalezmunoz/kaggle-forum-discussions
    Explore at:
    Available download formats: zip (542099 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    Nicolás Ariel González Muñoz
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Note: This is a work in progress, and not all the Kaggle forums are included in this dataset. The remaining forums will be added once I finish solving some issues with the data generators related to these forums.

    Summary

    Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping with the selenium library, and the text is converted to Markdown using the markdownify package.

    This dataset contains information about the discussion main topic, topic title, comments, votes, medals and more, and is designed to serve as a complement to the data available on the Kaggle meta dataset, specifically for recent discussions. Keep reading to see the details.

    Extraction Technique

    Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), I extracted the data in this dataset through web scraping using the selenium library.

    The functions and classes used to scrape the data on Kaggle were stored in a utility script publicly available here. Because JS-generated pages like Kaggle are unstable when you try to scrape them, the script implements retrying connections and waiting for elements to appear.

    Each forum was scraped using its own notebook, and those notebooks were then connected to a central notebook that generates this dataset. The discussions are also scraped in parallel to improve speed. This dataset represents all the data that can be gathered in a single notebook session, from the most recent to the oldest.

    If you need more control over the data you want to research, feel free to import what you need from the utility script mentioned above.
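    The retry behaviour described above can be illustrated with a small, library-free sketch. This is not the author's utility script: the `retry` helper, its parameters, and the `flaky_fetch` stand-in are all hypothetical names chosen for illustration, and a real scraper would wrap a selenium page load or element lookup instead.

```python
import time


def retry(action, attempts=3, delay=0.5, exceptions=(Exception,)):
    """Call action() until it succeeds or the attempts run out (hypothetical helper)."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as error:
            last_error = error
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * attempt)
    raise last_error


# Hypothetical flaky action standing in for a page load or element lookup.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("page not ready yet")
    return "page content"


result = retry(flaky_fetch, attempts=5, delay=0.01, exceptions=(ConnectionError,))
print(result, "after", calls["count"], "calls")
```

    Restricting `exceptions` to the errors you expect keeps genuine bugs from being silently retried.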

    Structure

    This dataset contains several folders, each named after the discussion forum it covers. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder, you'll find two files: one is a csv file and the other a json file.

    The json file (in Python, represented as a dictionary) is indexed with the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, which is represented as a nested dictionary (the discussion dict) containing the following fields:

    - title: The title of the main topic.
    - content: Content of the main topic.
    - tags: List containing the discussion's tags.
    - datetime: Date and time at which the discussion was published (in ISO 8601 format).
    - votes: Number of votes received by the discussion.
    - medal: Medal awarded to the main topic (if any).
    - user: User that published the main topic.
    - expertise: Publisher's expertise, measured by the Kaggle progression system.
    - n_comments: Total number of comments in the current discussion.
    - n_appreciation_comments: Total number of appreciation comments in the current discussion.
    - comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
        - content: Comment's content.
        - is_appreciation: Whether the comment is an appreciation comment.
        - is_deleted: Whether the comment was deleted.
        - n_replies: Number of replies to the comment.
        - datetime: Date and time at which the comment was published (in ISO 8601 format).
        - votes: Number of votes received by the current comment.
        - medal: Medal awarded to the comment (if any).
        - user: User that published the comment.
        - expertise: Publisher's expertise, measured by the Kaggle progression system.
        - n_deleted: Total number of deleted replies (including self).
        - replies: A dict following this same format.

    The csv file, on the other hand, serves as a summary of the json file, with comment information limited to the hottest and most voted comments.

    Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
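    As a rough illustration of the nested layout described above, the following sketch flattens comments and their replies into rows. The sample dictionary, its IDs, and the `iter_comments` helper are made up for the example; a real file would be loaded with `json.load` from one of the forum folders, and its exact contents may differ.

```python
def iter_comments(comments, depth=0):
    """Yield (comment_id, depth, comment) for every comment and nested reply."""
    for comment_id, comment in (comments or {}).items():
        yield comment_id, depth, comment
        # 'replies' is described as a dict in the same format as 'comments'.
        yield from iter_comments(comment.get("replies"), depth + 1)


# Hypothetical sample mimicking the described layout; IDs and values are invented.
sample = {
    "1001": {
        "title": "Example topic",
        "content": "Main post text",
        "votes": 4,
        "comments": {
            "2001": {
                "content": "First comment",
                "votes": 2,
                "is_deleted": False,
                "replies": {
                    "3001": {"content": "A reply", "votes": 1, "is_deleted": False},
                },
            },
            "2002": {"content": "Second comment", "votes": 0, "is_deleted": True},
        },
    }
}

rows = []
for discussion_id, discussion in sample.items():
    # Only 'content' is guaranteed, so every other field is read with .get().
    for comment_id, depth, comment in iter_comments(discussion.get("comments")):
        rows.append((discussion_id, comment_id, depth, comment.get("votes", 0)))

for row in rows:
    print(row)
```

    Reading optional fields with `.get()` avoids a `KeyError` when a scraping run did not capture them.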

  13. GSM8K - Grade School Math 8K Q&A

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). GSM8K - Grade School Math 8K Q&A [Dataset]. https://www.kaggle.com/datasets/thedevastator/grade-school-math-8k-q-a
    Explore at:
    Available download formats: zip (3418660 bytes)
    Dataset updated
    Nov 24, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GSM8K - Grade School Math 8K Q&A

    A Linguistically Diverse Dataset for Multi-Step Reasoning Question Answering

    By Huggingface Hub [source]

    About this dataset

    This Grade School Math 8K Linguistically Diverse Training & Test Set is designed to help you develop and improve your understanding of multi-step reasoning question answering. The dataset contains three separate data files: the socratic_test.csv, main_test.csv, and main_train.csv, each containing a set of questions and answers related to grade school math that consists of multiple steps. Each file contains the same columns: question, answer. The questions contained in this dataset are thoughtfully crafted to lead you through the reasoning journey for arriving at the correct answer each time, allowing you immense opportunities for learning through practice. With over 8 thousand entries for both training and testing purposes in this GSM8K dataset, it takes advanced multi-step reasoning skills to ace these questions! Deepen your knowledge today and master any challenge with ease using this amazing GSM8K set!

    How to use the dataset

    This dataset provides a unique opportunity to study multi-step reasoning for question answering. The GSM8K Linguistically Diverse Training & Test Set consists of 8,000 questions and answers that have been created to simulate real-world scenarios in grade school mathematics. Each question is paired with one answer based on a comprehensive test set. The questions cover topics such as algebra, arithmetic, probability and more.

    The dataset includes main_train.csv and main_test.csv (along with socratic_test.csv): the former contains questions and answers related to grade school math, while the latter includes multi-step reasoning tests for each category of the Ontario Math Curriculum (OMC). Each file has two columns, question and answer, so each row holds one question/answer pair. These columns can be combined with text models such as ELMo or BERT to explore different representations for natural language processing tasks such as Q&A, or to build predictive models for numerical applications such as forecasting sales volumes on retail platforms.

    To use this dataset efficiently, first get familiar with its structure by reading through its documentation, so that you know the content definitions and format requirements. Then study the examples that best suit your purpose, whether that is an experiment inspired by education research, a marketing analytics report, or an artificial intelligence project. Make sure you understand all variable definitions before continuing.

    Research Ideas

    • Training language models for improving accuracy in natural language processing applications such as question answering or dialogue systems.
    • Generating new grade school math questions and answers using g...
  14. Sales and Satisfaction

    • kaggle.com
    zip
    Updated May 22, 2024
    Cite
    Matin Mahmoudi ✨ (2024). Sales and Satisfaction [Dataset]. https://www.kaggle.com/datasets/matinmahmoudi/sales-and-satisfaction
    Explore at:
    Available download formats: zip (687693 bytes)
    Dataset updated
    May 22, 2024
    Authors
    Matin Mahmoudi ✨
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    One dataset contains missing values (NaNs) and the other does not. These datasets contain information on sales and customer satisfaction before and after an intervention, as well as purchase data for control and treatment groups. The dataset is synthetic and was created for use in statistical analysis.

    This is an original dataset.

    Features

    Group
    - Description: Indicates whether the data point belongs to the Control or Treatment group.
    - Categories: Control, Treatment

    Customer_Segment
    - Description: Categorizes customers based on their value.
    - Categories: High Value, Medium Value, Low Value

    Sales_Before
    - Description: Sales figures before the intervention.
    - Data Type: Numerical

    Sales_After
    - Description: Sales figures after the intervention.
    - Data Type: Numerical

    Customer_Satisfaction_Before
    - Description: Customer satisfaction scores before the intervention.
    - Data Type: Numerical

    Customer_Satisfaction_After
    - Description: Customer satisfaction scores after the intervention.
    - Data Type: Numerical

    Purchase_Made
    - Description: Indicates whether a purchase was made after the intervention.
    - Categories: Yes, No
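    A before/after comparison by group, using the column names listed above, might look like the sketch below. The rows are invented, and treating an empty cell or the literal string `NaN` as a missing value is an assumption about how the file encodes gaps; the `mean_change_by_group` helper is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the documented column names; the values are made up.
SAMPLE_CSV = """Group,Sales_Before,Sales_After
Control,100.0,102.0
Control,200.0,
Control,150.0,154.0
Treatment,100.0,110.0
Treatment,120.0,140.0"""

MISSING = {"", "NaN"}  # assumed encodings of missing values


def mean_change_by_group(rows):
    """Average (Sales_After - Sales_Before) per Group, skipping incomplete rows."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of changes, row count]
    for row in rows:
        before, after = row["Sales_Before"], row["Sales_After"]
        if before in MISSING or after in MISSING:
            continue
        entry = totals[row["Group"]]
        entry[0] += float(after) - float(before)
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
changes = mean_change_by_group(rows)
print(changes)
```

    Dropping incomplete rows is only one way to handle the missing values; imputation would be an equally reasonable choice depending on the analysis.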

  15. Social Media Usage and User Behavior

    • kaggle.com
    zip
    Updated Jan 8, 2025
    Cite
    SIMRAN DESAI (2025). Social Media Usage and User Behavior [Dataset]. https://www.kaggle.com/datasets/simrandesai1616/social-media-behavior
    Explore at:
    Available download formats: zip (9184 bytes)
    Dataset updated
    Jan 8, 2025
    Authors
    SIMRAN DESAI
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The dataset comes from a Social Media Analysis survey that aims to analyse user behavior on social media, focusing on attention monetization and engagement based on 110+ self-reported responses. It was conducted using Google Forms, with diverse participants to capture varying user profiles and the variance in levels of awareness about social media's impact on daily routines.

  16. 60k-data-with-context-v2

    • kaggle.com
    Updated Sep 2, 2023
    Cite
    Chris Deotte (2023). 60k-data-with-context-v2 [Dataset]. https://www.kaggle.com/datasets/cdeotte/60k-data-with-context-v2
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 2, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Chris Deotte
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset can be used to train an Open Book model for Kaggle's LLM Science Exam competition. This dataset was generated by searching and concatenating all publicly shared datasets on Sept 1 2023.

    The context column was generated using Mgoksu's notebook here with NUM_TITLES=5 and NUM_SENTENCES=20

    The source column indicates where the dataset originated. Below are the sources:

    source = 1 & 2 * Radek's 6.5k dataset. Discussion here and here, dataset here.

    source = 3 & 4 * Radek's 15k + 5.9k. Discussion here and here, dataset here

    source = 5 & 6 * Radek's 6k + 6k. Discussion here and here, dataset here

    source = 7 * Leonid's 1k. Discussion here, dataset here

    source = 8 * Gigkpeaeums 3k. Discussion here, dataset here

    source = 9 * Anil 3.4k. Discussion here, dataset here

    source = 10, 11, 12 * Mgoksu 13k. Discussion here, dataset here

  17. LMS Tracking Dataset

    • kaggle.com
    zip
    Updated May 6, 2024
    Cite
    Prasad Patil (2024). LMS Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/lms-tracking-dataset
    Explore at:
    Available download formats: zip (5419 bytes)
    Dataset updated
    May 6, 2024
    Authors
    Prasad Patil
    Description

    This dataset was collected by an edtech startup. The startup teaches entrepreneurial life skills in an animated, gamified format through its video series to kids aged 6 to 14. Through its learning management system, the company tracks the progress made by all of its subscribers on the platform. The company records platform content usage data and tries to follow up with parents if their child is inactive on the platform. Here is more information about the dataset.

    Dataset Information:

    • Child Name: Name of the subscriber kid
    • Email Address: Email address created by parent
    • Contact: Contact details of the parent
    • follow up: Responses received by the company employee after progress follow-up over the phone.
    • response: Categorization of the follow-up responses
    • Introduction: Tutorial 1
    • Activity:- Know your personality, a fun way: Tutorial 2
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 2
    • Lets see what ‘Product’ is…: Tutorial 3
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 3
    • Product that represents me: Tutorial 4
    • Let's see what 'Service' means: Tutorial 5
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 5
    • Instruction for 'Product & Service' worksheet: Tutorial 6
    • Activity:- Product and Service Worksheet: Exercise on Tutorial 6
    • Instructions for Product Word Association: Tutorial 7
    • Activity:- Product Word Association: Exercise on Tutorial 7
    • Life without products??.... Impossible !: Tutorial 8
    • What Is a Need?: Tutorial 9
    • A Simple Quiz on the previous Video: Quiz on the Tutorial 9
    • Summary of Session 1: Summarizing all the learnings from the Tutorials 1-9
    • Your Feedback on Session 1: Feedback page

    There is some missing data as well. I hope it will be a good dataset for beginners practicing their NLP skills.

    Image by Steven Weirather from Pixabay

    Note: This dataset is partially synthetic, meaning the names, email addresses, and contact details mentioned are not those of actual customers. Kindly use it for educational and research purposes.

  18. Mental Health

    • kaggle.com
    zip
    Updated May 7, 2025
    Cite
    Mahdi Mashayekhi (2025). Mental Health [Dataset]. https://www.kaggle.com/datasets/mahdimashayekhi/mental-health
    Explore at:
    Available download formats: zip (137847 bytes)
    Dataset updated
    May 7, 2025
    Authors
    Mahdi Mashayekhi
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    📘 Dataset Description

    This dataset provides a realistic, synthetic simulation of global mental health survey responses from 10,000 individuals. It was created to reflect actual patterns seen in workplace mental health data while ensuring full anonymity and privacy.

    🧠 Context & Purpose

    Mental health issues affect people across all ages, countries, and industries. Understanding patterns in mental health at work, access to treatment, and stigma around disclosure is essential for shaping better workplace policies and interventions.

    This dataset is ideal for:

    • Training and evaluating machine learning models
    • Practicing classification or clustering techniques
    • Performing exploratory data analysis (EDA)
    • Studying fairness and bias in mental health predictions
    • Creating realistic dashboards for HR analytics or healthcare systems

    📊 Dataset Highlights

    • 10,000 rows representing anonymized individuals
    • Diverse global coverage with country/state info
    • Demographic attributes like age, gender, employment type
    • Information about work environment and company support
    • Responses about mental health history, treatment, and workplace stigma

    💡 Example Use Cases

    • Predicting the likelihood of an employee seeking mental health treatment
    • Identifying factors most correlated with workplace stress
    • Segmenting users by mental health risk using clustering
    • Building fairness-aware models to reduce bias in mental health predictions

    ⚠️ Notes

    • This dataset is entirely synthetic. No personally identifiable information (PII) or real user data is included.
    • It was generated based on patterns observed in public mental health datasets and surveys.
  19. Reddit /r/datasets Dataset

    • kaggle.com
    zip
    Updated Nov 28, 2022
    Cite
    The Devastator (2022). Reddit /r/datasets Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-meta-corpus-of-datasets-the-reddit-dataset
    Explore at:
    Available download formats: zip (9619636 bytes)
    Dataset updated
    Nov 28, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Meta-Corpus of Datasets: The Reddit Dataset

    The Complete Collection of Datasets Posted on Reddit

    By SocialGrep [source]

    About this dataset

    A subreddit dataset is a collection of posts and comments made on a Reddit board. This dataset contains all the posts and comments made on the /r/datasets subreddit from its inception to March 1, 2022. The dataset was procured using SocialGrep. The data does not include usernames, to preserve users' anonymity and to prevent targeted harassment.

    How to use the dataset

    To use this dataset, you will need an application that can open CSV files, such as a spreadsheet program like LibreOffice Calc or a plain text editor. You will also need a web browser such as Google Chrome or Mozilla Firefox.

    Once you have the necessary software installed, open the Reddit Dataset folder and double-click the the-reddit-dataset-dataset-posts.csv file to open it.

    In the file, you will see a list of posts with the following information for each one: type, title, self-text, score, domain, URL, created UTC, permalink, subreddit NSFW status, and subreddit name. Sentiment labels appear in the comments file.

    You can use this information to analyze trends in datasets posted on /r/datasets over time. For example, you could calculate the average score for all posts and compare it to the average score for posts in specific subreddits. Additionally, sentiment analysis could be performed on the titles of posts to see whether there is a correlation between positive/negative sentiment and upvotes/downvotes.
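    The average-score comparison suggested above could be sketched as follows. The sample rows are invented, and the `average_scores` helper is hypothetical; the grouping column is a parameter, and this example groups by `domain` (a column listed for the posts file) as an assumption about what makes a useful comparison.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows using column names from the posts file; values are made up.
SAMPLE_POSTS = """domain,score,title
self.datasets,10,Looking for weather data
self.datasets,4,Open traffic dataset
github.com,7,New repository of benchmark data
github.com,1,Small utility for CSV cleanup"""


def average_scores(rows, group_column):
    """Return the overall mean score and the mean score per value of group_column."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_column]].append(int(row["score"]))
    overall = mean(score for scores in by_group.values() for score in scores)
    return overall, {group: mean(scores) for group, scores in by_group.items()}


posts = list(csv.DictReader(io.StringIO(SAMPLE_POSTS)))
overall, per_group = average_scores(posts, "domain")
print("overall:", overall)
print("per group:", per_group)
```

    Passing a different column name, such as `subreddit.name`, reuses the same helper for other comparisons.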

    Research Ideas

    • Finding correlations between different types of datasets
    • Determining which datasets are most popular on Reddit
    • Analyzing the sentiment of posts and comments on Reddit's /r/datasets board

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: the-reddit-dataset-dataset-comments.csv

    | Column name | Description |
    |:------------|:------------|
    | type | The type of post. (String) |
    | subreddit.name | The name of the subreddit. (String) |
    | subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
    | created_utc | The time the post was created, in UTC. (Timestamp) |
    | permalink | The permalink for the post. (String) |
    | body | The body of the post. (String) |
    | sentiment | The sentiment of the post. (String) |
    | score | The score of the post. (Integer) |

    File: the-reddit-dataset-dataset-posts.csv

    | Column name | Description |
    |:------------|:------------|
    | type | The type of post. (String) |
    | subreddit.name | The name of the subreddit. (String) |
    | subreddit.nsfw | Whether or not the subreddit is NSFW. (Boolean) |
    | created_utc | The time the post was created, in UTC. (Timestamp) |
    | permalink | The permalink for the post. (String) |
    | score | The score of the post. (Integer) |
    | domain | The domain of the post. (String) |
    | url | The URL of the post. (String) |
    | selftext | The self-text of the post. (String) |
    | title | The title of the post. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and SocialGrep.

  20. University FAQ Dataset

    • kaggle.com
    zip
    Updated Jul 4, 2023
    Cite
    Andre Avindra (2023). University FAQ Dataset [Dataset]. https://www.kaggle.com/datasets/andreavindra/university-faq-dataset
    Explore at:
    Available download formats: zip (8402 bytes)
    Dataset updated
    Jul 4, 2023
    Authors
    Andre Avindra
    Description

    The dataset was created for a chatbot built with deep learning and NLP. It can be used to train deep learning models, such as neural networks, to develop a chatbot that can understand user conversation patterns and provide appropriate responses.

Cite
Trrishan (2022). Top 1000 Kaggle Datasets [Dataset]. https://www.kaggle.com/datasets/notkrishna/top-1000-kaggle-datasets

Top 1000 Kaggle Datasets

Kaggle's most popular datasets

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (34269 bytes)
Dataset updated
Jan 3, 2022
Authors
Trrishan
License

https://creativecommons.org/publicdomain/zero/1.0/

