100+ datasets found

Meta Kaggle Code
kaggle.com
zip
Updated Jun 26, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
Available download formats: zip (146671216621 bytes)
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
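A minimal sketch of this join, using only the Python standard library. The `Id` and `TotalVotes` column names, the sample rows, and the file extensions are assumptions made for illustration; check the real KernelVersions csv header before relying on them.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical excerpt of KernelVersions.csv; column names are assumptions.
KERNEL_VERSIONS_CSV = """Id,TotalVotes
123456789,12
123456790,3
"""


def index_kernel_versions(csv_text):
    """Map each KernelVersions id (as int) to its metadata row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["Id"]): row for row in reader}


def version_id_from_path(path):
    """Read the version id from a code file name such as 123/456/123456789.py."""
    return int(PurePosixPath(path).stem)


def join_files_to_metadata(paths, versions_by_id):
    """Pair each code file with its metadata row, skipping files with no match."""
    joined = {}
    for path in paths:
        version_id = version_id_from_path(path)
        if version_id in versions_by_id:
            joined[path] = versions_by_id[version_id]
    return joined


versions = index_kernel_versions(KERNEL_VERSIONS_CSV)
# Hypothetical file paths; the third has no row in the sample csv.
files = ["123/456/123456789.py", "123/456/123456790.ipynb", "123/456/123456999.r"]
joined = join_files_to_metadata(files, versions)
```

Keying the lookup on the integer id keeps the join independent of file extension, which matters because the dataset mixes Python and R notebooks.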

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files because private and interactive sessions are excluded.
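The folder arithmetic described above can be sketched as follows. The function names are hypothetical, and the sketch returns folder numbers rather than folder names because the description does not say whether names are zero-padded.

```python
def version_folders(version_id):
    """Return (top_level, sub_level) folder numbers for a KernelVersions id."""
    if version_id < 0:
        raise ValueError("version ids are non-negative")
    top_level = version_id // 1_000_000        # millions part of the id
    sub_level = (version_id // 1_000) % 1_000  # thousands part of the id
    return top_level, sub_level


def version_id_range(top_level, sub_level=None):
    """Return the inclusive id range covered by a folder or a subfolder."""
    if sub_level is None:
        start = top_level * 1_000_000
        return start, start + 999_999
    start = top_level * 1_000_000 + sub_level * 1_000
    return start, start + 999


print(version_folders(123_456_789))  # (123, 456)
print(version_id_range(123))         # (123000000, 123999999)
print(version_id_range(123, 456))    # (123456000, 123456999)
```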

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Customer Shopping Trends Dataset
kaggle.com
Updated Oct 5, 2023
Cite
Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 5, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description
Context

The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

Content

This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

Dataset Glossary (Column-wise)

Customer ID - Unique identifier for each customer

Age - Age of the customer

Gender - Gender of the customer (Male/Female)

Item Purchased - The item purchased by the customer

Category - Category of the item purchased

Purchase Amount (USD) - The amount of the purchase in USD

Location - Location where the purchase was made

Size - Size of the purchased item

Color - Color of the purchased item

Season - Season during which the purchase was made

Review Rating - Rating given by the customer for the purchased item

Subscription Status - Indicates if the customer has a subscription (Yes/No)

Shipping Type - Type of shipping chosen by the customer

Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)

Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)

Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction

Payment Method - Customer's most preferred payment method

Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
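As a rough illustration of working with these columns, the sketch below averages the purchase amount per category using only the standard library. It assumes the CSV headers match the glossary names exactly; the three sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; headers are assumed to match the glossary names.
SAMPLE = """Customer ID,Age,Gender,Category,Purchase Amount (USD),Review Rating
1,55,Male,Clothing,53,3.1
2,19,Male,Clothing,64,3.1
3,50,Female,Footwear,90,3.5
"""


def mean_purchase_by_category(csv_file):
    """Return the mean purchase amount in USD for each item category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        category = row["Category"]
        totals[category] += float(row["Purchase Amount (USD)"])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


averages = mean_purchase_by_category(io.StringIO(SAMPLE))
print(averages)
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` sample.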

Structure of the Dataset

Image: https://i.imgur.com/6UEqejq.png

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
Spain Job Offers Scraped Data
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Spain Job Offers Scraped Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/spain-job-offers-scraped-data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Spain
Description
Spain Job Offers Scraped Data

Uncovering Qualifications and Requirements

By [source]

About this dataset

This dataset contains web-scraped information about job offers located in Spain, with details such as the offer name, company, location, and time of offer. This knowledge is beneficial for any job seeker looking to target potential employers in Spain, understand the qualifications and requirements needed to be considered for a role, and know approximately how long an offer is likely to stay on LinkedIn. The dataset can also be useful for recruiters who need a detailed overview of all job offers currently active in the Spanish market in order to filter out relevant vacancies. Lastly, professionals who have an eye on the Spanish job market can benefit from the insights it provides to optimise their search. The dataset therefore gives users interested in uncovering opportunities within Spain’s labour landscape access to detailed information about current job opportunities.

How to use the dataset

This guide will help those looking to use this dataset to discover the job market in Spain. The data provided in the dataset can be a great starting point for people who want to optimize their job search and uncover potential opportunities available.

Understand What Is Being Measured: The dataset contains details such as job offer name, company, and location, along with other factors such as time of offer and type of schedule asked. It is important to understand what each column represents before using the dataset.

Number of Job Offers Available: This dataset provides insight into how many job offers are available throughout Spain by showing which areas have a high number of jobs listed and what types of jobs are needed in certain areas or businesses. This information could be used for expanding your career or for searching for specific jobs within different regions in Spain that match your skillset or desired salary range.

Required Qualifications & Skill Set: The type of schedule asked by businesses is also given, allowing users to understand whether certain employers require multiple shifts, weekend work, or hours outside the normal 9 to 5, depending on the positions needed. Additionally, understanding which skill sets are required not only helps you prioritize when learning new technologies or gaining qualifications, but can also give you an idea of which soft skills businesses may require, such as teamwork and communication.

Location Opportunities: This web-scraped list gives users access to potential companies located throughout Spain, in cities such as Madrid, Barcelona, and Valencia. By understanding where business demand exists across different regions, one could look at taking up new roles with higher remuneration or specialize more closely in recruitment and searches tailored to particular regions of Spain.

By following this guide, you should now have a solid understanding of how best to utilize this dataset obtained from UOC, along with increased knowledge of how to identify job opportunities through web scraping for those seeking work experience or positions across multiple regions within the country.

Research Ideas

Analyzing the job market in Spain - Companies offering jobs can be compared and contrasted using this dataset, for example by the locations where they are looking to hire, the types of schedules they offer, and the length of job postings. This information lets users target potential employers instead of wasting time randomly applying for jobs online.

Optimizing a Job Search - Web scraping allows users to quickly gather job postings from all sources on a daily basis and view relevant qualifications and requirements needed for each post in order to better optimize their job search process.

Leveraging data insights - Insights collected by analyzing this web scraping dataset can be used for strategic advantage when creating LinkedIn or recruitment campaigns targeting Spanish markets, based on the available applicants’ preferences, such as hours per week or area/position within particular companies typically offered in the dataset available from UOC.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

L...
📣 Ad Click Prediction Dataset
kaggle.com
Updated Sep 7, 2024
Cite
Ciobanu Marius (2024). 📣 Ad Click Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/ad-click-prediction-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 7, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ciobanu Marius
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
About

This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.

Features

id: Unique identifier for each user.

full_name: User's name formatted as "UserX" for anonymity.

age: Age of the user (ranging from 18 to 64 years).

gender: The gender of the user (categorized as Male, Female, or Non-Binary).

device_type: The type of device used by the user when viewing the ad (Mobile, Desktop, Tablet).

ad_position: The position of the ad on the webpage (Top, Side, Bottom).

browsing_history: The user's browsing activity prior to seeing the ad (Shopping, News, Entertainment, Education, Social Media).

time_of_day: The time when the user viewed the ad (Morning, Afternoon, Evening, Night).

click: The target label indicating whether the user clicked on the ad (1 for a click, 0 for no click).

Goal

The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean the data, explore it, and then train and evaluate machine learning models. This is a challenging task for this kind of data. The data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online advertisements.
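A first exploration step might look like the sketch below, which computes the click rate per category and one-hot encodes a categorical feature for a classifier. The sample rows are invented, and treating empty strings as missing values is an assumption about what "cleaning" involves for this file.

```python
from collections import defaultdict

# Invented rows using the documented column names; "" stands in for a missing value.
ROWS = [
    {"device_type": "Mobile", "ad_position": "Top", "click": "1"},
    {"device_type": "Desktop", "ad_position": "Top", "click": "0"},
    {"device_type": "", "ad_position": "Side", "click": "1"},
    {"device_type": "Mobile", "ad_position": "Bottom", "click": "0"},
]


def click_rate_by(rows, feature):
    """Return the fraction of rows with click == 1 for each value of a feature."""
    clicks = defaultdict(int)
    views = defaultdict(int)
    for row in rows:
        value = row[feature] or "Unknown"  # group missing values together
        views[value] += 1
        clicks[value] += int(row["click"])
    return {value: clicks[value] / views[value] for value in views}


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [1 if value == category else 0 for category in categories]


rates = click_rate_by(ROWS, "ad_position")
encoded = one_hot("Side", ["Top", "Side", "Bottom"])
```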
🎸🎹🎙️Speakers Sales Conversion Dataset🎸🎹🎙️
kaggle.com
Updated Mar 30, 2025
Cite
Sandeep SD (2025). 🎸🎹🎙️Speakers Sales Conversion Dataset🎸🎹🎙️ [Dataset]. https://www.kaggle.com/datasets/sandeep1080/bassburst
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 30, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sandeep SD
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🌟 Enjoying the Dataset? 🌟

If this dataset helped you uncover new insights or made your day a little brighter, thanks a ton for checking it out! Let’s keep those insights rolling! 🔥📈

Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F23961675%2Ff3761bd2d7ee460ad464de8f25634f63%2Fsteve-johnson-z6LlNgsDeug-unsplash.jpg?generation=1740481184467263&alt=media

Dataset Description:

This dataset contains website conversion data for Bluetooth speaker sales. The dataset tracks user sessions on different landing page variants, with the primary goal of analyzing conversion rates, user behavior, and other factors influencing sales. It includes detailed user engagement metrics such as time spent, pages visited, device type, sign-in methods, and geographical information.

Use Case:

This dataset can be used for various analytical tasks including:

A/B testing and multivariate analysis to compare landing page designs.
User segmentation by demographics (age, gender, location, etc.).
Conversion rate optimization (CRO) analysis.
Predictive modeling for conversion likelihood based on session characteristics.
Revenue and payment analysis.
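For the A/B testing use case, one common approach is a two-proportion z-test on the conversion rates of two landing page variants. The sketch below is a generic illustration, not the dataset author's method; the function name and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("pooled conversion rate is 0 or 1; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A converts 100 of 1000 sessions, variant B 150 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the variants differ in conversion rate; with more than two variants, a correction for multiple comparisons would be needed.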
Gemma-Data Science Agent- Instruct- Dataset
kaggle.com
Updated Apr 2, 2024
Cite
ian cecil akoto (2024). Gemma-Data Science Agent- Instruct- Dataset [Dataset]. https://www.kaggle.com/datasets/ianakoto/gemma-data-science-agent-instruct-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ian cecil akoto
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Overview

This dataset contains question-answer pairs with context extracted from Kaggle solution write-ups and discussion forums. The dataset was created to facilitate fine-tuning Gemma, an AI model, for data scientist assistant tasks such as question answering and providing data science assistance.

Dataset Details

Columns:

Question: The question generated based on the context extracted from Kaggle solution write-ups and discussion forums.

Answer: The corresponding answer to the generated question.

Context: The context extracted from Kaggle solution write-ups and discussion forums, which serves as the basis for generating questions and answers.

Subtitle: Subtitle or additional information related to the Kaggle competition or topic.

Title: Title of the Kaggle competition or topic.

Sources and Inspiration

Sources:

Meta Kaggle: The dataset was sourced from Meta Kaggle, an official Kaggle platform where users discuss competitions, kernels, datasets, and more.

Kaggle Solution Write-ups: Solution write-ups submitted by Kaggle users were utilized as a primary source of context for generating questions and answers.

Discussion Forums: Discussion threads on Kaggle forums were used to gather additional insights and context for the dataset.

Inspiration:

The dataset was inspired by the need for a specialized dataset tailored for fine-tuning Gemma, an AI model designed for data scientist assistant tasks. The goal was to create a dataset that captures the essence of real-world data science problems discussed on Kaggle, enabling Gemma to provide accurate and relevant assistance to data scientists and Kaggle users.

Dataset Specifics

Total Records: [Specify the total number of question-answer pairs in the dataset]

Format: CSV (Comma Separated Values)

Size: [Specify the size of the dataset in MB or GB]

License: [Specify the license under which the dataset is distributed, e.g., CC BY-SA 4.0]

Download Link: [Provide a link to download the dataset]

Acknowledgments

We acknowledge Kaggle and its community for providing valuable data science resources and discussions that contributed to the creation of this dataset. We appreciate the efforts of Gemma and Langchain in fine-tuning AI models for data scientist assistant tasks, enabling enhanced productivity and efficiency in the field of data science.

Machine Learning Job Postings in the US

kaggle.com
opendatabay.com

Updated Apr 20, 2025

Cite

Ivan Kumeyko (2025). Machine Learning Job Postings in the US [Dataset]. https://www.kaggle.com/datasets/ivankmk/thousand-ml-jobs-in-usa

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Apr 20, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Ivan Kumeyko

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Area covered

United States

Description

This dataset contains 1,000 job postings for Machine Learning-related roles across the United States, scraped between late 2024 and early 2025. The data was collected directly from company career pages and job boards, focusing on full job descriptions and associated company information.

Column Descriptions

Column	Description
`job_posted_date`	The date the job was posted (format: YYYY-MM-DD).
`company_address_locality`	The city or locality of the job or company.
`company_address_region`	The U.S. state or region where the job is located.
`company_name`	The name of the company posting the job.
`company_website`	The official website of the company.
`company_description`	A short description or mission statement of the company.
`job_description_text`	The full job description text as listed in the original posting.
`seniority_level`	The required seniority level (e.g., Internship, Entry level, Mid-Senior).
`job_title`	The full job title listed in the posting.
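
A small sketch of how the date and seniority columns might be summarized. The rows are invented; only the column names, the YYYY-MM-DD date format, and the example seniority values come from the table above.

```python
from collections import Counter
from datetime import date

# Invented rows; only the column names come from the table above.
POSTINGS = [
    {"job_posted_date": "2024-12-03", "seniority_level": "Entry level"},
    {"job_posted_date": "2024-12-17", "seniority_level": "Mid-Senior"},
    {"job_posted_date": "2025-01-09", "seniority_level": "Mid-Senior"},
]


def postings_per_month(postings):
    """Count postings per (year, month), parsing the YYYY-MM-DD date column."""
    counts = Counter()
    for posting in postings:
        posted = date.fromisoformat(posting["job_posted_date"])
        counts[(posted.year, posted.month)] += 1
    return counts


def postings_per_seniority(postings):
    """Count postings per seniority level."""
    return Counter(posting["seniority_level"] for posting in postings)


by_month = postings_per_month(POSTINGS)
by_seniority = postings_per_seniority(POSTINGS)
```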

High School Heights Dataset
kaggle.com
Updated Aug 11, 2022
Cite
Yashmeet Singh (2022). High School Heights Dataset [Dataset]. https://www.kaggle.com/datasets/yashmeetsingh/high-school-heights-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 11, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yashmeet Singh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
High School Heights Dataset

You will find three datasets containing heights of the high school students.

All heights are in inches.

The data is simulated. The heights are generated from a normal distribution with different sets of mean and standard deviation for boys and girls.

| Height Statistics (inches) | Boys | Girls |
|:---------------------------|:-----|:------|
| Mean | 67 | 62 |
| Standard Deviation | 2.9 | 2.2 |

There are 500 measurements for each gender.

Here are the datasets:

hs_heights.csv: contains a single column with heights for all boys and girls. There's no way to tell which of the values are for boys and which ones are for girls.

hs_heights_pair.csv: has two columns. The first column has boys' heights. The second column contains girls' heights.

hs_heights_flag.csv: has two columns. The first column has the flag is_girl. The second column contains a girl's height if the flag is 1. Otherwise, it contains a boy's height.
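
A sketch of how data with this shape could be simulated, using the means and standard deviations stated above. This is not the author's generation script (that is linked below); the function names, the seed, and the shuffling of the single-column layout are assumptions.

```python
import random
import statistics


def simulate_heights(n_per_group=500, seed=0):
    """Draw heights in inches from the two normal distributions described above."""
    rng = random.Random(seed)
    boys = [rng.gauss(67, 2.9) for _ in range(n_per_group)]
    girls = [rng.gauss(62, 2.2) for _ in range(n_per_group)]
    return boys, girls


def as_single_column(boys, girls, seed=0):
    """Layout like hs_heights.csv: one mixed column with no group label."""
    heights = boys + girls
    random.Random(seed).shuffle(heights)  # shuffling is an assumption
    return heights


def as_pairs(boys, girls):
    """Layout like hs_heights_pair.csv: (boy height, girl height) rows."""
    return list(zip(boys, girls))


def as_flagged(boys, girls):
    """Layout like hs_heights_flag.csv: (is_girl, height) rows."""
    return [(0, height) for height in boys] + [(1, height) for height in girls]


boys, girls = simulate_heights()
print(round(statistics.mean(boys), 1), round(statistics.mean(girls), 1))
```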

To see how I generated this dataset, check this out: https://github.com/ysk125103/datascience101/tree/main/datasets/high_school_heights

Image by Gillian Callison from Pixabay
Occupational Skills and Tasks
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Occupational Skills and Tasks [Dataset]. https://www.kaggle.com/datasets/thedevastator/occupational-skills-and-tasks
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Occupational Skills and Tasks

Understanding the Role of Skills in Online Job Ads

By [source]

About this dataset

This dataset provides a resource for better understanding the connection between occupational skills and the tasks associated with them. Drawing from online job advertisements, it reflects how the range of skills and tasks an individual needs within a job role changes over time. The data has been reconciled with the JRC-Eurofound Task Taxonomy, making this dataset a powerful tool for researchers who are looking to understand an occupation's profile and competency requirements. It includes two columns, SKILL and TASK, which provide descriptors that have been reconciled with the Task Taxonomy. With the insights found in this data, one can not only recognize skill-based jobs and improve hiring practices, but also gain a more holistic understanding of talent identification in modern recruitment processes.

How to use the dataset

Get familiar with the two columns - SKILL and TASK. The SKILL column describes skill descriptors found in online job advertisements that have been reconciled with the JRC-Eurofound Task Taxonomy, whilst TASK provides the task for each skill description entry.

Explore how different occupations rely on different sets of skills/tasks or look into trends over time by examining datasets from different years or by filtering them by type/labour market.

Consider utilizing data visualization techniques like heat maps in order to more easily recognize patterns in large data sets such as this one.

Make sure you check out other similar datasets available on Kaggle's platform (e.g., Education, Professional Background), as they may have useful connections or overlap with this one based on common data points like geography/location, occupation type, etc.

By following these tips you’ll be able to benefit more fully from this great resource!

Research Ideas

Analyzing the correlation between specific jobs and growth rate of certain skills over time.

Examining how certain skills may be trending in a particular job market or industry sector.

Comparing and contrasting occupational skill profiles between different professions or geographical locations to better allocate resources appropriately for hiring and training purposes

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: skill_task_dictionary.csv

| Column name | Description |
|:--------------|:------------------------------------------------------------|
| SKILL | A description of the skill required for the job. (Text) |
| TASK | A description of the task associated with the skill. (Text) |

Datasets for federated learning
kaggle.com
Updated Dec 29, 2022
Cite
wonghoitin (2022). Datasets for federated learning [Dataset]. https://www.kaggle.com/datasets/wonghoitin/datasets-for-federated-learning
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 29, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
wonghoitin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Federated learning builds machine learning models from data sets that are distributed across multiple devices while preventing data leakage (Q. Yang et al. 2019).

source:

smoking https://www.kaggle.com/datasets/kukuroo3/body-signal-of-smoking license = CC0: Public Domain

heart https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset license = CC0: Public Domain

water https://www.kaggle.com/datasets/adityakadiwal/water-potability license = CC0: Public Domain

customer https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis license = CC0: Public Domain

insurance https://www.kaggle.com/datasets/tejashvi14/travel-insurance-prediction-data license = CC0: Public Domain

credit https://www.kaggle.com/datasets/ajay1735/hmeq-data license = CC0: Public Domain

income https://www.kaggle.com/datasets/mastmustu/income license = CC0: Public Domain

machine https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification license = CC0: Public Domain

skin https://www.kaggle.com/datasets/saurabhshahane/lumpy-skin-disease-dataset license = Attribution 4.0 International (CC BY 4.0)

score https://www.kaggle.com/datasets/parisrohan/credit-score-classification?select=train.csv license = CC0: Public Domain
SMILES DataSet for Analysis & Prediction Dataset
kaggle.com
Updated Jun 15, 2023
Cite
Yan Maksi (2023). SMILES DataSet for Analysis & Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/yanmaksi/big-molecules-smiles-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yan Maksi
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Image: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F9770082%2Fb234dd748f233e4d3ef1d72d048828b5%2FMastering%20Drug%20Design.jpg?generation=1686502308761641&alt=media

Read this article to unlock the wonderful world of Deep Reinforcement Learning for Drug Design.

ReLeaSE is a public dataset, consisting of molecular structures and their corresponding binding affinity to proteins. The dataset was created for the purpose of evaluating and comparing machine learning models for the prediction of protein-ligand binding affinity.

The dataset contains a total of 10,000 molecules and their binding affinity to several target proteins, including thrombin, kinase, and protease. The molecular structures are represented using Simplified Molecular Input Line Entry System (SMILES) notation, which is a standardized method for representing molecular structures as a string of characters. The binding affinity is represented as a negative logarithm of the dissociation constant (pKd), which is a measure of the strength of the interaction between the molecule and the target protein.
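
The pKd definition above can be illustrated with a short conversion sketch. It assumes the dissociation constant is expressed in molar units, which is the usual convention but is not stated in the description; the function names are hypothetical.

```python
import math


def pkd_from_kd(kd_molar):
    """Convert a dissociation constant (assumed to be in mol/L) to pKd = -log10(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


def kd_from_pkd(pkd):
    """Invert the conversion: Kd = 10 ** (-pKd), in mol/L."""
    return 10 ** (-pkd)


# A 1 nM binder (1e-9 mol/L) has a pKd of about 9; a 1 uM binder has about 6.
strong = pkd_from_kd(1e-9)
weak = pkd_from_kd(1e-6)
```

Because of the negative logarithm, a higher pKd means a smaller dissociation constant and therefore tighter binding.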

The ReLeaSE dataset provides a standardized benchmark for evaluating machine learning models for protein-ligand binding affinity prediction. The dataset is publicly available and can be used for research purposes, making it an important resource for the drug discovery community.
Global Startup Success Dataset
kaggle.com
Updated Mar 1, 2025
Cite
Hamna Kaleem (2025). Global Startup Success Dataset [Dataset]. https://www.kaggle.com/datasets/hamnakaleemds/global-startup-success-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 1, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Hamna Kaleem
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
📊 Dataset Features

This dataset includes 5,000 startups from 10 countries and contains 15 key features:

Startup Name: Name of the startup
Founded Year: Year the startup was founded
Country: Country where the startup is based
Industry: Industry category (Tech, FinTech, AI, etc.)
Funding Stage: Stage of investment (Seed, Series A, etc.)
Total Funding ($M): Total funding received (in million $)
Number of Employees: Number of employees in the startup
Annual Revenue ($M): Annual revenue in million dollars
Valuation ($B): Startup's valuation in billion dollars
Success Score: Score from 1 to 10 based on growth
Acquired?: Whether the startup was acquired (Yes/No)
IPO?: Did the startup go public? (Yes/No)
Customer Base (Millions): Number of active customers
Tech Stack: Technologies used by the startup
Social Media Followers: Total followers on social platforms

Analysis Ideas

📈 What Can You Do with This Dataset? Here are some exciting analyses you can perform:

Predict Startup Success: Train a machine learning model to predict the success score.
Industry Trends: Analyze which industries get the most funding.
Valuation vs. Funding: Explore the correlation between funding and valuation.
Acquisition Analysis: Investigate the factors that contribute to startups being acquired.
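The funding-versus-valuation idea could be explored along these lines. This is only a sketch: the rows below are made up for illustration, the CSV header names are assumed to match the feature names listed above, and `pearson` is a hypothetical helper. Note that funding is listed in millions and valuation in billions, so the sketch converts both to the same unit first.

```python
import csv
import io
import math

# Made-up rows for illustration only; they are NOT taken from the dataset.
# Header names are assumed to follow the feature list above.
SAMPLE_CSV = """Startup Name,Total Funding ($M),Valuation ($B)
Alpha,10,0.05
Beta,50,0.30
Gamma,120,0.90
Delta,300,2.10
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
funding_musd = [float(row["Total Funding ($M)"]) for row in rows]
# Valuation is listed in billions; convert to millions so both share a unit.
valuation_musd = [float(row["Valuation ($B)"]) * 1000 for row in rows]
corr = pearson(funding_musd, valuation_musd)
```

Pearson correlation is unaffected by the unit conversion itself; the conversion matters once the two columns are compared directly, for example as a valuation-to-funding ratio.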
home data for ml course
kaggle.com
zip
Updated Aug 27, 2019
Cite
Julián Pérez Pesce (2019). home data for ml course [Dataset]. https://www.kaggle.com/datasets/estrotococo/home-data-for-ml-course
Explore at:
Available download formats: zip (199207 bytes)
Dataset updated
Aug 27, 2019
Authors
Julián Pérez Pesce
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Exercise: Machine Learning Competitions

When you click Run / All, the notebook gives an error: "Files doesn't exist". This dataset fixes that. It is the same data as DanB's. Please UPVOTE!

Enjoy!
Mental Health Dataset
kaggle.com
Updated Mar 18, 2024
Cite
Bhavik Jikadara (2024). Mental Health Dataset [Dataset]. https://www.kaggle.com/datasets/bhavikjikadara/mental-health-dataset
Dataset updated
Mar 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bhavik Jikadara
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset appears to contain a variety of features related to text analysis, sentiment analysis, and psychological indicators, likely derived from posts or text data. Some features include readability indices such as Automated Readability Index (ARI), Coleman Liau Index, and Flesch-Kincaid Grade Level, as well as sentiment analysis scores like sentiment compound, negative, neutral, and positive scores. Additionally, there are features related to psychological aspects such as economic stress, isolation, substance use, and domestic stress. The dataset seems to cover a wide range of linguistic, psychological, and behavioural attributes, potentially suitable for analyzing mental health-related topics in online communities or text data.
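For readers unfamiliar with the readability features mentioned above, here is a minimal sketch of the Automated Readability Index using its standard formula. The tokenization rules (what counts as a word or a sentence) are simplifying assumptions, and nothing here is confirmed to be how the dataset's own ARI column was computed.

```python
import re


def automated_readability_index(text):
    """Compute ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Assumption: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Only letters and digits count as characters.
    chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


ari = automated_readability_index("The cat sat. The dog ran.")
```

Very short, simple sentences like the ones in this example give a low (even negative) score; longer words and sentences push the index up.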

Benefits of using this dataset:

Insight into Mental Health: The dataset provides valuable insights into mental health by analyzing linguistic patterns, sentiment, and psychological indicators in text data. Researchers and data scientists can gain a better understanding of how mental health issues manifest in online communication.

Predictive Modeling: With a wide range of features, including sentiment analysis scores and psychological indicators, the dataset offers opportunities for developing predictive models to identify or predict mental health outcomes based on textual data. This can be useful for early intervention and support.

Community Engagement: Mental health is a topic of increasing importance, and this dataset can foster community engagement on platforms like Kaggle. Data enthusiasts, researchers, and mental health professionals can collaborate to analyze the data and develop solutions to address mental health challenges.

Data-driven Insights: By analyzing the dataset, users can uncover correlations and patterns between linguistic features, sentiment, and mental health indicators. These insights can inform interventions, policies, and support systems aimed at promoting mental well-being.

Educational Resource: The dataset can serve as a valuable educational resource for teaching and learning about mental health analytics, sentiment analysis, and text mining techniques. It provides a real-world dataset for students and practitioners to apply data science skills in a meaningful context.
Medium-Search-Dataset
kaggle.com
Updated Jun 11, 2021
Cite
Keegan Fernandes (2021). Medium-Search-Dataset [Dataset]. https://www.kaggle.com/datasets/aristotle609/mediumsearchdataset/data
Dataset updated
Jun 11, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Keegan Fernandes
Description
Context

Since I started blogging on medium.com (here's a shameless plug), I haven't really had many views (granted, my posts aren't that great and my publishing frequency is low). I've wondered what differentiates the top Medium data science bloggers from me, so I decided to make a dataset to find out and improve myself (I found a lot to improve upon) 😃

Content

The data represents the top 200 Medium articles for each specific query. It was acquired through web scraping and contains various metadata about each post, excluding the blog text, which I will upload in a separate dataset.

Acknowledgements

The thought of web scraping was pretty daunting to me: the coding, the time, and the data required would be a lot. It was then that I discovered ParseHub, which allowed me to scrape websites with ease. They also ran the web scraping on their servers, all for free (with a limit). Web scraping is an important method for collecting data in data science, and I would recommend that everyone give ParseHub a try.

Inspiration

Hopefully this will give all the struggling bloggers on Kaggle some insight.
Student Habits vs Academic Performance
kaggle.com
Updated Apr 12, 2025
Cite
Jayanta Nath (2025). Student Habits vs Academic Performance [Dataset]. https://www.kaggle.com/datasets/jayaantanaath/student-habits-vs-academic-performance
Dataset updated
Apr 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jayanta Nath
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This is a simulated dataset exploring how lifestyle habits affect academic performance in students. With 1,000 synthetic student records and 15+ features including study hours, sleep patterns, social media usage, diet quality, mental health, and final exam scores, it’s perfect for ML projects, regression analysis, clustering, and data viz. Created using realistic patterns for educational practice.

Ever wondered how much Netflix, sleep, or TikTok scrolling affects your grades? 👀 This dataset simulates 1,000 students' daily habits—from study time to mental health—and compares them to final exam scores. It's like spying on your GPA through the lens of lifestyle. Perfect for EDA, ML practice, or just vibing with data while pretending to be productive.
Dataset : Business Intelligence Research Trends
kaggle.com
Updated Nov 14, 2024
Cite
MUHAMMAD AKMAL HAKIM (2024). Dataset : Business Intelligence Research Trends [Dataset]. http://doi.org/10.34740/kaggle/ds/5643554
Unique identifier
https://doi.org/10.34740/kaggle/ds/5643554
Dataset updated
Nov 14, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
MUHAMMAD AKMAL HAKIM
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset Overview

This dataset presents a meticulously compiled collection of 387 academic publications that explore various aspects of social media and business intelligence. The dataset includes detailed metadata about each publication, such as titles, authorship, abstracts, publication years, article types, and the journals or conferences where they were published. Citations and research areas are also included, making this dataset a valuable resource for bibliometric analysis, trend detection, and literature reviews in the fields of social media analytics, sentiment analysis, business intelligence, and related disciplines.

Content

The dataset comprises 15 columns, each capturing specific attributes of the research papers. Below is a description of each column:

ID: A unique identifier for each record in the dataset.

Title: The title of the academic paper.

DOI: The Digital Object Identifier, which provides a permanent link to the publication.

Author: List of authors who contributed to the paper.

Abstract: A summary of the research paper, providing insights into the study's objectives, methods, and findings.

Year: The year the paper was published.

Article Type: Indicates the type of publication (e.g., Proceedings Paper, Article, Book Chapter).

Publication Name: The name of the journal or conference where the paper was published.

Number-of-Cited-References: The number of references cited in the paper.

Times Cited: The number of times the paper has been cited by other works.

Research Areas: The general research area(s) the paper pertains to (e.g., Computer Science, Engineering).

WOS Category: Specific categories or subfields relevant to Web of Science classification.

WOS Index: The index within Web of Science where the paper is listed.

Keywords: Keywords provided by the authors to describe the main topics of the paper.

Keyword Plus: Additional keywords derived from the titles of the paper’s cited references.

Applications

This dataset can be utilized for a variety of purposes, including but not limited to:

Trend Analysis: Identify emerging trends and popular topics in social media and business intelligence research.

Citation Analysis: Analyze citation patterns to determine the impact and relevance of specific publications.

Collaborative Networks: Map out authorship and institutional collaboration trends.

Text Mining: Perform text mining on abstracts and keywords to uncover latent themes and topics.

Research Evaluation: Conduct bibliometric evaluations to assess the productivity and impact of researchers and institutions in the field.

Data Collection and Preprocessing

The dataset was curated by extracting bibliometric data from Web of Science (WOS), ensuring the inclusion of comprehensive and high-quality metadata. All records have been standardized for consistency and completeness to facilitate easier analysis.
School Learning Modalities, 2020-2021
kaggle.com
healthdata.gov
Updated Oct 26, 2023
Cite
Mohamad Javad Babaei (2023). School Learning Modalities, 2020-2021 [Dataset]. https://www.kaggle.com/datasets/mhmtaha/school-learning-modalities-2020-2021
Dataset updated
Oct 26, 2023
Dataset provided by
Kaggle
Authors
Mohamad Javad Babaei
License
https://www.usa.gov/government-works/
Description
The 2020-2021 School Learning Modalities dataset provides weekly estimates of school learning modality (including in-person, remote, or hybrid learning) for U.S. K-12 public and independent charter school districts for the 2020-2021 school year, from August 2020 – June 2021.

These data were modeled using multiple sources of input data (see below) to infer the most likely learning modality of a school district for a given week. These data should be considered district-level estimates and may not always reflect true learning modality, particularly for districts in which data are unavailable. If a district reports multiple modality types within the same week, the modality offered for the majority of those days is reflected in the weekly estimate. All school district metadata are sourced from the National Center for Educational Statistics (NCES) for 2020-2021.
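The weekly roll-up rule described above (the modality offered on most days wins) can be sketched as follows. The function name is hypothetical, and the tie-breaking behavior is an assumption, since the description does not say how ties are resolved.

```python
from collections import Counter


def weekly_modality(daily_modalities):
    """Return the modality reported on the most days of a week.

    Ties are broken by first occurrence; the source text does not say how
    ties are handled, so this is an assumption.
    """
    if not daily_modalities:
        raise ValueError("need at least one daily report")
    counts = Counter(daily_modalities)
    return counts.most_common(1)[0][0]


# Hypothetical week: three in-person days, one hybrid day, one remote day.
week = ["Hybrid", "In-Person", "In-Person", "In-Person", "Remote"]
estimate = weekly_modality(week)
```

`Counter.most_common` keeps elements with equal counts in first-seen order, which is what gives the tie-breaking behavior noted in the docstring.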

School learning modality types are defined as follows:

In-Person: All schools within the district offer face-to-face instruction 5 days per week to all students at all available grade levels.

Remote: Schools within the district do not offer face-to-face instruction; all learning is conducted online/remotely to all students at all available grade levels.

Hybrid: Schools within the district offer a combination of in-person and remote learning; face-to-face instruction is offered less than 5 days per week, or only to a subset of students.

Data Information

School learning modality data provided here are model estimates using combined input data and are not guaranteed to be 100% accurate. This learning modality dataset was generated by combining data from four different sources: Burbio [1], MCH Strategic Data [2], the AEI/Return to Learn Tracker [3], and state dashboards [4-20]. These data were combined using a Hidden Markov model which infers the sequence of learning modalities (In-Person, Hybrid, or Remote) for each district that is most likely to produce the modalities reported by these sources. This model was trained using data from the 2020-2021 school year. Metadata describing the location, number of schools and number of students in each district comes from NCES [21]. You can read more about the model in the CDC MMWR: COVID-19–Related School Closures and Learning Modality Changes — United States, August 1–September 17, 2021. The metrics listed for each school learning modality reflect totals by district and the number of enrolled students per district for which data are available. School districts represented here exclude private schools and include the following NCES subtypes:

Public school district that is NOT a component of a supervisory union
Public school district that is a component of a supervisory union
Independent charter district

“BI” in the state column refers to school districts funded by the Bureau of Indian Education.
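To give a feel for how a Hidden Markov model can infer the most likely sequence of modalities from noisy reports, here is a minimal Viterbi decoding sketch. It is not the model used to build this dataset: it takes a single report per week instead of combining four sources, and every probability below (`START`, `STAY`, `HIT`) is an invented assumption.

```python
import math

STATES = ("In-Person", "Hybrid", "Remote")

# All probabilities below are invented for illustration; the source does not
# publish the fitted parameters of its model.
START = {s: 1 / 3 for s in STATES}
STAY = 0.8  # assumed probability that a district keeps its modality next week
TRANS = {a: {b: STAY if a == b else (1 - STAY) / 2 for b in STATES} for a in STATES}
HIT = 0.7  # assumed probability that a report matches the true modality
EMIT = {a: {b: HIT if a == b else (1 - HIT) / 2 for b in STATES} for a in STATES}


def viterbi(observations):
    """Return the most likely hidden modality sequence for weekly reports."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state at week 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at week t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# A single out-of-place "Hybrid" report is smoothed away under these parameters.
decoded = viterbi(["Remote", "Remote", "Hybrid", "Remote", "Remote"])
```

With these assumed parameters, switching modality twice is less likely than one mistaken report, so the decoded sequence stays "Remote" for all five weeks.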

Technical Notes

Data from September 1, 2020 to June 25, 2021 correspond to the 2020-2021 school year. During this timeframe, all four sources of data were available. Inferred modalities with a probability below 0.75 were deemed inconclusive and were omitted. Data for the month of July may show “In Person” status although most school districts are effectively closed during this time for summer break. Users may wish to exclude July data from use for this reason where applicable.
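The two rules in these notes (the 0.75 probability cut-off and the optional exclusion of July) can be expressed as a small filter. This is only a sketch of the rule: the published data already omits inconclusive estimates, and the record layout, the function name `is_usable`, and the sample values are hypothetical.

```python
from datetime import date

MIN_PROBABILITY = 0.75  # threshold stated in the technical notes


def is_usable(week, probability, exclude_july=True):
    """Decide whether a weekly estimate should be kept for analysis."""
    if probability < MIN_PROBABILITY:
        return False  # deemed inconclusive
    if exclude_july and week.month == 7:
        return False  # most districts are closed for summer break
    return True


# Hypothetical (week, modality, probability) records, not real data.
records = [
    (date(2021, 3, 7), "Hybrid", 0.91),
    (date(2021, 3, 14), "Remote", 0.60),
    (date(2021, 7, 4), "In-Person", 0.88),
]
usable = [rec for rec in records if is_usable(rec[0], rec[2])]
```

Passing `exclude_july=False` keeps July rows, for analyses where summer weeks are relevant.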

Sources

K-12 School Opening Tracker. Burbio 2021; https
Synthetic Binary Classification Dataset! 💻
kaggle.com
Updated Feb 26, 2023
Cite
Kyle Graupe (2023). Synthetic Binary Classification Dataset! 💻 [Dataset]. https://www.kaggle.com/datasets/kylegraupe/simulation-generated-dataset-binary-class/code
Dataset updated
Feb 26, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kyle Graupe
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset is generated via random number generation and simulation methods. Try to predict the class using binary classification!

Simulation is an excellent way to augment or create datasets with distributions that are representative of real world phenomena! Finding enough data for your model can be difficult, but what if you knew the distribution of what a dataset should look like? Can we then generate a dataset using simulation upon which we can then train a model? Simulation is a promising method for solving this problem!
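As one possible illustration of the idea, a labeled dataset can be simulated by drawing features from a known distribution per class. This is not the method used to build this dataset; the two-cluster Gaussian setup, the class means, and the function name are all assumptions.

```python
import random


def make_binary_dataset(n_rows, seed=0):
    """Generate (feature_1, feature_2, label) rows from two Gaussian clusters."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0  # assumed class means
        rows.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0), label))
    return rows


dataset = make_binary_dataset(200, seed=1)
```

Moving the class means closer together or raising the standard deviation makes the classes overlap more, which gives a harder classification problem.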

I will continue to release more content about simulation methods and applications, so stay tuned!
Data from: College Completion Dataset
kaggle.com
Updated Dec 6, 2022
Cite
The Devastator (2022). College Completion Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/boost-student-success-with-college-completion-da
Dataset updated
Dec 6, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
College Completion Dataset

Graduation Rates, Race, Efficiency Measures and More

By Jonathan Ortiz [source]

About this dataset

This College Completion dataset provides insight into the success and progress of college students in the United States. It contains graduation rates, race, and other data to offer a comprehensive view of college completion in America. The data comes from two primary sources: the Integrated Postsecondary Education Data System (IPEDS) of the National Center for Education Statistics (NCES) and the Voluntary System of Accountability’s Student Success and Progress rate.

The graduation figures come from IPEDS for first-time, full-time, degree-seeking undergraduate students who entered college six years earlier at four-year institutions or three years earlier at two-year institutions. Colleges also report how many students completed their program within 100 percent and 150 percent of normal time, which at four-year institutions corresponds to graduation within four years or six years, respectively. Students reported as being of two or more races are included in totals but not shown separately.

For race and ethnicity data, NCES has classified student demographics since 2009 into seven categories: White non-Hispanic; Black non-Hispanic; American Indian/Alaskan Native; Asian/Pacific Islander; unknown race or ethnicity; and nonresident, with two new categories: Native Hawaiian or Other Pacific Islander (combined with Asian) and students belonging to several races. Also worth noting is that different classifications for graduate data stemming from 2008 could be due to variations in the time frame examined and the groupings used by particular colleges; students who can’t be identified from National Student Clearinghouse records won’t be subject to penalty by these locations.

For efficiency measures, parameters such as “Awards per 100 Full-Time Undergraduate Students”, which includes all undergraduate completions reported by an institution (including associate degrees and certificates from programs of less than four years), are useful. Other measures taken into consideration include expenditure categories, Pell grant percentage, endowment values, average student aid amounts, and full-time faculty members contributing towards instructional research / public service initiatives.

To quantify outcomes, the Median Estimated SAT score metric helps; it is derived on either a 25th percentile or a 75th percentile basis, with these factors further qualified by identifying required criteria meeting a 90% threshold when incoming students are considered for relevance. Last but not least, Average Student Aid equalizes the amount granted by the institution, dividing it over the total sum received against what was allotted that particular year.

All this analysis gives an opportunity to get a holistic overview of performance, potential deficits &

More Datasets

For more datasets, click here.

Featured Notebooks

🚨 Your notebook can be here! 🚨!

How to use the dataset

This dataset contains data on student success, graduation rates, race and gender demographics, an efficiency measure to compare colleges across states and more. It is a great source of information to help you better understand college completion and student success in the United States.

In this guide we’ll explain how to use the data so that you can find out the best colleges for students with certain characteristics or focus on your target completion rate. We’ll also provide some useful tips for getting the most out of this dataset when seeking guidance on which institutions offer the highest graduation rates or have a good reputation for success in terms of completing programs within normal timeframes.

Before getting into specifics about interpreting this dataset, it is important to understand that each row represents a particular institution, with fields such as its state, level (two-year vs four-year), control (public vs private), name, and website. The columns contain information such as the rate of awarding degrees compared to other institutions in its sector; race/ethnicity makeup; full-time faculty percentage; median SAT score among first-time students; awards/grants compared with the national or state average, depending on institution location; and more!

When using this dataset, our suggestion is that you begin by forming a hypothesis or research question concerning student completion at a given school based upon observable characteristics like financ...

Height Statistics (inches)	Boys	Girls
Mean	67	62
Standard Deviation	2.9	2.2

Cite

Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code

Meta Kaggle Code

Kaggle's public data on notebook code

Explore at:

4 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (146671216621 bytes)

Dataset updated

Jun 26, 2025

Dataset authored and provided by

Kaggle (http://kaggle.com/)

License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
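Since file names match KernelVersions ids, a join can start by turning each file name back into an id. The sketch below assumes the id is the file name without its extension and that the KernelVersions CSV has an `Id` column; the `Title` column, the helper name, and the sample rows are likewise hypothetical stand-ins.

```python
import csv
import io
from pathlib import PurePosixPath

# Tiny stand-in for the KernelVersions csv; the "Id" and "Title" column names
# and all values are assumptions for illustration.
KERNEL_VERSIONS_CSV = """Id,Title
123456789,Example notebook
123456790,Another notebook
"""


def version_id_from_path(path):
    """Extract the KernelVersions id from a file name such as 123456789.ipynb."""
    return int(PurePosixPath(path).stem)


# Index the metadata rows by id, then look up the row for one code file.
versions = {int(row["Id"]): row for row in csv.DictReader(io.StringIO(KERNEL_VERSIONS_CSV))}
match = versions.get(version_id_from_path("123/456/123456789.ipynb"))
```

A lookup that returns `None` would mean the file has no matching row in the metadata extract being used.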

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
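The two-level layout described above amounts to simple integer arithmetic on the version id. A minimal sketch, with a hypothetical function name; it returns folder numbers rather than path strings because the description does not say whether folder names are zero-padded.

```python
def folder_indices(version_id):
    """Return (top-level folder number, sub folder number) for a version id."""
    if version_id < 0:
        raise ValueError("version id must be non-negative")
    top = version_id // 1_000_000        # one top-level folder per million ids
    sub = (version_id // 1_000) % 1_000  # one sub folder per thousand ids
    return top, sub


# The example from the text: 123/456 holds versions 123,456,000 to 123,456,999.
top, sub = folder_indices(123_456_789)
```

Both ends of the documented range, 123,456,000 and 123,456,999, map to the same `(123, 456)` pair.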

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
