100+ datasets found

Meta Kaggle Code
kaggle.com
zip
Updated Aug 14, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Available download formats: zip (153,009,997,518 bytes)
Dataset updated
Aug 14, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between the Kaggle and Stack Overflow communities, and more.

The best part is that Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions the code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions CSV file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains data only for commit sessions.
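Because each file name is a KernelVersions id, joining the two datasets comes down to parsing the id out of the file name and looking it up in the CSV. A minimal standard-library sketch, assuming file names of the form `<id>.<extension>` and an `Id` column in KernelVersions.csv; the function name `join_with_kernel_versions` and the toy CSV contents are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def join_with_kernel_versions(code_paths, kernel_versions_csv):
    """Pair each code file with its KernelVersions row.

    Assumes file names look like "<id>.<ext>" and that the CSV has an
    "Id" column; both are assumptions for this sketch.
    """
    # Index KernelVersions rows by integer id for fast lookups.
    rows_by_id = {int(row["Id"]): row for row in csv.DictReader(kernel_versions_csv)}
    joined = []
    for path in code_paths:
        version_id = int(PurePosixPath(path).stem)  # "123/456/123456789.py" -> 123456789
        row = rows_by_id.get(version_id)
        if row is not None:  # skip files whose id is absent from this CSV extract
            joined.append((path, row))
    return joined


# Toy CSV standing in for KernelVersions.csv (made-up ids and values).
toy_csv = io.StringIO("Id,ScriptId\n123456789,42\n123456790,43\n")
pairs = join_with_kernel_versions(["123/456/123456789.py"], toy_csv)
print(pairs)  # [('123/456/123456789.py', {'Id': '123456789', 'ScriptId': '42'})]
```

The id index is built once from the CSV, so the code files themselves can be streamed path by path rather than loaded up front.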

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1,000 files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each subfolder holds far fewer than 1,000 files, because versions from private and interactive sessions are not included.
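The folder holding a given version id follows directly from integer division. A minimal sketch of that mapping, using unpadded folder names as in the examples above (`123`, `123/456`); if the archive actually uses zero-padded names, the format string would need adjusting. The function name `version_folder` is hypothetical.

```python
def version_folder(version_id: int) -> str:
    """Return the two-level folder expected to hold a kernel version's file.

    Top level: the id divided by 1,000,000 (blocks of 1 million versions).
    Second level: the next three digits (blocks of 1,000 versions).
    Folder names are unpadded here, matching the examples in the text.
    """
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return f"{top}/{sub}"


print(version_folder(123_456_789))  # 123/456
```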

The .ipynb files in this dataset hosted on Kaggle do not contain output cells. If you need the outputs, the full set of notebooks with outputs embedded can be obtained from the public GCS bucket kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket, which means you will need a GCP account with billing enabled to download from it. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Data Science Job Market
kaggle.com
Updated Mar 19, 2025
Cite
Boltana MT (2025). Data Science Job Market [Dataset]. https://www.kaggle.com/datasets/misganawtboltana/data-science-job-market-in-2025-15k
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 19, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Boltana MT
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
The Data Science job market has been expanding rapidly over the past few years, and projections for 2025 indicate that this growth will continue at an impressive pace. This dataset contains over 7,000 job opportunities from 2025, mainly gathered from India; even so, it provides valuable insights into the skills in demand globally.

This dataset offers real-world insights into the latest in-demand skills such as Python, SQL, machine learning, and AI, helping data scientists navigate the evolving job market. It highlights key job trends, market-demanded skills, and location-based opportunities.

** If you find this dataset helpful, please don't forget to upvote **

Dataset Attributes:

Job Title: The position being offered (e.g., Data Scientist, Data Analyst).

Company Name: The name of the hiring company.

Location: Geographical location of the job (e.g., Chennai, Bengaluru).

Experience: The required years of experience (e.g., 0-1 Years, 2-5 Years).

Job Description: A brief description of the job role and responsibilities.

Skills: The key technical and soft skills required for the job (e.g., Python, SQL, Machine Learning).

Job Post Day: The date when the job was posted.
2023 Data Scientists Jobs Descriptions
kaggle.com
Updated Feb 1, 2023
Cite
Diego Silva França (2023). 2023 Data Scientists Jobs Descriptions [Dataset]. https://www.kaggle.com/datasets/diegosilvadefrana/2023-data-scientists-jobs-descriptions
Explore at: Croissant
Dataset updated
Feb 1, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Diego Silva França
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This dataset was obtained from the Google Jobs API through serpAPI and contains information about job offers for data scientists in companies based in the United States of America (USA). The data may include details such as job title, company name, location, job description, salary range, and other relevant information. The dataset is likely to be valuable for individuals seeking to understand the job market for data scientists in the USA and for companies looking to recruit data scientists. It may also be useful for researchers who are interested in exploring trends and patterns in the job market for data scientists. The data should be used with caution, as the API source may not cover all job offers in the USA and the information provided by the companies may not always be accurate or up-to-date.
Data Science Glossary For QA
kaggle.com
Updated Mar 8, 2024
Cite
Sofianesun (2024). Data Science Glossary For QA [Dataset]. https://www.kaggle.com/datasets/sofianesun/data-science-glossary-for-qa
Explore at: Croissant
Dataset updated
Mar 8, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sofianesun
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
A dataset for the first task, "Explain or teach basic data science concepts", of the competition Google – AI Assistants for Data Tasks with Gemma. It contains several data science glossaries in which every sample has two keys: term (the vocabulary name) and definition.
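Since every sample is a term/definition pair, turning the glossary into question-answer examples is a simple mapping. A minimal sketch, assuming the samples are stored as a JSON list of objects with `term` and `definition` keys; the storage format, the question template, and the function name `to_qa_pairs` are hypothetical.

```python
import json


def to_qa_pairs(glossary_json):
    """Turn glossary samples into question/answer dicts.

    Assumes a JSON list of {"term": ..., "definition": ...} objects.
    """
    samples = json.loads(glossary_json)
    qa = []
    for sample in samples:
        term = sample["term"].strip()
        qa.append({
            "question": f"What is {term}?",  # illustrative question template
            "answer": sample["definition"].strip(),
        })
    return qa


example = '[{"term": "Overfitting", "definition": "When a model fits noise in the training data."}]'
print(to_qa_pairs(example)[0]["question"])  # What is Overfitting?
```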
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context

The Google Merchandise Store sells Google-branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average number of total transactions per user who made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
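The real bounce rate question above defines bounce rate as the percentage of visits with a single pageview. A minimal sketch of that calculation per traffic source, assuming each visit has already been reduced to a `(source, pageviews)` pair; that input format, the function name, and the sample visits are hypothetical.

```python
from collections import defaultdict


def bounce_rate_by_source(visits):
    """Return the real bounce rate (in percent) for each traffic source.

    `visits` is an iterable of (source, pageviews) pairs, one per visit;
    this input format is an assumption for the sketch.
    """
    total = defaultdict(int)
    bounces = defaultdict(int)
    for source, pageviews in visits:
        total[source] += 1
        if pageviews == 1:  # a bounce is a visit with exactly one pageview
            bounces[source] += 1
    return {source: 100.0 * bounces[source] / total[source] for source in total}


# Made-up visits for illustration only.
sample_visits = [("google", 1), ("google", 4), ("(direct)", 1), ("(direct)", 1), ("google", 1)]
print(bounce_rate_by_source(sample_visits))
```

Against the BigQuery export itself this grouping would more naturally be written as a query; the Python version only makes the definition explicit.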
US Data Science and Analytics Master's Programs
kaggle.com
Updated Mar 26, 2024
Cite
Shahriar Kabir (2024). US Data Science and Analytics Master's Programs [Dataset]. https://www.kaggle.com/datasets/shahriarkabir/us-data-science-and-analytics-masters-programs
Explore at: Croissant
Dataset updated
Mar 26, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shahriar Kabir
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This dataset provides comprehensive information about various Data Science and Analytics master's programs offered in the United States. It includes details such as the program name, university name, annual tuition fees, program duration, location of the university, and additional information about the programs.

Column Descriptions:

Subject Name: The name or field of study of the master's program, such as Data Science, Data Analytics, or Applied Biostatistics.

University Name: The name of the university offering the master's program.

Per Year Fees: The tuition fees for the program, usually given in euros per year. For some programs, the fees may be listed as "full" or "full-time," indicating a lump sum for the entire program or for full-time enrollment, respectively.

About Program: A brief description or overview of the master's program, providing insights into its curriculum, focus areas, and any unique features.

Program Duration: The duration of the master's program, typically expressed in years or months.

University Location: The location of the university where the program is offered, including the city and state.

Program Name: The official name of the master's program, often indicating its degree type (e.g., M.Sc. for Master of Science) and format (e.g., full-time, part-time, online).
Student Engagement
kaggle.com
Updated Nov 23, 2022
Cite
The Devastator (2022). Student Engagement [Dataset]. https://www.kaggle.com/datasets/thedevastator/student-engagement-with-tableau-a-data-science-p
Explore at: Croissant
Dataset updated
Nov 23, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Student Engagement

Predicting Engagement and Exam Performance

By [source]

About this dataset

This dataset contains information on student engagement with Tableau, including quizzes, exams, and lessons. The data includes the course title, the rating of the course, the date the course was rated, the exam category, the exam duration, whether the answer was correct or not, the number of quizzes completed, the number of exams completed, the number of lessons completed, the date engaged, the exam result, and more.

How to use the dataset

The 'Student Engagement with Tableau' dataset offers insights into student engagement with the Tableau software. The data includes information on courses, exams, quizzes, and student learning.

This dataset can be used to examine how students use Tableau, what kind of engagement leads to better learning outcomes, and whether certain course or exam characteristics are associated with student engagement.

Research Ideas

Creating a heat map of student engagement by course and location

Determining which courses are most popular among students from different countries

Identifying patterns in students' exam results

Acknowledgements

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: 365_course_info.csv

| Column name | Description |
|:------------|:------------|
| course_title | The title of the course. (String) |

File: 365_course_ratings.csv

| Column name | Description |
|:------------|:------------|
| course_rating | The rating given to the course by the student. (Numeric) |
| date_rated | The date on which the course was rated. (Date) |

File: 365_exam_info.csv

| Column name | Description |
|:------------|:------------|
| exam_category | The category of the exam. (Categorical) |
| exam_duration | The duration of the exam in minutes. (Numerical) |

File: 365_quiz_info.csv

| Column name | Description |
|:------------|:------------|
| answer_correct | Whether or not the student answered the question correctly. (Boolean) |

File: 365_student_engagement.csv

| Column name | Description |
|:------------|:------------|
| engagement_quizzes | The number of times a student has engaged with quizzes. (Numeric) |
| engagement_exams | The number of times a student has engaged with exams. (Numeric) |
| engagement_lessons | The number of times a student has engaged with lessons. (Numeric) |
| date_engaged | The date of the student's engagement. (Date) |

File: 365_student_exams.csv

| Column name | Description |
|:------------|:------------|
| exam_result | The result of the exam. (Categorical) |
| exam_completion_time | The time it took to complete the exam. (Numerical) |
| date_exam_completed | The date the exam was completed. (Date) |

File: 365_student_hub_questions.csv

| Column name | Description |
|:------------|:------------|
| date_question_asked | The date the question was asked. (Date) |

File: 365_student_info.csv

| Column name | Description |
|:------------|:------------|
| student_country | The country of the student. (Categorical) |
| date_registered | The date the student registered for the course. (Date) |

File: 365_student_learning.csv | Column name | Description | |:--------------------|:------------------------------...
Health Care Analytics
kaggle.com
Updated Jan 10, 2022
Cite
Abishek Sudarshan (2022). Health Care Analytics [Dataset]. https://www.kaggle.com/datasets/abisheksudarshan/health-care-analytics
Explore at: Croissant
Dataset updated
Jan 10, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Abishek Sudarshan
Description
Context

Part of the Janatahack hackathon on Analytics Vidhya.

Content

The healthcare sector has long been an early adopter of, and has benefited greatly from, technological advances. These days, machine learning plays a key role in many health-related realms, including the development of new medical procedures, the handling of patient data, health camps and records, and the treatment of chronic diseases.

MedCamp organizes health camps in several cities with low work-life balance. They reach out to working people and ask them to register for these health camps. MedCamp gives those who attend the opportunity to undergo health checks or to increase their awareness by visiting various stalls (depending on the format of the camp).

MedCamp has conducted 65 such events over a period of 4 years, and it sees a high drop-off between “Registration” and the number of people taking tests at the camps. Over the last 4 years, it has stored data on ~110,000 registrations.

One of the major costs in arranging these camps is the amount of inventory that must be carried. Carrying more inventory than required incurs unnecessarily high costs; carrying less than is required for the medical checks leaves people with a bad experience.

The Process:

MedCamp employees / volunteers reach out to people and drive registrations. During the camp, people who “ShowUp” either undergo the medical tests or visit stalls, depending on the format of the health camp.

Other things to note:

Since this is a completely voluntary activity for working professionals, MedCamp usually has little profile information about these people. For a few camps there was a hardware failure, so some information about the date and time of registration is lost. MedCamp runs 3 formats of these camps. The first and second formats provide people with an instantaneous health score. The third format provides information about several health issues through various awareness stalls.

Favorable outcome:

For the first 2 formats, a favourable outcome is defined as getting a health_score, while in the third format it is defined as visiting at least one stall. You need to predict the chances (probability) of having a favourable outcome.
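Because the outcome definition depends on the camp format, the training label can be derived with one branch per format. A minimal sketch, assuming each attendance record exposes the camp format (1, 2 or 3), an optional health score, and a count of stalls visited; the `Attendance` record, its field names, and `is_favourable` are hypothetical and may not match the competition files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attendance:
    # Hypothetical record layout for illustration.
    camp_format: int                  # 1, 2 or 3
    health_score: Optional[float]     # None when no health score was obtained
    stalls_visited: int = 0


def is_favourable(record: Attendance) -> bool:
    """Apply the favourable-outcome definition per camp format."""
    if record.camp_format in (1, 2):
        # Formats 1 and 2: success means the person received a health score.
        return record.health_score is not None
    if record.camp_format == 3:
        # Format 3: success means visiting at least one awareness stall.
        return record.stalls_visited >= 1
    raise ValueError(f"unknown camp format: {record.camp_format}")


print(is_favourable(Attendance(1, 0.82)))     # True
print(is_favourable(Attendance(3, None, 0)))  # False
```

The model then predicts the probability of this label being true, rather than the label itself.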

Train / Test split:

Camps that started on or before 31st March 2006 are in the training data; the test data covers all camps conducted on or after 1st April 2006.
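The split is by camp start date, with 31 March 2006 as the last training day. A minimal sketch of that rule, assuming camps are given as `(camp_id, start_date)` pairs; the input format, the function name `split_camps`, and the camp ids are hypothetical.

```python
from datetime import date

# Cut-off from the description: camps starting on or before 31 March 2006
# are training data; camps starting on or after 1 April 2006 are test data.
TRAIN_CUTOFF = date(2006, 3, 31)


def split_camps(camps):
    """Split (camp_id, start_date) pairs into train and test id lists."""
    train, test = [], []
    for camp_id, start in camps:
        (train if start <= TRAIN_CUTOFF else test).append(camp_id)
    return train, test


# Made-up camp ids and dates for illustration.
camps = [(1, date(2005, 9, 1)), (2, date(2006, 3, 31)), (3, date(2006, 4, 1))]
print(split_camps(camps))  # ([1, 2], [3])
```

Splitting on the start date rather than at random mirrors how such a model would be used: trained on past camps and applied to later ones.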

Acknowledgements

Credits to Analytics Vidhya (AV)

Inspiration

To share with the data science community and help jump-start their journey in healthcare analytics.
Practical Statistics for Data Science
kaggle.com
Updated Jan 7, 2025
Cite
Vaishnavi Hemadri (2025). Practical Statistics for Data Science [Dataset]. https://www.kaggle.com/datasets/hgvaishnavi/practical-statistics-for-data-science
Explore at: Croissant
Dataset updated
Jan 7, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vaishnavi Hemadri
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset

This dataset was created by Vaishnavi Hemadri

Released under Apache 2.0

Contents
Data Science Jobs in India.
kaggle.com
Updated Oct 4, 2023
Cite
Nagendra Kumar Reddy Syamala (2023). Data Science Jobs in India. [Dataset]. http://doi.org/10.34740/kaggle/dsv/6609558
Explore at: Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6609558
Dataset updated
Oct 4, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nagendra Kumar Reddy Syamala
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Area covered
India
Description
The dataset is well suited to classification and can also be used to practice other tasks involving ML algorithms.

About this file

1) Company Name: The various companies that have offered data science related roles are listed in this column.

2) Job Titles: Data Scientist, Business Analyst, Data Analyst, Data Engineer, Senior Data Scientist, Senior Business Analyst, Senior Data Analyst, Senior Data Engineer, Machine Learning Engineer, Data Architect

3) Salaries: Salaries are annual incomes in Indian rupees, where L denotes lakhs (1 lakh = 100,000 rupees).
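Because salaries are recorded in lakhs, converting them to plain rupees makes them easier to compare and model (1 lakh = 100,000 rupees). A minimal sketch, assuming values look like `"6.5L"` or `"12 L"`; the real column may use another format (ranges, for example), so the parsing pattern and the function name `lakhs_to_rupees` are assumptions.

```python
import re

RUPEES_PER_LAKH = 100_000


def lakhs_to_rupees(value):
    """Convert an annual salary such as '6.5L' to rupees (assumed format)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*L\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised salary format: {value!r}")
    return float(match.group(1)) * RUPEES_PER_LAKH


print(lakhs_to_rupees("6.5L"))  # 650000.0
print(lakhs_to_rupees("12 L"))  # 1200000.0
```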
Student Performance Data Set
kaggle.com
Updated Mar 27, 2020
Cite
Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
Explore at: Croissant
Dataset updated
Mar 27, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Data-Science Sean
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
If this data set is useful, an upvote is appreciated. This data covers student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final grade for the year (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see the source paper for more details).
google data analytics course project
kaggle.com
Updated Mar 7, 2024
Cite
Sauravchauhan_FE_ENTCA (2024). google data analytics course project [Dataset]. https://www.kaggle.com/datasets/sauravchauhan625003/google-data-analytics-course-project
Explore at: Croissant
Dataset updated
Mar 7, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sauravchauhan_FE_ENTCA
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset

This dataset was created by Sauravchauhan_FE_ENTCA

Released under MIT

Contents
Customer360Insights
kaggle.com
Updated Jun 9, 2024
Cite
Dave Darshan (2024). Customer360Insights [Dataset]. https://www.kaggle.com/datasets/davedarshan/customer360insights
Explore at: Croissant
Dataset updated
Jun 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dave Darshan
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Customer360Insights

The Customer360Insights dataset is a synthetic collection meticulously designed to mirror the multifaceted nature of customer interactions within an e-commerce platform. It encompasses a wide array of variables, each serving as a pillar to support various analytical explorations. Here’s a breakdown of the dataset and the potential analyses it enables:

Dataset Description

Customer Demographics: Includes FullName, Gender, Age, CreditScore, and MonthlyIncome. These variables provide a demographic snapshot of the customer base, allowing for segmentation and targeted marketing analysis.

Geographical Data: Comprising Country, State, and City, this section facilitates location-based analytics, market penetration studies, and regional sales performance.

Product Information: Details like Category, Product, Cost, and Price enable product trend analysis, profitability assessment, and inventory optimization.

Transactional Data: Captures the customer journey through SessionStart, CartAdditionTime, OrderConfirmation, OrderConfirmationTime, PaymentMethod, and SessionEnd. This rich temporal data can be used for funnel analysis, conversion rate optimization, and customer behavior modeling.

Post-Purchase Details: With OrderReturn and ReturnReason, analysts can delve into return rate calculations, post-purchase satisfaction, and quality control.

Types of Analysis

Descriptive Analytics: Understand basic metrics like average monthly income, most common product categories, and typical credit scores.

Predictive Analytics: Use machine learning to predict credit risk or the likelihood of a purchase based on demographics and session activity.

Customer Segmentation: Group customers by demographics or purchasing behavior to tailor marketing strategies.

Geospatial Analysis: Examine sales distribution across different regions and optimize logistics.

Time Series Analysis: Study the seasonality of purchases and session activities over time.

Funnel Analysis: Evaluate the customer journey from session start to order confirmation and identify drop-off points.

Cohort Analysis: Track customer cohorts over time to understand retention and repeat purchase patterns.

Market Basket Analysis: Discover product affinities and develop cross-selling strategies.
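The funnel analysis listed above can be sketched by counting how many sessions reach each stage and what fraction is lost between consecutive stages. A minimal sketch, assuming each session is a dict keyed by the column names from the dataset description, with `CartAdditionTime` set to `None` when nothing was added to the cart and `OrderConfirmation` stored as a boolean; these value encodings, the function names, and the sample sessions are hypothetical.

```python
def funnel_counts(sessions):
    """Count sessions reaching each funnel stage (assumed value encodings)."""
    started = len(sessions)
    added = sum(1 for s in sessions if s.get("CartAdditionTime") is not None)
    confirmed = sum(1 for s in sessions if s.get("OrderConfirmation"))
    return {"SessionStart": started, "CartAddition": added, "OrderConfirmation": confirmed}


def drop_off(counts):
    """Fraction of sessions lost between consecutive funnel stages."""
    stages = list(counts)
    return {
        f"{a} -> {b}": 1 - counts[b] / counts[a] if counts[a] else 0.0
        for a, b in zip(stages, stages[1:])
    }


# Made-up sessions for illustration only.
sessions = [
    {"CartAdditionTime": "10:02", "OrderConfirmation": True},
    {"CartAdditionTime": "11:15", "OrderConfirmation": False},
    {"CartAdditionTime": None, "OrderConfirmation": False},
    {"CartAdditionTime": None, "OrderConfirmation": False},
]
print(drop_off(funnel_counts(sessions)))
# {'SessionStart -> CartAddition': 0.5, 'CartAddition -> OrderConfirmation': 0.5}
```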

This dataset is a playground for data enthusiasts to practice cleaning, transforming, visualizing, and modeling data. Whether you’re conducting A/B testing for marketing campaigns, forecasting sales, or building customer profiles, Customer360Insights offers a rich, realistic dataset for honing your data science skills.

Curious about how I created the data? Feel free to click here and take a peek! 😉

📊🔍 Good Luck and Happy Analysing 🔍📊
Skills for Data Science
kaggle.com
Updated Mar 19, 2022
Cite
AravindanR (2022). Skills for Data Science [Dataset]. https://www.kaggle.com/datasets/aravindanr22052001/skillscsv/versions/1
Explore at: Croissant
Dataset updated
Mar 19, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
AravindanR
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This dataset contains a list of essential skills for data science. You can use it for skill extraction.

For example, if you want to find the skills mentioned in a resume, you can match the resume text against this list.
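Skill extraction from a resume can start as plain keyword matching against the skill list. A minimal sketch, assuming the skills are available as a list of strings; the toy skill list, the resume text, and the function name `find_skills` are hypothetical. The lookarounds require each skill to stand alone, so a one-letter skill such as "R" does not match inside "React", while "C++" is matched literally.

```python
import re


def find_skills(resume_text, skills):
    """Return the skills from `skills` that appear in `resume_text`.

    Matching is case-insensitive; a skill must not be directly preceded or
    followed by a word character.
    """
    found = []
    for skill in skills:
        pattern = rf"(?<!\w){re.escape(skill)}(?!\w)"
        if re.search(pattern, resume_text, flags=re.IGNORECASE):
            found.append(skill)
    return found


skills = ["Python", "SQL", "R", "C++", "Machine Learning"]  # toy skill list
resume = "Built machine learning pipelines in Python and C++; wrote SQL reports in React dashboards."
print(find_skills(resume, skills))  # ['Python', 'SQL', 'C++', 'Machine Learning']
```

Plain matching misses synonyms and abbreviations ("ML" for "Machine Learning"), so in practice the list would likely need aliases.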
Data Science Books Extracted from Amazon
kaggle.com
Updated Apr 14, 2023
Cite
Valeria F22 (2023). Data Science Books Extracted from Amazon [Dataset]. http://doi.org/10.34740/kaggle/dsv/5402374
Explore at: Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5402374
Dataset updated
Apr 14, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Valeria F22
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description

This dataset contains information about data science books that were extracted from Amazon. The dataset includes the book title, author, price, ratings, and number of reviews. This information can be useful for anyone who is interested in data science and wants to explore popular books in the field.

The dataset can be used for various purposes such as analyzing trends in data science book sales, comparing authors and publishers, and identifying highly rated books with a large number of reviews. Additionally, the dataset can be used for training machine learning models to predict book popularity or pricing.

The dataset contains a total of 328 books, with each book having information on its title, author, price, ratings, and number of reviews. The data was scraped from Amazon using web scraping techniques.

Data Dictionary:

Title: The title of the book

Author: The author(s) of the book

Price: The price of the book in US dollars

Ratings: The average rating of the book on Amazon, on a scale of 1-5 stars

Number of Reviews: The number of reviews the book has received on Amazon

I hope that this dataset will be useful for researchers, data scientists, and anyone interested in exploring data science books. Please let us know if you have any questions or feedback.
Data Science, Machine Learning and AI using Python
kaggle.com
zip
Updated Aug 8, 2021
Cite
AMEY THAKUR (2021). Data Science, Machine Learning and AI using Python [Dataset]. https://www.kaggle.com/ameythakur20/data-science-machine-learning-and-ai-using-python
Available download formats: zip (187,472 bytes)
Dataset updated
Aug 8, 2021
Authors
AMEY THAKUR
Description
Dataset

This dataset was created by AMEY THAKUR

Contents
2018 Kaggle Machine Learning & Data Science Survey
kaggle.com
Updated Apr 1, 2020
Cite
Solyoh21 (2020). 2018 Kaggle Machine Learning & Data Science Survey [Dataset]. https://www.kaggle.com/solyoh21/2018kaggle-machine-learning-data-science-survey/metadata
Explore at: Croissant
Dataset updated
Apr 1, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Solyoh21
License
EU ODP Legal Notice (https://ec.europa.eu/info/legal-notice_en)
Description
Dataset

This dataset was created by Solyoh21

Released under EU ODP Legal Notice

Contents
Data-Science-Book
kaggle.com
Updated Aug 20, 2022
Cite
Md Waquar Azam (2022). Data-Science-Book [Dataset]. http://doi.org/10.34740/kaggle/dsv/4096198
Explore at: Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/4096198
Dataset updated
Aug 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Md Waquar Azam
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context

This dataset holds a list of 200+ books on data science topics. The list was constructed from Amazon, a popular website that provides book ratings and the other details given below.

There are 6 columns:

Book_name: book title

Publisher: name of the publisher or writer

Buyers: number of customers who purchased the book

Cover_type: type of cover used for the book

stars: rating out of 5

Price: price of the book

Inspiration

I’d like to invite my fellow Kagglers to use machine learning and data science to help me explore these ideas:

• What is the best-selling book?

• Find any hidden patterns if you can

• EDA of the dataset
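The first question above (the best-selling book) can be approached by taking the row with the largest buyer count. A minimal sketch with the standard library, assuming the data is a CSV whose header uses the column names listed above with `Buyers` as the count column; the actual file name and exact header spelling are not given here, so `best_selling` and its `buyers_col` parameter are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io


def best_selling(rows, buyers_col="Buyers"):
    """Return the row with the largest buyer count, or None if no row has a usable count.

    buyers_col is an assumed header name; adjust it to match the real file.
    """
    best, best_count = None, -1
    for row in rows:
        # Counts may be written with thousands separators, e.g. "1,050"
        raw = (row.get(buyers_col) or "").replace(",", "").strip()
        if not raw.isdigit():
            continue  # skip missing or non-numeric buyer counts
        count = int(raw)
        if count > best_count:
            best, best_count = row, count
    return best


# Made-up rows in the assumed column layout
sample = """Book_name,Publisher,Buyers,Cover_type,stars,Price
Book A,Pub X,120,Paperback,4.5,30
Book B,Pub Y,"1,050",Paperback,4.7,12
Book C,Pub Z,,Hardcover,4.1,45
"""
top = best_selling(csv.DictReader(io.StringIO(sample)))
print(top["Book_name"])  # Book B
```

Rows with a blank or unparseable count are skipped rather than treated as zero, so they never win by accident.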
Kaggle DS Survey 2019
kaggle.com
Updated Dec 1, 2019
Alan Asri (2019). Kaggle DS Survey 2019 [Dataset]. https://www.kaggle.com/datasets/alanasri/kaggle-ds-survey-2019
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 1, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alan Asri
Description
Context

This notebook contains a thorough analysis and explanation of the survey conducted by Kaggle. Respondents came from varied work backgrounds, age groups, countries of residence, and companies. The survey questions covered their field of work in data science and machine learning.

Content

The following exploratory data analysis uses the results of the survey Kaggle conducted in 2019 among respondents who answered questions about machine learning and data science. The core points of this analysis are:

1. Distribution of age by formal education
2. Companies and their spending on machine learning
3. Comparison of machine learning spending levels by company
4. Data scientist experience and compensation
5. Correlation between machine learning experience and salary
6. Correlation between the data scientist role and compensation
7. Favourite media sources on data science topics
8. Favourite media by age distribution; media most preferred by data scientists
9. Course platforms used by data scientists
10. Job roles for each title; the primary job of data scientists
11. Programming languages used regularly, by job title, especially for data scientists
12. Comparison of specific programming skills and compensation
13. Which programming language should an aspiring data scientist learn first?
14. Integrated development environments used on a regular basis
15. Top 5 IDEs and which countries use them (Microsoft is not dominant in the USA)
16. Which notebooks are most used on a regular basis (Google dominates)
17. Which countries and companies use which hardware for machine learning
18. Job roles by company type
19. Computer vision methods most used by companies
20. Distribution of companies by country
21. Cloud products: Amazon dominates, Google follows
22. Big data products: Amazon holds the majority among enterprises, Google holds the majority overall

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?
Ultimate Data Science Book Collection
kaggle.com
Updated Feb 15, 2023
Mayuri Awati (2023). Ultimate Data Science Book Collection [Dataset]. https://www.kaggle.com/datasets/mayuriawati/ultimate-data-science-book-collection/code
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mayuri Awati
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
The data set that I have compiled is based on a collection of books related to various topics in data science. I was inspired to create this data set because I wanted to gain insights into the popularity of different data science topics, as well as the most common words used in the titles or descriptions, and the most common authors or publishers in these areas.

To collect the data set, I used the Google Books API, which allowed me to search for and retrieve information about books related to specific topics. I focused on topics such as Python for data science, R, SQL, statistics, machine learning, NLP, deep learning, data visualization, and data ethics, as I wanted to create a diverse and comprehensive data set that covered a wide range of data science subjects.

The books included in the data set were written by various authors and published by different publishing houses, and I included books that were published within the past 10 years. I believe that this data set will be useful for anyone who is interested in data science, whether they are a beginner or an experienced practitioner. It can be used to build recommendation systems for books based on user interests, to identify gaps in the existing literature on a specific topic, or for general data analysis purposes.
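The Google Books API collection step and the ten-year cutoff described above could look roughly like the following. This is not the author's script: it is a sketch assuming the public v1 `volumes` search endpoint, which returns matches under `items[].volumeInfo` and accepts at most 40 results per request via `maxResults`. The year is parsed from `publishedDate`, which may be a full date or just a year. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import date

API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(topic, start_index=0, max_results=40):
    """Build a volumes search URL; the API caps maxResults at 40."""
    params = {"q": topic, "startIndex": start_index, "maxResults": min(max_results, 40)}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_volumes(payload, min_year):
    """Flatten a volumes response, keeping books published in or after min_year."""
    books = []
    for item in payload.get("items", []):
        info = item.get("volumeInfo", {})
        # publishedDate may be "2019", "2019-05", or "2019-05-01"
        year_text = info.get("publishedDate", "")[:4]
        if not year_text.isdigit() or int(year_text) < min_year:
            continue  # undated or too old
        books.append({
            "title": info.get("title", ""),
            "authors": ", ".join(info.get("authors", [])),
            "publisher": info.get("publisher", ""),
            "published_year": int(year_text),
        })
    return books


def fetch_books(topic, years_back=10):
    """Fetch one page of results for a topic (performs a network request)."""
    min_year = date.today().year - years_back
    with urllib.request.urlopen(build_query_url(topic)) as resp:
        payload = json.load(resp)
    return parse_volumes(payload, min_year)
```

Calling `fetch_books("machine learning")` returns at most 40 books; collecting more per topic means repeating the request with a larger `start_index`. Keeping the parsing separate from the request makes the filter testable offline.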

I hope that this data set will be a valuable resource for the data science community and will contribute to the advancement of the field.


Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code

Meta Kaggle Code

Kaggle's public data on notebook code

Available download formats: zip (153,009,997,518 bytes)

Dataset updated

Aug 14, 2025

Dataset authored and provided by

Kaggle (http://kaggle.com/)

License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Explore our public notebook content!

Meta Kaggle Code is an extension of our popular Meta Kaggle dataset. It contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle, used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data, spanning a period of tremendous evolution in the way ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between the Kaggle and Stack Overflow communities, and more.

The best part is that Meta Kaggle enriches Meta Kaggle Code. By joining the two datasets, you can easily see which competitions the code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
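Because each file name is a KernelVersions id, joining the two datasets amounts to reading the id from the file name and looking it up in KernelVersions. A minimal standard-library sketch, assuming the id column in KernelVersions.csv is named `Id` and that file stems are plain integers; `join_code_files` is a hypothetical helper, not part of either dataset. Since only commit sessions have code files, KernelVersions rows with no matching file are simply skipped.

```python
import csv
import io
from pathlib import Path


def version_id_from_path(path):
    """File names are KernelVersions ids, e.g. '123/456/123456789.py' -> 123456789."""
    return int(Path(path).stem)


def join_code_files(code_paths, kernel_versions_file, id_col="Id"):
    """Map each code file path to its KernelVersions row.

    kernel_versions_file is any text file object for KernelVersions.csv;
    id_col is an assumed column name.
    """
    wanted = {version_id_from_path(p): p for p in code_paths}
    joined = {}
    # Stream the CSV row by row instead of loading the whole table into memory
    for row in csv.DictReader(kernel_versions_file):
        version_id = int(row[id_col])
        if version_id in wanted:
            joined[wanted[version_id]] = row
    return joined


# Tiny in-memory stand-in for KernelVersions.csv (made-up rows)
demo_csv = io.StringIO("Id,ScriptId,Title\n123456789,42,Commit run\n123456790,42,Interactive run\n")
print(join_code_files(["123/456/123456789.py"], demo_csv))
```

Streaming with `csv.DictReader` keeps memory proportional to the set of code files being joined rather than to the full KernelVersions table.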

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files, because versions from private and interactive sessions are not included.
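The folder for a given version id follows from integer division: the top-level folder is the id divided by 1,000,000, and the sub folder is the thousands part of the id modulo 1,000. A sketch, assuming folder names are the plain decimal numbers used in the examples above; if the actual folders turn out to be zero-padded, the hypothetical `pad` parameter zero-fills them. File extensions are not stated here, so `find_version_files` (also hypothetical) matches any extension.

```python
from pathlib import Path


def version_folders(version_id):
    """Return (top, sub) folder numbers for a KernelVersions id."""
    if version_id < 0:
        raise ValueError("version ids must be non-negative")
    top = version_id // 1_000_000        # 123,456,789 -> 123
    sub = (version_id // 1_000) % 1_000  # 123,456,789 -> 456
    return top, sub


def candidate_dir(root, version_id, pad=0):
    """Directory expected to hold the version's file; pad zero-fills folder names."""
    top, sub = version_folders(version_id)
    return Path(root) / str(top).zfill(pad) / str(sub).zfill(pad)


def find_version_files(root, version_id, pad=0):
    """List files named '<version_id>.<ext>' in the candidate directory."""
    return sorted(candidate_dir(root, version_id, pad).glob(f"{version_id}.*"))


print(candidate_dir("meta-kaggle-code", 123_456_789))  # meta-kaggle-code/123/456 on POSIX
```

If `find_version_files` comes back empty for an id that exists in KernelVersions, the version was most likely from a private or interactive session, which this dataset omits.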

The .ipynb files in this dataset hosted on Kaggle do not contain output cells. If the outputs are required, the full set of .ipynb files with outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket, which means you will need a GCP account with billing enabled to download from it. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
