100+ datasets found
  1. Top 2500 Kaggle Datasets

    • kaggle.com
    Updated Feb 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 16, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Saket Kumar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

    Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

    Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

    Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

    Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

    Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

    Column Definitions:

    Dataset Name: Name of the dataset. Created By: Creator(s) of the dataset. Last Updated in number of days: Time elapsed since last update. Usability Score: Score indicating the ease of use. Number of File: Quantity of files included. Type of file: Format of files (e.g., CSV, JSON). Size: Size of the dataset. Total Votes: Number of votes received. Category: Categorization of the dataset's subject matter.

  2. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

  3. Retail Analysis on Large Dataset

    • kaggle.com
    Updated Jun 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sahil Prajapati (2024). Retail Analysis on Large Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/8693643
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 14, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sahil Prajapati
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Description:

    • The dataset represents retail transactional data. It contains information about customers, their purchases, products, and transaction details. The data includes various attributes such as customer ID, name, email, phone, address, city, state, zipcode, country, age, gender, income, customer segment, last purchase date, total purchases, amount spent, product category, product brand, product type, feedback, shipping method, payment method, and order status.

    Key Points:

    Customer Information:

    • Includes customer details like ID, name, email, phone, address, city, state, zipcode, country, age, and gender. Customer segments are categorized into Premium, Regular, and New. ##Transaction Details:
    • Transaction-specific data such as transaction ID, last purchase date, total purchases, amount spent, total purchase amount, feedback, shipping method, payment method, and order status. ##Product Information:
    • Contains product-related details such as product category, brand, and type. Products are categorized into electronics, clothing, grocery, books, and home decor. ##Geographic Information:
    • Contains location details including city, state, and country. Available for various countries including USA, UK, Canada, Australia, and Germany. ##Temporal Information:
    • Last purchase date is provided along with separate columns for year, month, date, and time. Allows analysis based on temporal patterns and trends. ##Data Quality:
    • Some rows contain null values, and others are duplicates, which may need to be handled during data preprocessing. Null values are randomly distributed across rows. Duplicate rows are available at different parts of the dataset. ##Potential Analysis:
    • Customer segmentation analysis based on demographics, purchase behavior, and feedback. Sales trend analysis over time to identify peak seasons or trends. Product performance analysis to determine popular categories, brands, or types. Geographic analysis to understand regional preferences and trends. Payment and shipping method analysis to optimize services. Customer satisfaction analysis based on feedback and order status. ##Data Preprocessing:
    • Handling null values and duplicates. Parsing and formatting temporal data. Encoding categorical variables. Scaling numerical variables if required. Splitting data into training and testing sets for modeling.
  4. Go To College Dataset

    • kaggle.com
    Updated Jun 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saddam Sinatrya Jalu Mukti (2022). Go To College Dataset [Dataset]. https://www.kaggle.com/datasets/saddamazyazy/go-to-college-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 29, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Saddam Sinatrya Jalu Mukti
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a synthetic data created for a college project. This data aims to predict whether students will continue to go to college or not. With machine learning explainability, school counselors can help students that will not go to college by finding the factor and helping them. Lets build something really helpful. Here is my recommendation notebook.

    PS: Like I said before, this is synthetic data. If you have a resource to get real data, your contribution is welcome. Thank you.

  5. Policy Dataset

    • kaggle.com
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sjagkoo7 (2023). Policy Dataset [Dataset]. https://www.kaggle.com/datasets/sjagkoo7/policy
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    sjagkoo7
    Description

    Design a prediction model if a customer having income more than 50000 dollar then need to advise for ploicy. This prediction will help team to take decisions for providing the financial assistance for low income group customers.

  6. 60k-data-with-context-v2

    • kaggle.com
    Updated Sep 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chris Deotte (2023). 60k-data-with-context-v2 [Dataset]. https://www.kaggle.com/datasets/cdeotte/60k-data-with-context-v2
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 2, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Chris Deotte
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset can be used to train an Open Book model for Kaggle's LLM Science Exam competition. This dataset was generated by searching and concatenating all publicly shared datasets on Sept 1 2023.

    The context column was generated using Mgoksu's notebook here with NUM_TITLES=5 and NUM_SENTENCES=20

    The source column indicates where the dataset originated. Below are the sources:

    source = 1 & 2 * Radek's 6.5k dataset. Discussion here annd here, dataset here.

    source = 3 & 4 * Radek's 15k + 5.9k. Discussion here and here, dataset here

    source = 5 & 6 * Radek's 6k + 6k. Discussion here and here, dataset here

    source = 7 * Leonid's 1k. Discussion here, dataset here

    source = 8 * Gigkpeaeums 3k. Discussion here, dataset here

    source = 9 * Anil 3.4k. Discussion here, dataset here

    source = 10, 11, 12 * Mgoksu 13k. Discussion here, dataset here

  7. Data from: Text to SQL dataset

    • kaggle.com
    Updated Jul 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohammad Nour Alawad (2024). Text to SQL dataset [Dataset]. https://www.kaggle.com/datasets/mohammadnouralawad/spider-text-sql
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 21, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Mohammad Nour Alawad
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset consists of 8,034 entries designed to evaluate the performance of text-to-SQL models. Each entry contains a natural language text query and its corresponding SQL command. The dataset is a subset derived from the Spider dataset, focusing on diverse and complex queries to challenge the understanding and generation capabilities of machine learning models.

  8. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Oct 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    zip(160334510007 bytes)Available download formats
    Dataset updated
    Oct 9, 2025
    Dataset authored and provided by
    Kagglehttp://kaggle.com/
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  9. 1000_companies_profit

    • kaggle.com
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rupak Roy/ Bob (2022). 1000_companies_profit [Dataset]. https://www.kaggle.com/datasets/rupakroy/1000-companies-profit
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Rupak Roy/ Bob
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset includes sample data of 1000 startup companies operating cost and their profit. Well-formatted dataset for building ML regression pipelines. Includes R&D Spend float64 Administration float64 Marketing Spend float64 State object Profit float64

  10. Book-Crossing Dataset

    • kaggle.com
    zip
    Updated Sep 7, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    somnambWl (2019). Book-Crossing Dataset [Dataset]. https://www.kaggle.com/datasets/somnambwl/bookcrossing-dataset
    Explore at:
    zip(17632108 bytes)Available download formats
    Dataset updated
    Sep 7, 2019
    Authors
    somnambWl
    Description

    Book-Crossing dataset mined by Cai-Nicolas Ziegler

    Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):

    • PDF

    • Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.

    Further information and the original dataset can be found at the original webpage.

    Changes to the dataset:

    • Location removed as it comes in different formats not in default (city, state, country).
    • Transferred from ISO-8859-1 to UTF-8
    • Manually fixed a few rows with incorrect number of columns

    Note:

    • out of 278859 users:
      • only 99053 rated at least 1 book
      • only 43385 rated at least 2 books.
      • only 12306 rated at least 10 books.
    • out of 271379 books:
      • only 270171 are rated at least once.
      • only 124513 have at least 2 ratings.
      • only 17480 have at least 10 ratings.
  11. Le2i Fall Dataset

    • kaggle.com
    Updated Apr 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    tuyenldvn (2023). Le2i Fall Dataset [Dataset]. https://www.kaggle.com/datasets/tuyenldvn/falldataset-imvia
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 24, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    tuyenldvn
    Description

    Dataset

    This dataset was created by tuyenldvn

    Contents

  12. BBC Full Text Document Classification

    • kaggle.com
    Updated Jan 26, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shivam Kushwaha (2019). BBC Full Text Document Classification [Dataset]. https://www.kaggle.com/datasets/shivamkushwaha/bbc-full-text-document-classification
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 26, 2019
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Shivam Kushwaha
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Dataset

    This dataset was created by Shivam Kushwaha

    Released under Database: Open Database, Contents: Database Contents

    Contents

  13. ❤️‍🩹 Medical Condition Prediction Dataset

    • kaggle.com
    Updated Sep 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ciobanu Marius (2024). ❤️‍🩹 Medical Condition Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/medical-condition-prediction-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ciobanu Marius
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    About Dataset

    This dataset provides information about various medical conditions such as Cancer, Pneumonia, and Diabetic based on demographic, lifestyle, and health-related features. It contains randomly generated user data, including multiple missing values, making it suitable for handling imbalanced classification tasks and missing data problems.

    Features

    • id: Unique identifier for each user.
    • full_name: Randomly generated user name.
    • age: Age of the user (ranging from 18 to 90 years), with some missing values.
    • gender: The gender of the user (categorized as Male, Female, or Non-Binary).
    • smoking_status: Indicates the smoking status of the user (Smoker, Non-Smoker, Former-Smoker).
    • bmi: Body Mass Index (BMI) of the user (ranging from 15 to 40), with some missing values.
    • blood_pressure: Blood pressure levels of the user (ranging from 90 to 180 mmHg), with some missing values.
    • glucose_levels: Blood glucose levels of the user (ranging from 70 to 200 mg/dL), with some missing values.
    • condition: The target label indicating the medical condition of the user (Cancer, Pneumonia, or Diabetic), with imbalanced distribution (15% Cancer, 25% Pneumonia, 60% Diabetic).

    Goal

    The objective of this dataset is to predict the medical condition (Cancer, Pneumonia, Diabetic) of a user based on their demographic, lifestyle, and health-related features. This dataset can be used to explore strategies for dealing with imbalanced classes and missing data in healthcare applications. ​

  14. Dataset of pdf files

    • kaggle.com
    Updated May 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Manisha717 (2024). Dataset of pdf files [Dataset]. https://www.kaggle.com/datasets/manisha717/dataset-of-pdf-files
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 1, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Manisha717
    Description

    The dataset consists of diverse PDF files covering a wide range of topics. These files include reports, articles, manuals, and more, spanning various fields such as science, technology, history, literature, and business. With its broad content, the dataset offers versatility for testing and various purposes, making it valuable for researchers, developers, educators, and enthusiasts alike.

  15. Augmented Alzheimer MRI Dataset

    • kaggle.com
    Updated Sep 20, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    uraninjo (2022). Augmented Alzheimer MRI Dataset [Dataset]. https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    uraninjo
    License

    http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The data consists of MRI images. The data has four classes of images both in training as well as a testing set:

    1. Mild Demented
    2. Moderate Demented
    3. Non Demented
    4. Very Mild Demented

    The data contains two folders. One of them is augmented ones and the other one is originals. Originals could be used for validation or test dataset...

    Data is augmented from an existing dataset. Original images can be seen in Data Explorer. https://www.kaggle.com/datasets/tourist55/alzheimers-dataset-4-class-of-images

    My purpose of the publish this dataset is to the usage of augmented images as well as originals. The importance of augmentation is can be a little underrated.

  16. ImageNet-1k-1

    • kaggle.com
    Updated Apr 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sautkin (2023). ImageNet-1k-1 [Dataset]. https://www.kaggle.com/datasets/sautkin/imagenet1k1
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 2, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sautkin
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset containing 500-999 classes of ImageNet Is part of the Imagenet dataset, all parts are: ImageNet-1k-0 - https://www.kaggle.com/datasets/sautkin/imagenet1k0 (0-499 classes); ImageNet-1k-1 - this; ImageNet-1k-2 - https://www.kaggle.com/datasets/sautkin/imagenet1k2 (0-499 classes); ImageNet-1k-3 - https://www.kaggle.com/datasets/sautkin/imagenet1k3 (500-999 classes); ImageNet-1k-valid - https://www.kaggle.com/datasets/sautkin/imagenet1kvalid (0-999 classes, test part)

  17. demo xml

    • kaggle.com
    Updated Jan 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amritanshu Sharma (2022). demo xml [Dataset]. https://www.kaggle.com/datasets/amritanshusharma23/demo-xml
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Amritanshu Sharma
    Description

    Dataset

    This dataset was created by Amritanshu Sharma

    Released under Data files © Original Authors

    Contents

  18. Gestational Diabetes

    • kaggle.com
    • ieee-dataport.org
    Updated May 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rasool Jader (2024). Gestational Diabetes [Dataset]. http://doi.org/10.34740/kaggle/dsv/8301853
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 3, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Rasool Jader
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Gestational diabetes is a type of high blood sugar that develops during pregnancy. It can occur at any stage of pregnancy and cause problems for both the mother and the baby, during and after birth. The risks can be reduced if they are early detected and managed, especially in areas where only periodic tests of pregnant women are available. Intelligent systems designed by machine learning algorithms are remodelling all fields of our lives, including the healthcare system. This study proposes a combined prediction model to diagnose gestational diabetes. The dataset was obtained from the Kurdistan region laboratories, which collected information from pregnant women with and without diabetes.

  19. MRL Eye Dataset

    • kaggle.com
    Updated Mar 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Imad Eddine Djerarda (2023). MRL Eye Dataset [Dataset]. https://www.kaggle.com/datasets/imadeddinedjerarda/mrl-eye-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 21, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Imad Eddine Djerarda
    Description

    Dataset

    This dataset was created by Imad Eddine Djerarda

    Contents

  20. Climate Change: Earth Surface Temperature Data

    • kaggle.com
    • redivis.com
    zip
    Updated May 1, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
    Explore at:
    zip(88843537 bytes)Available download formats
    Dataset updated
    May 1, 2017
    Dataset authored and provided by
    Berkeley Earthhttp://berkeleyearth.org/
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Earth
    Description

    Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

    us-climate-change

    Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

    Given this complexity, there are a range of organizations that collate climate trends data. The three most cited land and ocean temperature data sets are NOAA’s MLOST, NASA’s GISTEMP and the UK’s HadCrut.

    We have repackaged the data from a newer compilation put together by the Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

    In this dataset, we have include several files:

    Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

    • Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures
    • LandAverageTemperature: global average land temperature in celsius
    • LandAverageTemperatureUncertainty: the 95% confidence interval around the average
    • LandMaxTemperature: global average maximum land temperature in celsius
    • LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature
    • LandMinTemperature: global average minimum land temperature in celsius
    • LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature
    • LandAndOceanAverageTemperature: global average land and ocean temperature in celsius
    • LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature

    Other files include:

    • Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)
    • Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)
    • Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)
    • Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

    The raw data comes from the Berkeley Earth data page.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Saket Kumar (2024). Top 2500 Kaggle Datasets [Dataset]. http://doi.org/10.34740/kaggle/dsv/7637365
Organization logo

Top 2500 Kaggle Datasets

Explore, Analyze, Innovate: The Best of Kaggle's Data at Your Fingertips

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 16, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Saket Kumar
License

http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

Description

This dataset compiles the top 2500 datasets from Kaggle, encompassing a diverse range of topics and contributors. It provides insights into dataset creation, usability, popularity, and more, offering valuable information for researchers, analysts, and data enthusiasts.

Research Analysis: Researchers can utilize this dataset to analyze trends in dataset creation, popularity, and usability scores across various categories.

Contributor Insights: Kaggle contributors can explore the dataset to gain insights into factors influencing the success and engagement of their datasets, aiding in optimizing future submissions.

Machine Learning Training: Data scientists and machine learning enthusiasts can use this dataset to train models for predicting dataset popularity or usability based on features such as creator, category, and file types.

Market Analysis: Analysts can leverage the dataset to conduct market analysis, identifying emerging trends and popular topics within the data science community on Kaggle.

Educational Purposes: Educators and students can use this dataset to teach and learn about data analysis, visualization, and interpretation within the context of real-world datasets and community-driven platforms like Kaggle.

Column Definitions:

Dataset Name: Name of the dataset. Created By: Creator(s) of the dataset. Last Updated in number of days: Time elapsed since last update. Usability Score: Score indicating the ease of use. Number of File: Quantity of files included. Type of file: Format of files (e.g., CSV, JSON). Size: Size of the dataset. Total Votes: Number of votes received. Category: Categorization of the dataset's subject matter.

Search
Clear search
Close search
Google apps
Main menu