100+ datasets found

Student Performance Data Set
kaggle.com
Updated Mar 27, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 27, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Data-Science Sean
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
The LargeST Benchmark Dataset
kaggle.com
Updated Jun 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
liuxu77 (2023). The LargeST Benchmark Dataset [Dataset]. https://www.kaggle.com/datasets/liuxu77/largest
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 13, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
liuxu77
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This is the official website for downloading the CA sub-dataset of the LargeST benchmark dataset. There are a total of 7 files in this page. Among them, 5 files in .h5 format contain the traffic flow raw data from 2017 to 2021, 1 file in .csv format provides the metadata for sensors, and 1 file in .npy format represents the adjacency matrix constructed based on road network distances. Please refer to https://github.com/liuxu77/LargeST for more information.
Fruits-360 dataset
kaggle.com
data.mendeley.com
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mihai Oltean (2025). Fruits-360 dataset [Dataset]. https://www.kaggle.com/datasets/moltean/fruits
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 7, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Mihai Oltean
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

The following fruits, vegetables and nuts and are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).

Branches

The dataset has 5 major branches:

-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.

-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.

-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.

-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.

-The _3_body_problem_ branch where the Training and Test folders contain different (varieties of) the 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.

How to cite

Mihai Oltean, Fruits-360 dataset, 2017-

Dataset properties

For the 100x100 branch

Total number of images: 138704.

Training set size: 103993 images.

Test set size: 34711 images.

Number of classes: 206 (fruits, vegetables, nuts and seeds).

Image size: 100x100 pixels.

For the original-size branch

Total number of images: 58363.

Training set size: 29222 images.

Validation set size: 14614 images

Test set size: 14527 images.

Number of classes: 90 (fruits, vegetables, nuts and seeds).

Image size: various (original, captured, size) pixels.

For the 3-body-problem branch

Total number of images: 47033.

Training set size: 34800 images.

Test set size: 12233 images.

Number of classes: 3 (Apples, Cherries, Tomatoes).

Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.

Image size: 100x100 pixels.

For the meta branch

Number of classes: 26 (fruits, vegetables, nuts and seeds).

For the multi branch

Number of images: 150.

Filename format:

For the 100x100 branch

image_index_100.jpg (e.g. 31_100.jpg) or

r_image_index_100.jpg (e.g. r_31_100.jpg) or

r?_image_index_100.jpg (e.g. r2_31_100.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).

Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.

For the original-size branch

r?_image_index.jpg (e.g. r2_31.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.

The name of the image files in the new version does NOT contain the "_100" suffix anymore. This will help you to make the distinction between the original-size branch and the 100x100 branch.

For the multi branch

The file's name is the concatenation of the names of the fruits inside that picture.

Alternate download

The Fruits-360 dataset can be downloaded from:

Kaggle https://www.kaggle.com/moltean/fruits

GitHub https://github.com/fruits-360

How fruits were filmed

Fruits and vegetables were planted in the shaft of a low-speed motor (3 rpm) and a short movie of 20 seconds was recorded.

A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.

Behind the fruits, we placed a white sheet of paper as a background.

Here i...
Online Sales Dataset - Popular Marketplace Data
kaggle.com
Updated May 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ShreyanshVerma27 (2024). Online Sales Dataset - Popular Marketplace Data [Dataset]. https://www.kaggle.com/datasets/shreyanshverma27/online-sales-dataset-popular-marketplace-data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 25, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
ShreyanshVerma27
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.

Columns:

Order ID: Unique identifier for each sales order.

Date:Date of the sales transaction.

Category:Broad category of the product sold (e.g., Electronics, Home Appliances, Clothing, Books, Beauty Products, Sports).

Product Name:Specific name or model of the product sold.

Quantity:Number of units of the product sold in the transaction.

Unit Price:Price of one unit of the product.

Total Price: Total revenue generated from the sales transaction (Quantity * Unit Price).

Region:Geographic region where the transaction occurred (e.g., North America, Europe, Asia).

Payment Method: Method used for payment (e.g., Credit Card, PayPal, Debit Card).

Insights:

1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.

2. Explore the popularity of different product categories across regions.

3. Investigate the impact of payment methods on sales volume or revenue.

4. Identify top-selling products within each category to optimize inventory and marketing strategies.

5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
Meta Kaggle Code
kaggle.com
zip
Updated Aug 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
zip(151993733346 bytes)Available download formats
Dataset updated
Aug 7, 2025
Dataset authored and provided by
Kagglehttp://kaggle.com/
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebooks versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Predictive Maintenance Dataset
kaggle.com
Updated Nov 7, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 7, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Himanshu Agarwal
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to Predict is called failure with binary value 0 for non-failure and 1 for failure.
maps dataset
kaggle.com
zip
Updated Jan 29, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
neerajbhat98 (2020). maps dataset [Dataset]. https://www.kaggle.com/datasets/adlteam/maps-dataset
Explore at:
zip(250762306 bytes)Available download formats
Dataset updated
Jan 29, 2020
Authors
neerajbhat98
Description
Dataset

This dataset was created by neerajbhat98

Contents
BCCD Dataset
kaggle.com
Updated Jan 7, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
surajmishra (2020). BCCD Dataset [Dataset]. https://www.kaggle.com/datasets/surajiiitm/bccd-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 7, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
surajmishra
Description
Dataset

This dataset was created by surajmishra

Contents
Unlabelled dataset
kaggle.com
Updated Oct 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Diggers (2023). Unlabelled dataset [Dataset]. https://www.kaggle.com/datasets/ahmedaliraja/unlabelled-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 29, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Data Diggers
Description
This dataset consists of unlabeled data representing various data points collected from different sources and domains. The dataset serves as a blank canvas for unsupervised learning experiments, allowing for the exploration of patterns, clusters, and hidden insights through various data analysis techniques. Researchers and data enthusiasts can use this dataset to develop and test unsupervised learning algorithms, identify underlying structures, and gain a deeper understanding of data without predefined labels.
NSL-KDD
kaggle.com
Updated Mar 16, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kiran (2020). NSL-KDD [Dataset]. https://www.kaggle.com/datasets/kiranmahesh/nslkdd
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 16, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Kiran
Description
Dataset

This dataset was created by Kiran

Contents
LLM: 7 prompt training dataset
kaggle.com
Updated Nov 15, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carl McBride Ellis (2023). LLM: 7 prompt training dataset [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/llm-7-prompt-training-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 15, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Carl McBride Ellis
License
https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/
Description
Version 4: Adding the data from "LLM-generated essay using PaLM from Google Gen-AI" kindly generated by Kingki19 / Muhammad Rizqi.
File: train_essays_RDizzl3_seven_v2.csv
Human texts: 14247 LLM texts: 3004

See also: a new dataset of an additional 4900 LLM generated texts: LLM: Mistral-7B Instruct texts

Version 3: "**The RDizzl3 Seven**"
File: train_essays_RDizzl3_seven_v1.csv

"Car-free cities"

"Does the electoral college work?"

"Exploring Venus"

"The Face on Mars"

"Facial action coding system"

"A Cowboy Who Rode the Waves"

"Driverless cars"

How this dataset was made: see the notebook "LLM: Make 7 prompt train dataset"

Version 2: (train_essays_7_prompts_v2.csv) This dataset is composed of 13,712 human texts and 1638 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.

Namely:

"Car-free cities"

"Does the electoral college work?"

"Exploring Venus"

"The Face on Mars"

"Facial action coding system"

"Seeking multiple opinions"

"Phones and driving"

This dataset is a derivative of the datasets

LLM Generated Essays for the Detect AI Comp! by Radek Osmulski

persuade corpus 2.0 provided by Nicholas Broad

daigt data - llama 70b and falcon180b by Nicholas Broad

Hello, Claude! 1000 essays from Anthropic... by Darragh

as well as the original competition training dataset

Version 1:This dataset is composed of 13,712 human texts and 1165 AI-LLM generated texts originating from 7 of the PERSUADE 2.0 corpus prompts.
Lifesnaps Fitbit dataset
kaggle.com
Updated Feb 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Skylar Carroll (2023). Lifesnaps Fitbit dataset [Dataset]. https://www.kaggle.com/datasets/skywescar/lifesnaps-fitbit-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 3, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Skylar Carroll
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Taken verbatim from the source: Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
Iris dataset
kaggle.com
Updated Jul 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Himanshu Nakrani (2022). Iris dataset [Dataset]. https://www.kaggle.com/datasets/himanshunakrani/iris-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 20, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Himanshu Nakrani
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
It includes three iris species with 50 samples each as well as some properties of each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

FIle name: iris.csv
Legal Text Classification Dataset
kaggle.com
Updated Oct 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
A.Mohan kumar (2023). Legal Text Classification Dataset [Dataset]. https://www.kaggle.com/datasets/amohankumar/legal-text-classification-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 17, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
A.Mohan kumar
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The dataset contains a total of 25000 legal cases in the form of text documents. Each document has been annotated with catchphrases, citations sentences, citation catchphrases, and citation classes. Citation classes indicate the type of treatment given to the cases cited by the present case.
🫀 Heart Disease Dataset
kaggle.com
Updated Apr 8, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
mexwell (2024). 🫀 Heart Disease Dataset [Dataset]. https://www.kaggle.com/datasets/mexwell/heart-disease-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 8, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
mexwell
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This heart disease dataset is curated by combining 5 popular heart disease datasets already available independently but not combined before. In this dataset, 5 heart datasets are combined over 11 common features which makes it the largest heart disease dataset available so far for research purposes. The five datasets used for its curation are:

Cleveland

Hungarian

Switzerland

Long Beach VA

Statlog (Heart) Data Set.

This dataset consists of 1190 instances with 11 features. These datasets were collected and combined at one place to help advance research on CAD-related machine learning and data mining algorithms, and hopefully to ultimately advance clinical diagnosis and early treatment.

Acknowlegement

Foto von Kenny Eliason auf Unsplash
Face-Detection-Dataset
kaggle.com
gts.ai
Updated Jun 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fares Elmenshawii (2023). Face-Detection-Dataset [Dataset]. https://www.kaggle.com/datasets/fareselmenshawii/face-detection-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 10, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Fares Elmenshawii
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset comprises 16.7k images and 2 annotation files, each in a distinct format. The first file, labeled "Label," contains annotations with the original scale, while the second file, named "yolo_format_labels," contains annotations in YOLO format. The dataset was obtained by employing the OIDv4 toolkit, specifically designed for scraping data from Google Open Images. Notably, this dataset exclusively focuses on face detection.

This dataset offers a highly suitable resource for training deep learning models specifically designed for face detection tasks. The images within the dataset exhibit exceptional quality and have been meticulously annotated with bounding boxes encompassing the facial regions. The annotations are provided in two formats: the original scale, denoting the pixel coordinates of the bounding boxes, and the YOLO format, representing the bounding box coordinates in normalized form.

The dataset was meticulously curated by scraping relevant images from Google Open Images through the use of the OIDv4 toolkit. Only images that are pertinent to face detection tasks have been included in this dataset. Consequently, it serves as an ideal choice for training deep learning models that specifically target face detection tasks.
REFUGE2
kaggle.com
Updated Oct 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Victor Lemos (2023). REFUGE2 [Dataset]. https://www.kaggle.com/datasets/victorlemosml/refuge2
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 4, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Victor Lemos
Description
The goal of REFUGE2 challenge is to evaluate and compare automated algorithms for glaucoma detection and optic disc/cup segmentation on a standard dataset of retinal fundus images. We invite the medical image analysis community to participate by developing and testing existing and novel automated classification and segmentation methods.

REFUGE2 challenge consists of THREE Tasks: Classification of clinical Glaucoma Segmentation of Optic Disc and Cup Localization of Fovea (macular center)
Apple Quality
kaggle.com
Updated Jan 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nidula Elgiriyewithana ⚡ (2024). Apple Quality [Dataset]. http://doi.org/10.34740/kaggle/dsv/7384155
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7384155
Dataset updated
Jan 11, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Nidula Elgiriyewithana ⚡
Description
Description:

This dataset contains information about various attributes of a set of fruits, providing insights into their characteristics. The dataset includes details such as fruit ID, size, weight, sweetness, crunchiness, juiciness, ripeness, acidity, and quality.

Key Features:

A_id: Unique identifier for each fruit

Size: Size of the fruit

Weight: Weight of the fruit

Sweetness: Degree of sweetness of the fruit

Crunchiness: Texture indicating the crunchiness of the fruit

Juiciness: Level of juiciness of the fruit

Ripeness: Stage of ripeness of the fruit

Acidity: Acidity level of the fruit

Quality: Overall quality of the fruit

Potential Use Cases:

Fruit Classification: Develop a classification model to categorize fruits based on their features.

Quality Prediction: Build a model to predict the quality rating of fruits using various attributes.

The dataset was generously provided by an American agriculture company. The data has been scaled and cleaned for ease of use.

If you find this dataset useful, your support through an upvote would be greatly appreciated ❤️🙂 Thank you
Mongo DB/ Json datasets
kaggle.com
Updated Sep 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shrashti (2023). Mongo DB/ Json datasets [Dataset]. https://www.kaggle.com/datasets/shrashtisinghal/mongo-db-datsets
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 3, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Shrashti
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Introducing the largest and most comprehensive collection of Mongo DB Dataset! This meticulously curated dataset brings together a wealth of information from various domains, including ecommerce, aviation, biology, zoology, literature, history, and more. Meticulously gathered from numerous reliable sources, this dataset has been expertly transformed into a unified format, making it an invaluable resource for researchers, data scientists, and enthusiasts alike. Each domain contributes its unique insights and knowledge, providing a diverse range of information for exploration and analysis. With its enriched content and extensive coverage, this Mongo DB Dataset opens up endless possibilities for uncovering hidden patterns, conducting groundbreaking research, and gaining profound insights across multiple disciplines.
Sentinel-1&2 Image Pairs (SAR & Optical)
kaggle.com
Updated Mar 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Paritosh Tiwari (2021). Sentinel-1&2 Image Pairs (SAR & Optical) [Dataset]. https://www.kaggle.com/datasets/requiemonk/sentinel12-image-pairs-segregated-by-terrain
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Paritosh Tiwari
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset consists of SAR and Optical (RGB) image pairs from Sentinel‑1 and Sentinel‑2 satellites, provided by the Technical University of Munich. Sentinel-1&2 Image Pairs, Michael Schmitt, Technical University of Munich (TUM)

We searched through images captured during the fall season in the original dataset provided by TUM, and selected images which could belong to each of the four classes: barren land, grassland, agricultural land, and urban areas. Optical images shown in the following sections give an idea of the type of images belonging to each class. We have tried to introduced as much variation as possible when selecting images for a class.

Data can be used to train a Conditional GAN. Since the images in this dataset are highly complex i.e. they are not regularized and they do not have a neat geometric pattern or orientation, it can also be used to check the robustness of a model, no matter the task.

Facebook

Twitter

Click to copy link

Link copied

Cite

Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set

Student Performance Data Set

Student achievement in secondary education of two Portuguese schools.

Explore at:

6 scholarly articles cite this dataset (View in Google Scholar)

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Mar 27, 2020

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Data-Science Sean

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

If this Data Set is useful, and upvote is appreciated. This data approach student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographic, social and school related features) and it was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

Clear search

Close search

Google apps

Main menu

Student Performance Data Set

The LargeST Benchmark Dataset

Fruits-360 dataset

Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

Branches

How to cite

Dataset properties

For the 100x100 branch

For the original-size branch

For the 3-body-problem branch

For the meta branch

For the multi branch

Filename format:

For the 100x100 branch

For the original-size branch

For the multi branch

Alternate download

How fruits were filmed

Online Sales Dataset - Popular Marketplace Data

Columns:

Insights:

Meta Kaggle Code

Explore our public notebook content!

Why we’re releasing this dataset

Sensitive data

Joining with Meta Kaggle

File organization

Questions / Comments

Predictive Maintenance Dataset

maps dataset

Dataset

Contents

BCCD Dataset

Dataset

Contents

Unlabelled dataset

NSL-KDD

Dataset

Contents

LLM: 7 prompt training dataset

Lifesnaps Fitbit dataset

Iris dataset

Legal Text Classification Dataset

🫀 Heart Disease Dataset

Acknowlegement

Face-Detection-Dataset

REFUGE2

Apple Quality

Description:

Key Features:

Potential Use Cases:

Mongo DB/ Json datasets

Sentinel-1&2 Image Pairs (SAR & Optical)

Student Performance Data SetSee More Versions

Student achievement in secondary education of two Portuguese schools.

Student Performance Data Set