100+ datasets found

The LargeST Benchmark Dataset
kaggle.com
Updated Jun 13, 2023
Cite
liuxu77 (2023). The LargeST Benchmark Dataset [Dataset]. https://www.kaggle.com/datasets/liuxu77/largest
Dataset updated
Jun 13, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
liuxu77
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This is the official page for downloading the CA sub-dataset of the LargeST benchmark dataset. The page hosts 7 files in total: 5 files in .h5 format contain the raw traffic flow data from 2017 to 2021, 1 file in .csv format provides the sensor metadata, and 1 file in .npy format holds the adjacency matrix constructed from road network distances. Please refer to https://github.com/liuxu77/LargeST for more information.
Fake News Prediction Dataset
kaggle.com
Updated Nov 3, 2023
Cite
Rajat Kumar (2023). Fake News Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/rajatkumar30/fake-news
Dataset updated
Nov 3, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rajat Kumar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
** Please Upvote if you like the dataset **

Fake news or hoax news is false or misleading information presented as news. Fake news often has the aim of damaging the reputation of a person or entity, or making money through advertising revenue.

This dataset contains both fake and real news.

The columns in the dataset are:

1) Title -> Title of the News

2) Text -> Text or Content of the News

3) Label -> Labelling the news as Fake or Real
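
The three columns above are enough for a quick class-balance check before any modelling. The following is a minimal sketch using only the Python standard library; the sample rows and the label values `FAKE` and `REAL` are hypothetical, and the header names are assumed to match the column list above exactly, which the actual file may not do.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real file is assumed to share these three columns.
SAMPLE_CSV = """Title,Text,Label
"Headline A","Body of article A",FAKE
"Headline B","Body of article B",REAL
"Headline C","Body of article C",FAKE
"""


def count_labels(csv_file):
    """Return a Counter of Label values from a CSV with Title, Text, Label columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Label"] for row in reader)


label_counts = count_labels(io.StringIO(SAMPLE_CSV))
print(label_counts)
```

To run this on the downloaded file, pass an open file handle instead of the `io.StringIO` wrapper.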
kaggle.com Traffic Analytics Data
analytics.explodingtopics.com
Updated Jun 1, 2025
Cite
(2025). kaggle.com Traffic Analytics Data [Dataset]. https://analytics.explodingtopics.com/website/kaggle.com
Dataset updated
Jun 1, 2025
Variables measured
Global Rank, Monthly Visits, Authority Score, US Country Rank, Computer Software & Development Category Rank
Description
Traffic analytics, rankings, and competitive metrics for kaggle.com as of June 2025
Book-Crossing Dataset
kaggle.com
zip
Updated Sep 7, 2019
Cite
somnambWl (2019). Book-Crossing Dataset [Dataset]. https://www.kaggle.com/datasets/somnambwl/bookcrossing-dataset
Available download formats: zip (17632108 bytes)
Dataset updated
Sep 7, 2019
Authors
somnambWl
Description
Book-Crossing dataset mined by Cai-Nicolas Ziegler

Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):

Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.

Further information and the original dataset can be found at the original webpage.

Changes to the dataset:

Location was removed because it comes in varying formats rather than the default one (city, state, country).

Transferred from ISO-8859-1 to UTF-8

Manually fixed a few rows with incorrect number of columns

Note:

out of 278859 users:

only 99053 rated at least 1 book

only 43385 rated at least 2 books.

only 12306 rated at least 10 books.

out of 271379 books:

only 270171 are rated at least once.

only 124513 have at least 2 ratings.

only 17480 have at least 10 ratings.
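
Counts like the ones in the note above come from grouping ratings by user and applying a minimum-count threshold. Here is a minimal sketch of that computation, assuming ratings are available as (user_id, isbn) pairs; the toy pairs are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def users_with_min_ratings(ratings, min_count):
    """Count users who appear in at least `min_count` (user_id, isbn) rating pairs."""
    per_user = Counter(user_id for user_id, _isbn in ratings)
    return sum(1 for n in per_user.values() if n >= min_count)


# Hypothetical toy ratings, for illustration only.
toy_ratings = [(1, "a"), (1, "b"), (2, "a"), (3, "c"), (3, "a"), (3, "b")]

at_least_1 = users_with_min_ratings(toy_ratings, 1)
at_least_2 = users_with_min_ratings(toy_ratings, 2)
at_least_3 = users_with_min_ratings(toy_ratings, 3)
print(at_least_1, at_least_2, at_least_3)
```

The same function gives per-book counts if the pairs are swapped to (isbn, user_id).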
Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection...
universe.roboflow.com
zip
Updated Jun 20, 2024
Cite
ChawawiwatPractice (2024). Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset Dataset [Dataset]. https://universe.roboflow.com/chawawiwatpractice/humans-from-https-www.kaggle.com-datasets-constantinwerner-human-detection-dataset-cewfm
Available download formats: zip
Dataset updated
Jun 20, 2024
Dataset authored and provided by
ChawawiwatPractice
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Human Bounding Boxes
Description
Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset

## Overview

Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset is a dataset for object detection tasks. It contains Human annotations for 548 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Online Sales Dataset - Popular Marketplace Data
kaggle.com
Updated May 25, 2024
Cite
ShreyanshVerma27 (2024). Online Sales Dataset - Popular Marketplace Data [Dataset]. https://www.kaggle.com/datasets/shreyanshverma27/online-sales-dataset-popular-marketplace-data
Dataset updated
May 25, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ShreyanshVerma27
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.

Columns:

Order ID: Unique identifier for each sales order.

Date: Date of the sales transaction.

Category: Broad category of the product sold (e.g., Electronics, Home Appliances, Clothing, Books, Beauty Products, Sports).

Product Name: Specific name or model of the product sold.

Quantity: Number of units of the product sold in the transaction.

Unit Price: Price of one unit of the product.

Total Price: Total revenue generated from the sales transaction (Quantity * Unit Price).

Region: Geographic region where the transaction occurred (e.g., North America, Europe, Asia).

Payment Method: Method used for payment (e.g., Credit Card, PayPal, Debit Card).
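
Because Total Price is defined as Quantity * Unit Price, that relationship can serve as a consistency check on the file. The sketch below assumes the CSV header uses exactly the column names listed above, which may not match the real file; the sample rows are hypothetical, and the third one is deliberately inconsistent.

```python
import csv
import io

# Hypothetical rows; the header is assumed to match the column list above.
SAMPLE_CSV = """Order ID,Date,Category,Product Name,Quantity,Unit Price,Total Price,Region,Payment Method
1,2024-01-01,Electronics,Example Phone,2,100.00,200.00,North America,Credit Card
2,2024-01-02,Books,Example Novel,3,10.50,31.50,Europe,PayPal
3,2024-01-03,Clothing,Example Shirt,1,20.00,25.00,Asia,Debit Card
"""


def find_total_mismatches(csv_file, tolerance=0.01):
    """Return Order IDs where Total Price differs from Quantity * Unit Price."""
    mismatched = []
    for row in csv.DictReader(csv_file):
        expected = int(row["Quantity"]) * float(row["Unit Price"])
        if abs(expected - float(row["Total Price"])) > tolerance:
            mismatched.append(row["Order ID"])
    return mismatched


mismatches = find_total_mismatches(io.StringIO(SAMPLE_CSV))
print(mismatches)
```

The `tolerance` parameter is an illustrative choice that absorbs rounding to cents.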

Insights:

1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.

2. Explore the popularity of different product categories across regions.

3. Investigate the impact of payment methods on sales volume or revenue.

4. Identify top-selling products within each category to optimize inventory and marketing strategies.

5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
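
Several of the insights above reduce to summing revenue over one or more grouping columns. A minimal sketch follows, assuming rows are dictionaries keyed by the column names listed earlier; the helper name `revenue_by` and the toy rows are hypothetical.

```python
from collections import defaultdict


def revenue_by(rows, *keys):
    """Sum Total Price per combination of the given grouping columns."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += float(row["Total Price"])
    return dict(totals)


# Hypothetical rows, for illustration only.
toy_rows = [
    {"Region": "Europe", "Category": "Books", "Total Price": "31.50"},
    {"Region": "Europe", "Category": "Books", "Total Price": "10.00"},
    {"Region": "Asia", "Category": "Clothing", "Total Price": "20.00"},
]

by_region_category = revenue_by(toy_rows, "Region", "Category")
by_region = revenue_by(toy_rows, "Region")
print(by_region_category)
```

Grouping by "Payment Method" or "Product Name" works the same way.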
Apple Quality
kaggle.com
Updated Jan 11, 2024
Cite
Nidula Elgiriyewithana ⚡ (2024). Apple Quality [Dataset]. http://doi.org/10.34740/kaggle/dsv/7384155
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7384155
Dataset updated
Jan 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nidula Elgiriyewithana ⚡
Description
Description:

This dataset contains information about various attributes of a set of fruits, providing insights into their characteristics. The dataset includes details such as fruit ID, size, weight, sweetness, crunchiness, juiciness, ripeness, acidity, and quality.

Key Features:

A_id: Unique identifier for each fruit

Size: Size of the fruit

Weight: Weight of the fruit

Sweetness: Degree of sweetness of the fruit

Crunchiness: Texture indicating the crunchiness of the fruit

Juiciness: Level of juiciness of the fruit

Ripeness: Stage of ripeness of the fruit

Acidity: Acidity level of the fruit

Quality: Overall quality of the fruit

Potential Use Cases:

Fruit Classification: Develop a classification model to categorize fruits based on their features.

Quality Prediction: Build a model to predict the quality rating of fruits using various attributes.

The dataset was generously provided by an American agriculture company. The data has been scaled and cleaned for ease of use.

If you find this dataset useful, your support through an upvote would be greatly appreciated ❤️🙂 Thank you

Loan Approval Classification Dataset

kaggle.com

Updated Oct 29, 2024

Cite

Ta-wei Lo (2024). Loan Approval Classification Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/loan-approval-classification-data

Dataset updated

Oct 29, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Ta-wei Lo

License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

1. Data Source

This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to simulate new data points to enlarge the instances. The dataset is structured for both categorical and continuous features.

2. Metadata

The dataset contains 45,000 records and 14 variables, each described below:

Column	Description	Type
`person_age`	Age of the person	Float
`person_gender`	Gender of the person	Categorical
`person_education`	Highest education level	Categorical
`person_income`	Annual income	Float
`person_emp_exp`	Years of employment experience	Integer
`person_home_ownership`	Home ownership status (e.g., rent, own, mortgage)	Categorical
`loan_amnt`	Loan amount requested	Float
`loan_intent`	Purpose of the loan	Categorical
`loan_int_rate`	Loan interest rate	Float
`loan_percent_income`	Loan amount as a percentage of annual income	Float
`cb_person_cred_hist_length`	Length of credit history in years	Float
`credit_score`	Credit score of the person	Integer
`previous_loan_defaults_on_file`	Indicator of previous loan defaults	Categorical
`loan_status` (target variable)	Loan approval status: 1 = approved; 0 = rejected	Integer

3. Data Usage

The dataset can be used for multiple purposes:

Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.

Be aware of data issues inherited from the original data, such as records with an age above 100 years.
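
One way to handle the age issue noted above is to separate implausible records before analysis. The sketch below assumes records are dictionaries keyed by the column names in the metadata table; the cutoff of 100 follows the note, while the helper name and the toy records are hypothetical.

```python
def split_by_plausible_age(records, max_age=100):
    """Split records into (kept, flagged) based on the person_age column."""
    kept = [r for r in records if float(r["person_age"]) <= max_age]
    flagged = [r for r in records if float(r["person_age"]) > max_age]
    return kept, flagged


# Hypothetical records, for illustration only.
toy_records = [
    {"person_age": "22.0", "loan_status": "1"},
    {"person_age": "144.0", "loan_status": "0"},
    {"person_age": "35.0", "loan_status": "0"},
]

kept, flagged = split_by_plausible_age(toy_records)
print(len(kept), len(flagged))
```

Keeping the flagged records in a separate list, rather than dropping them silently, makes it easy to report how many rows were excluded.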

This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.

Feel free to leave comments on the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

change_detection.pytorch-main
kaggle.com
Updated May 12, 2022
Cite
a1344683084 (2022). change_detection.pytorch-main [Dataset]. https://www.kaggle.com/datasets/a1344683084/change-detectionpytorchmain
Dataset updated
May 12, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
a1344683084
Description
Dataset

This dataset was created by a1344683084

Contents
Legal Text Classification Dataset
kaggle.com
Updated Oct 17, 2023
Cite
A.Mohan kumar (2023). Legal Text Classification Dataset [Dataset]. https://www.kaggle.com/datasets/amohankumar/legal-text-classification-dataset
Dataset updated
Oct 17, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
A.Mohan kumar
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
The dataset contains a total of 25000 legal cases in the form of text documents. Each document has been annotated with catchphrases, citation sentences, citation catchphrases, and citation classes. Citation classes indicate the type of treatment given to the cases cited by the present case.
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Dataset updated
Oct 31, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of Oct 9th, 2023. The resulting 394137 instances are stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No: Number of the instance.

Timestamp: Timestamp of the network traffic instance.

Source IP: IP address of the source.

Destination IP: IP address of the destination.

Protocol: Protocol used by the instance.

Length: Length of the instance.

Info: Information about the traffic instance.
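
A first pass over a capture like this is often a per-protocol summary. The following minimal sketch assumes the CSV header matches the Wireshark column names quoted above (No., Time, Source, Destination, Protocol, Length, Info); the sample rows and the helper name `protocol_stats` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows using the Wireshark column names quoted above.
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","10.0.0.1","10.0.0.2","TCP","66","example segment"
"2","0.000120","10.0.0.2","10.0.0.1","TCP","54","example ack"
"3","0.010500","10.0.0.1","10.0.0.53","DNS","74","example query"
'''


def protocol_stats(csv_file):
    """Return (packet counts, total bytes) per protocol as two Counters."""
    counts = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(csv_file):
        counts[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return counts, total_bytes


counts, total_bytes = protocol_stats(io.StringIO(SAMPLE_CSV))
print(counts.most_common())
```

Reading row by row keeps memory use flat, which matters for a file with several hundred thousand instances.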

Acknowledgements:

I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it is intended for building machine learning models that can identify specific applications (such as TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License:** CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection, and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394137 instances of network traffic data obtained by using the 25 applications on public, private, and enterprise networks. The dataset also contains features that are relevant to most machine learning applications in cybersecurity. A few of the machine learning applications that could benefit from this dataset are:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: A large network traffic dataset can be used to train machine learning models to find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks, and DoS attacks.
Predicting Heart Failure
kaggle.com
Updated Sep 13, 2022
Cite
Aman Chauhan (2022). Predicting Heart Failure [Dataset]. https://www.kaggle.com/datasets/whenamancodes/heart-failure-clinical-records
Dataset updated
Sep 13, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aman Chauhan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality by heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management wherein a machine learning model can be of great help.

Attribute Information:

Thirteen (13) attributes (twelve clinical features plus the target):
- age: age of the patient (years)
- anaemia: decrease of red blood cells or hemoglobin (boolean)
- high blood pressure: if the patient has hypertension (boolean)
- creatinine phosphokinase (CPK): level of the CPK enzyme in the blood (mcg/L)
- diabetes: if the patient has diabetes (boolean)
- ejection fraction: percentage of blood leaving the heart at each contraction (percentage)
- platelets: platelets in the blood (kiloplatelets/mL)
- sex: woman or man (binary)
- serum creatinine: level of serum creatinine in the blood (mg/dL)
- serum sodium: level of serum sodium in the blood (mEq/L)
- smoking: if the patient smokes or not (boolean)
- time: follow-up period (days)
- [target] death event: if the patient deceased during the follow-up period (boolean)

May 2015 Reddit Comments
kaggle.com
zip
Updated Jun 4, 2019
Cite
Kaggle (2019). May 2015 Reddit Comments [Dataset]. https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015
Available download formats: zip (21429083286 bytes)
Dataset updated
Jun 4, 2019
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
https://www.reddit.com/wiki/api
Description
Recently Reddit released an enormous dataset containing all ~1.7 billion of their publicly available comments. The full dataset is an unwieldy 1+ terabyte uncompressed, so we've decided to host a small portion of the comments here for Kagglers to explore. (You don't even need to leave your browser!)

You can find all the comments from May 2015 on scripts for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?

Who knows? Top visualizations may just end up on Reddit.

Data Description

The database has one table, May2015, with the following fields:

created_utc

ups

subreddit_id

link_id

name

score_hidden

author_flair_css_class

author_flair_text

subreddit

id

removal_reason

gilded

downs

archived

author

score

retrieved_on

body

distinguished

edited

controversiality

parent_id
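
Since the description above specifies one table, May2015, with these fields, a query sketch may help. It assumes the database is a SQLite file, which the listing does not state, and it uses an in-memory stand-in with only three of the listed fields and hypothetical rows so that it runs without the download.

```python
import sqlite3


def top_subreddits(conn, limit=3):
    """Return (subreddit, comment count, mean score) rows, most active first."""
    query = """
        SELECT subreddit, COUNT(*) AS n_comments, AVG(score) AS mean_score
        FROM May2015
        GROUP BY subreddit
        ORDER BY n_comments DESC, subreddit ASC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# In-memory stand-in: a subset of the listed fields, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE May2015 (subreddit TEXT, score INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO May2015 VALUES (?, ?, ?)",
    [("python", 5, "example a"), ("python", 3, "example b"), ("aww", 10, "example c")],
)

result = top_subreddits(conn)
print(result)
```

Pushing the aggregation into SQL avoids loading the comment bodies into memory, which matters for a download of this size.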
DFL Bundesliga 460 MP4 Videos in 30Sec. + CSV
kaggle.com
Updated Aug 4, 2022
Cite
Saber (2022). DFL Bundesliga 460 MP4 Videos in 30Sec. + CSV [Dataset]. https://www.kaggle.com/datasets/saberghaderi/-dfl-bundesliga-460-mp4-videos-in-30sec-csv
Dataset updated
Aug 4, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Saber
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
From a young age, hopeful talents devote time, money, and training to the sport. Yet, while the next superstar is guaranteed to start off in youth or semi-professional leagues, these leagues often have the fewest resources to invest. This includes resources for the collection of event data which helps generate insights into the performance of the teams and players.

**About Dataset:** This dataset contains 460 training and test videos in 2 folders, collected from the competition's videos. All videos are in MP4 format.

Please note that the number of videos differs between the two folders.

Version 1 --> 460 MP4 files in 2 folders + a .CSV file

Version 2 --> Coming Soon!

competition page: https://www.kaggle.com/competitions/dfl-bundesliga-data-shootout

wish you all the best
sql injection dataset
kaggle.com
Updated Sep 9, 2021
Cite
Syed Saqlain Hussain Shah (2021). sql injection dataset [Dataset]. https://www.kaggle.com/datasets/syedsaqlainhussain/sql-injection-dataset/data
Dataset updated
Sep 9, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Syed Saqlain Hussain Shah
Description
Context

I have collected raw data of SQL injection attacks and benign traffic from different websites and cleaned that data. Thanks to @sajid576 and @mehjabeenshachi who have also contributed their time and effort.
Customer Segmentation Data
kaggle.com
Updated Mar 11, 2024
Cite
Raval Smit (2024). Customer Segmentation Data [Dataset]. https://www.kaggle.com/datasets/ravalsmit/customer-segmentation-data
Dataset updated
Mar 11, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Raval Smit
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.

Key Features:

Anonymized demographic, transactional, and behavioral data.

Suitable for customer segmentation analysis.

Opportunities to optimize targeted marketing strategies.

Valuable insights for improving campaign effectiveness.

Ideal for marketers, data scientists, and business analysts.

Usage Examples:

Segmenting customers based on demographic attributes.

Analyzing purchase behavior to identify high-value customer segments.

Optimizing marketing campaigns for targeted engagement.

Understanding customer preferences and tailoring product offerings accordingly.

Evaluating the effectiveness of marketing strategies and iterating for improvement.

Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
Human Activity Recognition (HAR - Video Dataset)
kaggle.com
Updated May 19, 2023
Cite
Sharjeel M. (2023). Human Activity Recognition (HAR - Video Dataset) [Dataset]. http://doi.org/10.34740/kaggle/dsv/5722068
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5722068
Dataset updated
May 19, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sharjeel M.
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The dataset contains a comprehensive collection of human activity videos, spanning across 7 distinct classes. These classes include clapping, meeting and splitting, sitting, standing still, walking, walking while reading book, and walking while using the phone.

Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.

The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.
Financial_Risk
kaggle.com
Updated Jul 23, 2024
Cite
Preetham Gouda (2024). Financial_Risk [Dataset]. https://www.kaggle.com/datasets/preethamgouda/financial-risk
Dataset updated
Jul 23, 2024
Dataset provided by
Kaggle
Authors
Preetham Gouda
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
The Financial Risk Assessment Dataset provides detailed information on individual financial profiles. It includes demographic, financial, and behavioral data to assess financial risk. The dataset features various columns such as income, credit score, and risk rating, with intentional imbalances and missing values to simulate real-world scenarios.
Walmart Dataset
kaggle.com
Updated Dec 26, 2021
Cite
M Yasser H (2021). Walmart Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/walmart-dataset
Dataset updated
Dec 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description:

One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays affect sales on each day. Sales data are available for 45 Walmart stores. The business faces a challenge from unforeseen demand and sometimes runs out of stock because its machine learning algorithm is inadequate. An ideal ML algorithm will predict demand accurately and take into account factors such as economic conditions, including CPI, the Unemployment Index, etc.

Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
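
The five-times weighting of holiday weeks can be expressed as a weighted error metric. The sketch below uses weighted mean absolute error as one plausible form; the text above states only the weighting, so the choice of absolute error is an assumption, and the function name and sample values are hypothetical.

```python
def weighted_mae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error with holiday weeks counted `holiday_weight` times."""
    weights = [holiday_weight if holiday else 1.0 for holiday in is_holiday]
    weighted_errors = sum(
        w * abs(a - p) for w, a, p in zip(weights, actual, predicted)
    )
    return weighted_errors / sum(weights)


# Hypothetical weekly sales, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
is_holiday = [False, False, True]

score = weighted_mae(actual, predicted, is_holiday)
print(round(score, 4))
```

With these values the holiday week's error of 30 contributes 150 of the 170 weighted error units, which shows how strongly holiday misses dominate the score.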

Acknowledgements

The dataset is taken from Kaggle.

Objective:

Understand the Dataset & cleanup (if required).

Build Regression models to predict the sales w.r.t single & multiple features.

Also evaluate the models & compare their respective scores like R2, RMSE, etc.
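
The two scores named in the objective, R2 and RMSE, can be computed directly from actual and predicted values. This is a minimal standard-library sketch with hypothetical sample values; in practice a library implementation would normally be used.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(rmse(actual, predicted), r2_score(actual, predicted))
```

RMSE is expressed in the units of the target (sales), while R2 is unitless, so the two are best reported together when comparing models.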
PCOS detection using ultrasound images
kaggle.com
Updated Apr 8, 2022
Cite
Anagha Choudhari (2022). PCOS detection using ultrasound images [Dataset]. https://www.kaggle.com/datasets/anaghachoudhari/pcos-detection-using-ultrasound-images
Dataset updated
Apr 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anagha Choudhari
Description
The data folder consists of 'train' and 'test' subfolders containing 2 categories of data, 'infected' and 'notinfected':

infected: Images of ovaries having PCOS

notinfected: Images of healthy ovaries
