100+ datasets found

password and username generator
kaggle.com
Updated Apr 22, 2023
Cite
Jean_oliveirasi (2023). password and username generator [Dataset]. https://www.kaggle.com/datasets/jeanoliveirasi/password-and-username-generator/code
Dataset updated
Apr 22, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Jean_oliveirasi
Description
Dataset

This dataset was created by Jean_oliveirasi

Contents
Kaggle Bot Account Detection
kaggle.com
Updated Feb 7, 2023
Cite
Shriyash Jagtap (2023). Kaggle Bot Account Detection [Dataset]. https://www.kaggle.com/datasets/shriyashjagtap/kaggle-bot-account-detection/code
Dataset updated
Feb 7, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Shriyash Jagtap
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
The data was generated using the Faker library and is not authentic real-world data. In recent years, there have been numerous reports of bot voting that has manipulated outcomes in data science competitions, which motivated the creation of this simulated dataset. As this is the first version of the dataset, feedback and constructive criticism that would improve its quality and significance are welcome.

NAME: The name of the individual.

GENDER: The gender of the individual, either male or female.

EMAIL_ID: The email address of the individual.

IS_GLOGIN: A boolean indicating whether the individual used Google login to register or not.

FOLLOWER_COUNT: The number of followers the individual has.

FOLLOWING_COUNT: The number of individuals the individual is following.

DATASET_COUNT: The number of datasets the individual has created.

CODE_COUNT: The number of notebooks the individual has created.

DISCUSSION_COUNT: The number of discussions the individual has participated in.

AVG_NB_READ_TIME_MIN: The average time spent reading notebooks in minutes.

REGISTRATION_IPV4: The IP address used to register.

REGISTRATION_LOCATION: The location from where the individual registered.

TOTAL_VOTES_GAVE_NB: The total number of votes the individual has given to notebooks.

TOTAL_VOTES_GAVE_DS: The total number of votes the individual has given to datasets.

TOTAL_VOTES_GAVE_DC: The total number of votes the individual has given to discussion comments.

ISBOT: A boolean indicating whether the individual is a bot or not.
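As an illustration of how these columns might feed a bot classifier, the sketch below parses a few rows and derives a "votes given per contribution" ratio. The inline sample rows, the True/False encoding of the boolean columns, and the `vote_ratio` feature itself are assumptions for illustration, not part of the published dataset.

```python
import csv
import io

# Hypothetical sample rows using a subset of the documented columns.
# The values (and the True/False encoding of booleans) are made up.
SAMPLE_CSV = """NAME,IS_GLOGIN,DATASET_COUNT,CODE_COUNT,DISCUSSION_COUNT,TOTAL_VOTES_GAVE_NB,TOTAL_VOTES_GAVE_DS,TOTAL_VOTES_GAVE_DC,ISBOT
Alice,True,2,3,4,12,3,1,False
Bob,False,0,0,0,80,40,20,True
Carol,False,0,1,0,5,1,0,False
"""

VOTE_COLS = ("TOTAL_VOTES_GAVE_NB", "TOTAL_VOTES_GAVE_DS", "TOTAL_VOTES_GAVE_DC")
CONTENT_COLS = ("DATASET_COUNT", "CODE_COUNT", "DISCUSSION_COUNT")


def parse_bool(value: str) -> bool:
    """Accept common textual encodings of a boolean flag."""
    return value.strip().lower() in {"true", "1", "yes"}


def vote_ratio(row: dict) -> float:
    """Votes given per item created (+1 avoids division by zero).

    A hypothetical feature: accounts that vote heavily but
    create almost nothing may be worth a closer look.
    """
    votes = sum(int(row[c]) for c in VOTE_COLS)
    created = sum(int(row[c]) for c in CONTENT_COLS)
    return votes / (created + 1)


def load_features(text: str) -> list[tuple[str, float, bool, bool]]:
    """Return (name, vote_ratio, is_glogin, is_bot) for each row."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (r["NAME"], vote_ratio(r), parse_bool(r["IS_GLOGIN"]), parse_bool(r["ISBOT"]))
        for r in reader
    ]


if __name__ == "__main__":
    for name, ratio, glogin, bot in load_features(SAMPLE_CSV):
        print(f"{name}: ratio={ratio:.1f} glogin={glogin} bot={bot}")
```

In this toy sample the bot row stands out with a ratio of 140 against single digits for the others; whether such a feature separates the real classes is something to check on the actual file.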
Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection...
universe.roboflow.com
zip
Updated Jun 20, 2024
Cite
ChawawiwatPractice (2024). Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset Dataset [Dataset]. https://universe.roboflow.com/chawawiwatpractice/humans-from-https-www.kaggle.com-datasets-constantinwerner-human-detection-dataset-cewfm
Available download formats: zip
Dataset updated
Jun 20, 2024
Dataset authored and provided by
ChawawiwatPractice
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Human Bounding Boxes
Description
Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset

## Overview

Humans From Https Www.kaggle.com Datasets Constantinwerner Human Detection Dataset is a dataset for object detection tasks. It contains Human annotations for 548 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
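Bounding-box annotations like these are usually compared using intersection over union (IoU) when evaluating a detector. A minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` corners; the export format inside the zip may differ (for example, center-based YOLO boxes), so the helper `yolo_to_corners` is included as a hypothetical conversion step.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def yolo_to_corners(cx: float, cy: float, w: float, h: float) -> Box:
    """Convert a center/width/height box to corner coordinates."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap along each axis (zero if the boxes do not overlap)
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For two 10x10 boxes offset by 5 pixels in each direction, the overlap is 25 and the union 175, giving an IoU of about 0.14.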
Kaggle account verification
kaggle.com
Updated Jun 16, 2022
Cite
Ahmed Ali (2022). Kaggle account verification [Dataset]. https://www.kaggle.com/datasets/ahmedali058/kaggle-account-verification
Dataset updated
Jun 16, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Ahmed Ali
Description
Dataset

This dataset was created by Ahmed Ali

Contents
Data from: Password Reset Dataset
kaggle.com
Updated Oct 3, 2023
Cite
HariSellowpay (2023). Password Reset Dataset [Dataset]. https://www.kaggle.com/datasets/harisellowpay/password-reset-dataset
Dataset updated
Oct 3, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
HariSellowpay
Description
The dataset is designed to simulate password-related events, creating a synthetic representation of actions related to password management. It includes fields like timestamp, action, event type, location, IP address, password, hour, and time difference.

The dataset comprises 50,000 records representing a variety of password-related events.

A list of commonly used passwords is incorporated to mimic real-world scenarios.

Timestamps are spread throughout the current year.

Features like 'hour' and 'time_difference' are derived to provide additional insights into the temporal aspects of the events.
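The derived columns could be computed along these lines. This sketch assumes `hour` is the hour of day of each event and `time_difference` is the number of seconds since the previous event in timestamp order; the description does not spell out either definition, and the timestamp format is also an assumption.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to match the actual file.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_temporal_features(timestamps: list[str]) -> list[dict]:
    """Sort events by time and derive 'hour' and 'time_difference'.

    time_difference is seconds since the previous event (0 for the first),
    an assumed definition used here for illustration.
    """
    parsed = sorted(datetime.strptime(ts, TS_FORMAT) for ts in timestamps)
    rows = []
    prev = None
    for ts in parsed:
        diff = 0.0 if prev is None else (ts - prev).total_seconds()
        rows.append({"timestamp": ts, "hour": ts.hour, "time_difference": diff})
        prev = ts
    return rows
```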

This synthetic dataset can be used for training and testing machine learning models related to cyber security, anomaly detection, or password management. It allows researchers and practitioners to experiment with data resembling real-world scenarios without compromising actual user information.
4367x PII Label-Specific Essays (by 7b Models)
kaggle.com
Updated Feb 7, 2024
Cite
Valentin Werner (2024). 4367x PII Label-Specific Essays (by 7b Models) [Dataset]. https://www.kaggle.com/datasets/valentinwerner/pii-label-specific-data
Dataset updated
Feb 7, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Valentin Werner
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Evaluation of my dataset with my .915 baseline:

F5 score = .690 - Recall = .692, Precision = .639
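For reference, the F5 score is the F-beta measure with beta = 5, which weights recall far more heavily than precision. Plugging the reported precision and recall into the standard formula reproduces the reported value:

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


print(round(f_beta(0.639, 0.692, beta=5), 3))  # 0.69
```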

Distribution of data:

843x Address (ca. 500 US)

496x Names (Incl. Middle Names, Pronunciation or Nicknames)

537x Userid

704x Username (Incl. Name)

531x Phone

755x Email (Incl. Name)

501x URL

See linked notebook for generation.

Remarks on labels:

EMAIL:

Emails are always based on the name but use random domains

The prompt also asked for an essay about the person's favourite book; the models heavily favour “To Kill a Mockingbird”

PHONE:

Generated from multiple countries for diversity

Labelling of phone numbers should only include the full number (not parts of it)

ADDRESSES:

From multiple countries for diversity

For US addresses, state abbreviations are mapped to the full state name, so these are labelled as well

An address is only labelled as such if the text starts with one of the first two words of the full address (e.g., if the house number is missing from a US address, it is still labelled)

NAMES:

Middle names are sometimes generated, separated by either " " or "-"

Pronunciations and nicknames were generated and labelled

However, name etymologies were not tagged: in a sentence such as “my name Thomas is derived from the Aramaic word ‘t’oma’”, the word “t’oma” is not labelled. Let me know if this is wrong. These cases are relatively easy to identify in the names dataset by looking for “derived from”

URL:

Short domains, full websites and full URIs

USERID:

Mostly randomly generated combinations of letters and numbers, not modelled on other formats

Can usually be augmented easily by replacing the user ID

The user ID is sometimes split into parts within the text; these split parts are not labelled (not sure if this is right)
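Since user IDs are mostly random letter-number strings, swapping them out is a cheap augmentation. A minimal sketch, assuming each essay comes with the character span of its labelled user ID; the function name, the span representation, and the ID alphabet and length range are hypothetical choices, not taken from the linked notebook.

```python
import random
import string


def replace_userid(text: str, start: int, end: int, rng: random.Random) -> tuple[str, int, int]:
    """Swap the user ID at text[start:end] for a fresh random ID.

    Returns the new text and the updated (start, end) span of the ID.
    Any labels located after `end` shift by len(new_id) - (end - start).
    """
    alphabet = string.ascii_lowercase + string.digits
    new_id = "".join(rng.choice(alphabet) for _ in range(rng.randint(6, 12)))
    new_text = text[:start] + new_id + text[end:]
    return new_text, start, start + len(new_id)
```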

USERNAMES:

either generated based on name OR animal+birthyear OR colour+fruit
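The three username patterns could be reproduced roughly as follows. The word lists, separators, and birth-year range are made-up placeholders; the actual generation code is in the linked notebook.

```python
import random

# Hypothetical word lists; the real generator's vocabulary is unknown.
ANIMALS = ["otter", "falcon", "lynx", "panda"]
COLOURS = ["red", "teal", "amber", "violet"]
FRUITS = ["mango", "kiwi", "plum", "cherry"]


def make_username(first: str, last: str, rng: random.Random) -> str:
    """Pick one of three patterns: name-based, animal+birthyear, colour+fruit."""
    pattern = rng.choice(["name", "animal_year", "colour_fruit"])
    if pattern == "name":
        sep = rng.choice(["", ".", "_"])
        return f"{first.lower()}{sep}{last.lower()}"
    if pattern == "animal_year":
        return f"{rng.choice(ANIMALS)}{rng.randint(1950, 2010)}"
    return f"{rng.choice(COLOURS)}{rng.choice(FRUITS).capitalize()}"
```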
Data Export Tool
kaggle.com
Updated Nov 24, 2024
Cite
willian oliveira gibin (2024). Data Export Tool [Dataset]. http://doi.org/10.34740/kaggle/dsv/10002590
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10002590
Dataset updated
Nov 24, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
willian oliveira gibin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This graph was created in R:

(Graph: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F418952a3857f2530a53a40d9cc9c320c%2Fgraph1.gif?generation=1732477206118972&alt=media)

Due to the size of the full dataset (see Technical Notices below for more information), users are advised to download data for specific time periods and/or geographic areas.

To download all available ACLED data for a specific time period, enter your login information, select a date range in the ‘from’ and ‘to’ boxes, and click ‘export.’ To download all available ACLED data for a specific region, country, or location, enter your login information, select a ‘region,’ ‘country,’ or ‘location’ from the relevant drop-down menus, and click ‘export.’ Note: ‘country’ selection will override ‘region’ selection, and only data for the selected country or countries will be downloaded. ‘Location’ selection requires a ‘country’ selection and will result in an export of only data for that specific subnational location.

To download data for specific event types, select the relevant event types from that category in the ‘event type’ or ‘sub-event type’ boxes and leave all other categories as they are. All data for the selected event type(s) will be exported.

To download data for a specific actor type or a specific actor, select the ‘actor type’ or ‘actor’ in the relevant boxes and leave all other categories as they are. All data for the selected actor(s) or actor type(s) will be exported.

By default, the data are exported in a format where each row represents a single event, on a specific day and location, and involving distinct actors. An ‘actor based’ file displays events by single actors instead, meaning that events are often repeated if two actors are involved. To determine which of the two file types to use, you should consider whether the data are being used to analyze patterns over time, types of violence, conflict between groups, or locations (which the default file type is best for), or to analyze actor types or specific actors. For the former, the default format should be used, while for the latter, the ‘actor based’ file should be used.
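To make the difference between the two layouts concrete, here is a sketch that expands event-based rows into an actor-based view (one row per actor, so a two-actor event appears twice). The field names `event_id`, `actor1`, and `actor2` are assumptions for illustration; check the export's column headers before adapting it.

```python
def to_actor_based(events: list[dict]) -> list[dict]:
    """Expand one-row-per-event records into one-row-per-actor records.

    Events with two actors are repeated, mirroring the 'actor based' file.
    """
    rows = []
    for event in events:
        for key in ("actor1", "actor2"):
            actor = event.get(key)
            if actor:  # skip a missing or empty second actor
                row = {k: v for k, v in event.items() if k not in ("actor1", "actor2")}
                row["actor"] = actor
                rows.append(row)
    return rows
```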

For systems that use semicolon-separated values by default, you may wish to use the ‘compatibility mode’ option.
Student Performance Data Set
kaggle.com
Updated Mar 27, 2020
Cite
Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
Dataset updated
Mar 27, 2020
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Data-Science Sean
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
If this dataset is useful, an upvote is appreciated. These data concern student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social and school-related features, and were collected using school reports and questionnaires. Two datasets are provided regarding performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).
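The binary and five-level targets can be derived from G3, which is on the Portuguese 0 to 20 scale. The cut-offs below (pass at 10 or more; levels 1 to 5 split at 16, 14, 12 and 10) are assumed to match the scheme in Cortez and Silva (2008); confirm against the paper before relying on them. The sketch also shows the harder setup the note recommends, with G1 and G2 dropped from the inputs.

```python
def binary_target(g3: int) -> int:
    """1 = pass, 0 = fail (assumed threshold: G3 >= 10 on the 0-20 scale)."""
    return int(g3 >= 10)


def five_level_target(g3: int) -> int:
    """Assumed five-level banding: 1 (best, 16-20) to 5 (fail, 0-9)."""
    for level, cutoff in enumerate((16, 14, 12, 10), start=1):
        if g3 >= cutoff:
            return level
    return 5


def feature_columns(columns: list[str], drop_period_grades: bool = True) -> list[str]:
    """Exclude the target G3 and, optionally, the strongly correlated G1/G2."""
    excluded = {"G3"} | ({"G1", "G2"} if drop_period_grades else set())
    return [c for c in columns if c not in excluded]
```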
Data from: Spam Email
kaggle.com
Updated Feb 10, 2022
Cite
Rhitaza Jana (2022). Spam Email [Dataset]. https://www.kaggle.com/datasets/rhitazajana/spam-email
Dataset updated
Feb 10, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Rhitaza Jana
Description
Dataset

This dataset was created by Rhitaza Jana

Contents
Google Analytics Sample
kaggle.com
zip
Updated Sep 19, 2019
Cite
Google BigQuery (2019). Google Analytics Sample [Dataset]. https://www.kaggle.com/datasets/bigquery/google-analytics-sample
Available download formats: zip (0 bytes)
Dataset updated
Sep 19, 2019
Dataset provided by
BigQuery: https://cloud.google.com/bigquery
Google: http://google.com/
Authors
Google BigQuery
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Google Merchandise Store sells Google branded merchandise. The data is typical of what you would see for an ecommerce website.

Content

The sample dataset contains Google Analytics 360 data from the Google Merchandise Store, a real ecommerce store. It includes the following kinds of information:

Traffic source data: information about where website visitors originate. This includes data about organic traffic, paid search traffic, display traffic, etc.

Content data: information about the behavior of users on the site. This includes the URLs of pages that visitors look at, how they interact with content, etc.

Transactional data: information about the transactions that occur on the Google Merchandise Store website.

Fork this kernel to get started.

Acknowledgements

Data from: https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801

Banner Photo by Edho Pratama from Unsplash.

Inspiration

What is the total number of transactions generated per device browser in July 2017?

The real bounce rate is defined as the percentage of visits with a single pageview. What was the real bounce rate per traffic source?

What was the average number of product pageviews for users who made a purchase in July 2017?

What was the average number of product pageviews for users who did not make a purchase in July 2017?

What was the average total transactions per user that made a purchase in July 2017?

What is the average amount of money spent per session in July 2017?

What is the sequence of pages viewed?
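Several of these questions reduce to simple aggregations over sessions. As an example, the "real bounce rate" defined above (percentage of visits with a single pageview) per traffic source could be computed as below. The sketch works on plain Python records with hypothetical `source` and `pageviews` fields; in the BigQuery tables these are assumed to map to nested fields such as `trafficSource.source` and `totals.pageviews`.

```python
from collections import defaultdict


def bounce_rate_by_source(sessions: list[dict]) -> dict[str, float]:
    """Percentage of sessions with exactly one pageview, per traffic source."""
    visits: dict[str, int] = defaultdict(int)
    bounces: dict[str, int] = defaultdict(int)
    for s in sessions:
        visits[s["source"]] += 1
        if s["pageviews"] == 1:
            bounces[s["source"]] += 1
    return {src: 100.0 * bounces[src] / n for src, n in visits.items()}
```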
Top-ranked kaggler DAILY user activity (updated)
kaggle.com
Updated Jul 22, 2020
Cite
piby4 (2020). Top-ranked kaggler DAILY user activity (updated) [Dataset]. https://www.kaggle.com/tomtillo/top-ranked-kaggle-user-activity-1-1000-ranks/code
Dataset updated
Jul 22, 2020
Dataset provided by
Kaggle: http://kaggle.com/
Authors
piby4
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
LAST UPDATED : 20th JULY 2020

Context

Do the top Kagglers comment more?

Do they make most of their competition submissions on weekends?

Who are the most active Kagglers among the top-ranked users?

A user activity is defined as

Making a competition submission

Running a script

Commenting on a topic

Creating a new dataset / updating one.

(Activity map: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F285393%2F76ddd60b7a0afd22fadf3ed21510d52b%2Factivity_map.png?generation=1595260268658485&alt=media)

Content

This dataset consists of 4 sub-datasets:

**USER_ACTIVITY.csv**: user activity at the day-username level (submissions, comments, script runs, dataset updates)

competitions_1000_ranks.csv: top 1000 ranked Kagglers (competitions): username, rank

discussion_top1000_ranks.csv: top 1000 ranked Kagglers (discussions): username, rank

scripts_top1000_ranks.csv: top 1000 ranked Kagglers (kernels): username, rank

userid_username_mapping.csv: Kaggle ID to Kaggle username mapping file
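The weekend question from the Context section could be explored with something like the sketch below, which joins day-level activity to the competition ranks and computes each top-ranked user's share of submissions made on Saturday or Sunday. The column names (`date`, `username`, `submissions`) and the ISO date format are assumptions about the CSV layout.

```python
from collections import defaultdict
from datetime import date


def weekend_share(activity: list[dict], ranks: dict[str, int], top_n: int = 100) -> dict[str, float]:
    """Fraction of each top-N user's submissions made on weekends.

    activity: rows like {"date": "2020-07-18", "username": ..., "submissions": 3}
    ranks: username -> competition rank (1 = best)
    """
    total: dict[str, int] = defaultdict(int)
    weekend: dict[str, int] = defaultdict(int)
    for row in activity:
        user = row["username"]
        if ranks.get(user, top_n + 1) > top_n:
            continue  # not among the top-N ranked users
        n = int(row["submissions"])
        total[user] += n
        if date.fromisoformat(row["date"]).weekday() >= 5:  # 5 = Sat, 6 = Sun
            weekend[user] += n
    return {u: weekend[u] / t for u, t in total.items() if t > 0}
```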

Frequency of Update

This dataset will be updated every Monday

Acknowledgements

The main USER_ACTIVITY dataset was acquired from Kaggle's user activity tab (on each user's home page). Other metadata was acquired from Meta Kaggle (a public dataset).

Inspiration

Do the top Kagglers show any pattern in their submissions, comments, dataset updates or script runs?
Fake News Prediction Dataset
kaggle.com
Updated Nov 3, 2023
Cite
Rajat Kumar (2023). Fake News Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/rajatkumar30/fake-news
Dataset updated
Nov 3, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Rajat Kumar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
**Please upvote if you like the dataset**

Fake news or hoax news is false or misleading information presented as news. Fake news often has the aim of damaging the reputation of a person or entity, or making money through advertising revenue.

This dataset contains both fake and real news.

The columns present in the dataset are:

1) Title -> Title of the News

2) Text -> Text or Content of the News

3) Label -> Labelling the news as Fake or Real
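As a starting point for the prediction task, a multinomial Naive Bayes classifier over the words of Title and Text is a common baseline. The sketch below is a from-scratch, standard-library version with add-one smoothing and a deliberately simple tokenizer; the label strings "FAKE" and "REAL" used in the example are assumptions, so check the Label column's actual values.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, doc: str) -> str:
        def score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / denom)
                for w in tokenize(doc) if w in self.vocab  # ignore unseen words
            )
        return max(self.classes, key=score)
```

Each document would typically be built as `title + " " + text`; a library implementation with TF-IDF features is the usual next step once this baseline works.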
ranked_users_kaggle_data
kaggle.com
Updated Nov 18, 2018
Cite
FelipeSalvatore (2018). ranked_users_kaggle_data [Dataset]. https://www.kaggle.com/felsal/ranked-users-kaggle-data/
Dataset updated
Nov 18, 2018
Dataset provided by
Kaggle: http://kaggle.com/
Authors
FelipeSalvatore
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Ranked users Kaggle data

Data about Kaggle ranked users

Context

This data is available online here. I imagine it was obtained by a crawler, since it is displayed on the Kaggle leaderboard. I took the data, standardized the country names and added a continent label to each user, but I did not use the city name. To preserve anonymity, I removed the columns UserName and DisplayName from the original dataset.
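The country clean-up and continent labelling described above might look like the sketch below. The alias table and the continent mapping are small hypothetical excerpts; the author's actual mapping is not published in the description.

```python
# Hypothetical excerpts; a real mapping would cover every country in the data.
COUNTRY_ALIASES = {
    "usa": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "brasil": "Brazil",
}
CONTINENT = {
    "United States": "North America",
    "United Kingdom": "Europe",
    "Brazil": "South America",
    "Japan": "Asia",
}


def standardize_country(raw: str) -> str:
    """Map spelling variants to one canonical country name."""
    cleaned = " ".join(raw.split())  # collapse stray whitespace
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def label_continent(raw_country: str) -> tuple[str, str]:
    """Return (canonical country, continent or 'Unknown')."""
    country = standardize_country(raw_country)
    return country, CONTINENT.get(country, "Unknown")
```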

Content

Each row represents a ranked user. The columns are: register date, current points, current ranking, highest ranking, country and continent.

In Kaggle, points and ranking change over time. So, all the positions represented here correspond only to a specific point in time (around August 2018).

Acknowledgements

I want to thank the team from Norconsult responsible for making this data public.
Bank Transaction Dataset for Fraud Detection
kaggle.com
Updated Nov 4, 2024
Cite
vala khorasani (2024). Bank Transaction Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection
Dataset updated
Nov 4, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
vala khorasani
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset provides a detailed look into transactional behavior and financial activity patterns, ideal for exploring fraud detection and anomaly identification. It contains 2,512 samples of transaction data, covering various transaction attributes, customer demographics, and usage patterns. Each entry offers comprehensive insights into transaction behavior, enabling analysis for financial security and fraud detection applications.

Key Features:

TransactionID: Unique alphanumeric identifier for each transaction.

AccountID: Unique identifier for each account, with multiple transactions per account.

TransactionAmount: Monetary value of each transaction, ranging from small everyday expenses to larger purchases.

TransactionDate: Timestamp of each transaction, capturing date and time.

TransactionType: Categorical field indicating 'Credit' or 'Debit' transactions.

Location: Geographic location of the transaction, represented by U.S. city names.

DeviceID: Alphanumeric identifier for devices used to perform the transaction.

IP Address: IPv4 address associated with the transaction, with occasional changes for some accounts.

MerchantID: Unique identifier for merchants, showing preferred and outlier merchants for each account.

AccountBalance: Balance in the account post-transaction, with logical correlations based on transaction type and amount.

PreviousTransactionDate: Timestamp of the last transaction for the account, aiding in calculating transaction frequency.

Channel: Channel through which the transaction was performed (e.g., Online, ATM, Branch).

CustomerAge: Age of the account holder, with logical groupings based on occupation.

CustomerOccupation: Occupation of the account holder (e.g., Doctor, Engineer, Student, Retired), reflecting income patterns.

TransactionDuration: Duration of the transaction in seconds, varying by transaction type.

LoginAttempts: Number of login attempts before the transaction, with higher values indicating potential anomalies.

This dataset is ideal for data scientists, financial analysts, and researchers looking to analyze transactional patterns, detect fraud, and build predictive models for financial security applications. The dataset was designed for machine learning and pattern analysis tasks and is not intended as a primary data source for academic publications.
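Two of the listed fields lend themselves to simple engineered features: the gap between TransactionDate and PreviousTransactionDate (transaction frequency), and LoginAttempts. The sketch below computes the gap in days and raises a rule-based flag; the timestamp format and the threshold of 3 login attempts are assumptions for illustration, not values taken from the dataset.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of both timestamp columns
MAX_LOGIN_ATTEMPTS = 3  # hypothetical threshold


def engineer(row: dict) -> dict:
    """Add days-since-previous-transaction and a simple login-attempt flag.

    A negative gap would mean the timestamps are inconsistent and worth checking.
    """
    current = datetime.strptime(row["TransactionDate"], TS_FORMAT)
    previous = datetime.strptime(row["PreviousTransactionDate"], TS_FORMAT)
    gap_days = (current - previous).total_seconds() / 86400
    return {
        **row,
        "DaysSincePrevious": gap_days,
        "LoginFlag": int(row["LoginAttempts"]) >= MAX_LOGIN_ATTEMPTS,
    }
```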
Network Traffic Dataset
kaggle.com
Updated Oct 31, 2023
Cite
Ravikumar Gattu (2023). Network Traffic Dataset [Dataset]. https://www.kaggle.com/datasets/ravikumargattu/network-traffic-dataset
Dataset updated
Oct 31, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Ravikumar Gattu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The data presented here was obtained on a Kali machine at the University of Cincinnati, Cincinnati, Ohio, by carrying out packet captures with Wireshark for 1 hour during the evening of Oct 9th, 2023. The dataset consists of 394,137 instances stored in a CSV (Comma Separated Values) file. This large dataset could be used for different machine learning applications, for instance classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.

The dataset can be used for a variety of machine learning tasks, such as network intrusion detection, traffic classification, and anomaly detection.

Content :

This network traffic dataset consists of 7 features. Each instance contains the source and destination IP addresses. The majority of the properties are numeric in nature; however, there are also nominal and date types due to the Timestamp.

The network traffic flow statistics (No. Time Source Destination Protocol Length Info) were obtained using Wireshark (https://www.wireshark.org/).

Dataset Columns:

No: Number of the instance.

Timestamp: Timestamp of the network traffic instance.

Source IP: IP address of the source.

Destination IP: IP address of the destination.

Protocol: Protocol used by the instance.

Length: Length of the instance.

Info: Information about the traffic instance.
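A first pass over these columns is usually a protocol breakdown. The sketch below reads a small inline CSV and totals packets and bytes per protocol (Wireshark's Length column is the frame length in bytes). The header names mirror the column list above, but their exact spelling in the file, and the sample rows themselves, are assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; header names follow the column list above.
SAMPLE = """No,Timestamp,Source IP,Destination IP,Protocol,Length,Info
1,0.000000,192.168.1.10,8.8.8.8,DNS,74,Standard query A example.com
2,0.012345,8.8.8.8,192.168.1.10,DNS,90,Standard query response A example.com
3,0.020000,192.168.1.10,93.184.216.34,TCP,66,50514 > 443 [SYN]
"""


def protocol_summary(text: str) -> tuple[Counter, Counter]:
    """Count packets and sum Length (bytes) per protocol."""
    packets, volume = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(text)):
        packets[row["Protocol"]] += 1
        volume[row["Protocol"]] += int(row["Length"])
    return packets, volume
```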

Acknowledgements :

I would like to thank the University of Cincinnati for providing the infrastructure used to generate this network traffic dataset.

Ravikumar Gattu , Susmitha Choppadandi

Inspiration: This dataset goes beyond the majority of network traffic classification datasets, which only identify the type of application (WWW, DNS, ICMP, ARP, RARP) that an IP flow contains. Instead, it can be used to build machine learning models that identify specific applications (like TikTok, Wikipedia, Instagram, YouTube, websites, blogs, etc.) from IP flow statistics (there are currently 25 applications in total).

**Dataset License:** CC0: Public Domain

Dataset Usages: This dataset can be used for different machine learning applications in the field of cybersecurity, such as classification of network traffic, network performance monitoring, network security management, network traffic management, network intrusion detection and anomaly detection.

ML techniques that benefit from this dataset:

This dataset is highly useful because it consists of 394,137 instances of network traffic data obtained by using the 25 applications on public, private and enterprise networks. The dataset also contains important features that can be used for most machine learning applications in cybersecurity. Here are a few of the potential machine learning applications that could benefit from this dataset:

1. Network Performance Monitoring: This large network traffic dataset can be used to analyse network traffic and identify patterns in the network. This helps in designing network security algorithms that minimise network problems.

2. Anomaly Detection: This large network traffic dataset can be used to train machine learning models that find irregularities in the traffic, which could help identify cyber attacks.

3. Network Intrusion Detection: This large dataset could be used to train machine learning algorithms and design models that detect traffic issues, malicious traffic, network attacks and DoS attacks.
Predicting Heart Failure
kaggle.com
Updated Sep 13, 2022
Cite
Aman Chauhan (2022). Predicting Heart Failure [Dataset]. https://www.kaggle.com/datasets/whenamancodes/heart-failure-clinical-records
Dataset updated
Sep 13, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Aman Chauhan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide. Heart failure is a common event caused by CVDs, and this dataset contains 12 features that can be used to predict mortality by heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management wherein a machine learning model can be of great help.

Attribute Information:

Thirteen (13) clinical attributes (12 features plus the target):

- age: age of the patient (years)
- anaemia: decrease of red blood cells or hemoglobin (boolean)
- high blood pressure: if the patient has hypertension (boolean)
- creatinine phosphokinase (CPK): level of the CPK enzyme in the blood (mcg/L)
- diabetes: if the patient has diabetes (boolean)
- ejection fraction: percentage of blood leaving the heart at each contraction (percentage)
- platelets: platelets in the blood (kiloplatelets/mL)
- sex: woman or man (binary)
- serum creatinine: level of serum creatinine in the blood (mg/dL)
- serum sodium: level of serum sodium in the blood (mEq/L)
- smoking: if the patient smokes or not (boolean)
- time: follow-up period (days)
- [target] death event: if the patient died during the follow-up period (boolean)

Online Sales Dataset - Popular Marketplace Data
kaggle.com
Updated May 25, 2024
Cite
ShreyanshVerma27 (2024). Online Sales Dataset - Popular Marketplace Data [Dataset]. https://www.kaggle.com/datasets/shreyanshverma27/online-sales-dataset-popular-marketplace-data
Dataset updated
May 25, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
ShreyanshVerma27
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.

Columns:

Order ID: Unique identifier for each sales order.

Date: Date of the sales transaction.

Category: Broad category of the product sold (e.g., Electronics, Home Appliances, Clothing, Books, Beauty Products, Sports).

Product Name: Specific name or model of the product sold.

Quantity: Number of units of the product sold in the transaction.

Unit Price: Price of one unit of the product.

Total Price: Total revenue generated from the sales transaction (Quantity * Unit Price).

Region: Geographic region where the transaction occurred (e.g., North America, Europe, Asia).

Payment Method: Method used for payment (e.g., Credit Card, PayPal, Debit Card).

Insights:

1. Analyze sales trends over time to identify seasonal patterns or growth opportunities.

2. Explore the popularity of different product categories across regions.

3. Investigate the impact of payment methods on sales volume or revenue.

4. Identify top-selling products within each category to optimize inventory and marketing strategies.

5. Evaluate the performance of specific products or categories in different regions to tailor marketing campaigns accordingly.
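Because Total Price is defined as Quantity * Unit Price, it is worth checking that identity before aggregating. The sketch below validates it with `Decimal` (to avoid floating-point rounding on currency) and then sums revenue by region and category, in the spirit of insights 2 and 5. The inline rows are made up; only the column names come from the list above.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up rows using the documented column names.
ROWS = [
    {"Region": "North America", "Category": "Electronics", "Quantity": "2", "Unit Price": "999.99", "Total Price": "1999.98"},
    {"Region": "Europe", "Category": "Books", "Quantity": "3", "Unit Price": "12.50", "Total Price": "37.50"},
    {"Region": "Europe", "Category": "Books", "Quantity": "1", "Unit Price": "8.99", "Total Price": "8.99"},
]


def revenue_by_region_category(rows: list[dict]) -> dict[tuple[str, str], Decimal]:
    """Sum Total Price per (Region, Category) after validating each row."""
    totals: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for row in rows:
        expected = int(row["Quantity"]) * Decimal(row["Unit Price"])
        actual = Decimal(row["Total Price"])
        if expected != actual:
            raise ValueError(f"Total Price mismatch: {actual} != {expected}")
        totals[(row["Region"], row["Category"])] += actual
    return dict(totals)
```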
Moodle grades and action logs
kaggle.com
Updated May 2, 2025
Cite
Martins Sneiders (2025). Moodle grades and action logs [Dataset]. https://www.kaggle.com/datasets/martinssneiders/moodle-grades-and-action-logs
Dataset updated
May 2, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Martins Sneiders
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset for publication "Comparative analysis of time series models for student data in the Moodle platform".
Equity in Healthcare Clean DataSets
kaggle.com
Updated Feb 21, 2024
Cite
Anopsy (2024). Equity in Healthcare Clean DataSets [Dataset]. https://www.kaggle.com/datasets/anopsy/equity-in-healthcare-clean-datasets
Dataset updated
Feb 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anopsy
Description
This dataset is based on the train and test datasets from this competition: https://www.kaggle.com/competitions/widsdatathon2024-challenge1

What did I change?

1. I dropped 2 columns that contained too little data.

2. Using machine learning, I imputed "payer_type", "patient_race" and "bmi".

3. Using "patient_zip3", I filled missing values in "patient_state", "Region" and "Division".

4. Using SimpleImputer, I imputed a few missing numeric values in "Ozone", "PM2.5" and other columns.

5. I created some new features, based on demographic features, that may be a bit more informative.

6. I tokenized the 'breast_cancer_diagnosis_desc' column.
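Steps 3 and 4 above describe two common imputation patterns. The sketch below is a minimal standard-library illustration, not the author's notebook code: it fills a missing `patient_state` with the most frequent state already observed for the same `patient_zip3`, and fills missing numeric values with the column mean, which mirrors scikit-learn's `SimpleImputer` default `strategy="mean"`. The row layout and sample values are hypothetical, and taking the most frequent value per zip3 is one simple choice among several.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows; None marks a missing value.
rows = [
    {"patient_zip3": "100", "patient_state": "NY", "Ozone": 40.0},
    {"patient_zip3": "100", "patient_state": None, "Ozone": None},
    {"patient_zip3": "606", "patient_state": "IL", "Ozone": 38.0},
    {"patient_zip3": "606", "patient_state": None, "Ozone": 42.0},
]

def fill_by_group_mode(rows, group_col, target_col):
    """Fill missing target_col with the most common known value in the same group_col."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[target_col] is not None:
            counts[row[group_col]][row[target_col]] += 1
    for row in rows:
        # Leave the value missing if no row in this group has a known value.
        if row[target_col] is None and counts[row[group_col]]:
            row[target_col] = counts[row[group_col]].most_common(1)[0][0]

def fill_with_mean(rows, col):
    """Replace missing numeric values with the mean of the observed values."""
    observed = [row[col] for row in rows if row[col] is not None]
    if not observed:
        return
    col_mean = mean(observed)
    for row in rows:
        if row[col] is None:
            row[col] = col_mean

fill_by_group_mode(rows, "patient_zip3", "patient_state")
fill_with_mean(rows, "Ozone")
```

The same group-mode helper could be called again for "Region" and "Division", since those follow from the state.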

If you're interested in how I did that, see this notebook: https://www.kaggle.com/code/anopsy/ml-for-missing-values. For "bmi" and the new features, see this one: https://www.kaggle.com/code/anopsy/fe-and-xgb-on-clean-data

According to the description of the original dataset, it's a "39k record dataset (split into training and test sets) representing patients and their characteristics (age, race, BMI, zip code), their diagnosis and treatment information (breast cancer diagnosis code, metastatic cancer diagnosis code, metastatic cancer treatments, … etc.), their geo (zip-code level) demographic data (income, education, rent, race, poverty, …etc), as well as toxic air quality data (Ozone, PM25 and NO2)."
Kaggle Datasets Ranking
kaggle.com
Updated Jan 19, 2022
Cite
Vivo Vinco (2022). Kaggle Datasets Ranking [Dataset]. https://www.kaggle.com/datasets/vivovinco/kaggle-datasets-ranking/discussion
Dataset updated
Jan 19, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vivo Vinco
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Context

This dataset contains Kaggle's Datasets ranking of users.

Content

800+ rows and 8 columns. The columns are described below.

Rank : Rank of the user

Tier : Grandmaster, Master or Expert

Username : Name of the user

Join Date : Year the user joined

Gold Medals : Number of gold medals

Silver Medals : Number of silver medals

Bronze Medals : Number of bronze medals

Points : Total points
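The eight columns map naturally onto a small record type. The sketch below assumes the CSV headers match the names listed above and that the year, medal and point columns hold plain integers; both are assumptions, since the file itself is not shown here, and the sample rows are invented. It parses rows into a dataclass, derives a total medal count, and counts users per tier.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

@dataclass
class RankedUser:
    rank: int
    tier: str        # Grandmaster, Master or Expert
    username: str
    join_year: int
    gold: int
    silver: int
    bronze: int
    points: int

    @property
    def total_medals(self) -> int:
        return self.gold + self.silver + self.bronze

# Invented sample; headers ASSUMED to match the column list above.
SAMPLE = """Rank,Tier,Username,Join Date,Gold Medals,Silver Medals,Bronze Medals,Points
1,Grandmaster,alice,2015,12,20,5,3000
2,Master,bob,2018,3,10,8,1500
3,Expert,carol,2020,0,4,12,800
"""

def parse(text):
    """Parse CSV text into RankedUser records."""
    return [
        RankedUser(
            rank=int(r["Rank"]), tier=r["Tier"], username=r["Username"],
            join_year=int(r["Join Date"]), gold=int(r["Gold Medals"]),
            silver=int(r["Silver Medals"]), bronze=int(r["Bronze Medals"]),
            points=int(r["Points"]),
        )
        for r in csv.DictReader(io.StringIO(text))
    ]

users = parse(SAMPLE)
tier_counts = Counter(u.tier for u in users)
```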

Acknowledgements

Data from Kaggle. Image from The Guardian.

If you're reading this, please upvote.

