About Dataset
● Based on patient symptoms, identify patients needing immediate resuscitation; assign patients to a predesignated patient care area, thereby prioritizing their care; and initiate diagnostic/therapeutic measures as appropriate.
● Three individual datasets are used, one for each of three urgent illnesses/injuries. Each dataset has its own features and symptoms for each patient. We merged them to find the most severe symptoms for each illness and give those treatment priority.
PROJECT SUMMARY
Triage refers to the sorting of injured or sick people according to their need for emergency medical attention. It is a method of determining priority for who gets care first.

BACKGROUND
Triage is the prioritization of patient care (or victims during a disaster) based on illness/injury, symptoms, severity, prognosis, and resource availability. The purpose of triage is to identify patients needing immediate resuscitation; to assign patients to a predesignated patient care area, thereby prioritizing their care; and to initiate diagnostic/therapeutic measures as appropriate.

BUSINESS CHALLENGE
Based on patient symptoms, identify patients needing immediate resuscitation; assign patients to a predesignated patient care area, thereby prioritizing their care; and initiate diagnostic/therapeutic measures as appropriate.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Customer Personality Analysis involves a thorough examination of a company's optimal customer profiles. This analysis facilitates a deeper understanding of customers, enabling businesses to tailor products to meet the distinct needs, behaviors, and concerns of various customer types.
By conducting a Customer Personality Analysis, businesses can refine their products based on the preferences of specific customer segments. Rather than allocating resources to market a new product to the entire customer database, companies can identify the segments most likely to be interested in the product. Subsequently, targeted marketing efforts can be directed toward those particular segments, optimizing resource utilization and increasing the likelihood of successful product adoption.
Details of Features are as below:
https://creativecommons.org/publicdomain/zero/1.0/
[Figure: Overview]
Clustering is the task of grouping similar data points together. You can generate dummy clustering data with functions from the sklearn package, but building good test cases that way takes effort.
This dataset should help users who need hard test cases for clustering.
Try to choose a meaningful number of clusters and divide the data into those clusters. Here are some exercises for you.
All CSV files contain many rows of x, y, and color values, as shown in the figure above.
If you want integer positions, scale each coordinate and round it to an integer, for example x = round(x * 100).
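The scaling step above can be sketched in a few lines of Python. This is only an illustration: the helper name `to_integer_grid` and the sample points are made up here, and the factor of 100 is simply the one suggested in the text. Note that Python's built-in `round` rounds exact ties to the nearest even integer.

```python
def to_integer_grid(points, scale=100):
    """Scale float (x, y) pairs and round each coordinate to an int."""
    return [(round(x * scale), round(y * scale)) for x, y in points]


# Hypothetical sample points; real values would come from the CSV files.
sample = [(0.123, 0.456), (1.5, -0.25)]
print(to_integer_grid(sample))
```

A larger scale keeps more decimal places of the original position; a smaller one merges nearby points onto the same grid cell.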
There is also a GUI tool for generating 2D points for clustering, which you can use to make your own dataset: https://www.joonas.io/cluster-paint
Stay tuned for further updates! If you have any ideas, feel free to leave a comment.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset collection comprises 15 diverse two-dimensional datasets specifically designed for clustering analysis. Each dataset contains three columns: x, y, and target, where x and y represent the coordinates of the data points, and target indicates the cluster label.
[Figure: Visualisation of data]
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
The datasets are generated using either Gaussian or Uniform distributions. Each dataset contains several known sub-groups intended for testing centroid-based clustering results and cluster validity indices.
Cluster analysis is a popular machine learning technique for segmenting a dataset so that similar data points fall in the same group. For those who are familiar with R, there is a new R package for cluster evaluation called "UniversalCVI" (https://CRAN.R-project.org/package=UniversalCVI). This package provides algorithms for checking the accuracy of a clustering result against known classes, computing cluster validity indices, and generating plots for comparing them. The package is compatible with K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage). To use the "UniversalCVI" package, follow the instructions provided in the R documentation.
For more in-depth details of the package and cluster evaluation, please see the papers https://doi.org/10.1016/j.patcog.2023.109910 and https://arxiv.org/abs/2308.14785
https://github.com/O-PREEDASAWAKUL/FuzzyDatasets.git .
https://creativecommons.org/publicdomain/zero/1.0/
Is it possible to cluster all the photos in your phone automatically without labeling?
This small dataset includes 80 photos of dogs (10), cats (10), family (20), alone (20), and food (20). No labels are provided, but the categories are easy to recognize by eye.
All the photos are from Pixabay (https://pixabay.com/). They are free to use under some restrictions; please see the Pixabay license page (https://pixabay.com/ko/service/license/).
This dataset was created by Deepansh Saxena1
https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores the relationship between age, mobile usage hours, and income levels. It contains four columns:
The dataset is small and ideal for beginners to practice hierarchical clustering techniques. It provides insights into how mobile usage and income vary across different age groups, making it suitable for educational and learning purposes.
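As a rough illustration of the hierarchical clustering practice mentioned above, here is a minimal average-linkage agglomerative sketch in plain Python. The function name `agglomerative`, the choice of average linkage, and the sample (age, usage hours) rows are assumptions made for this example; they are not taken from the dataset, and the features are left unscaled for brevity.

```python
def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def dist(a, b):
        # Euclidean distance between two points
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical rows: (age, mobile usage hours per day)
rows = [(20, 6.0), (22, 5.5), (45, 2.0), (47, 1.5)]
print(agglomerative(rows, n_clusters=2))
```

In practice the features would usually be standardized first, since age and income are on very different scales and the larger one would otherwise dominate the distance.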
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ridlo Wahyudi Wibowo
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset can be used to explore the effectiveness of various clustering algorithms. It includes numerical measurements (X, Y, Sepal.Length, and Petal.Length) and a categorical variable (Species), which makes it possible to investigate the relationship between different types of variables and clustering performance. By comparing results across the three datasets provided, moon.csv (x and y coordinates), iris.csv (sepal and petal lengths), and circles.csv, we can see how different data distributions affect clustering techniques such as K-Means or hierarchical clustering.
This dataset can also be a starting point for exploring more complex clusters in higher-dimensional feature spaces, using variables such as color or texture that are not included here but appear in other datasets. It could also support visualization projects that need generated clusters, such as plotting mapped data points or examining the relationship between two variables within a region of a chart.
To use this dataset effectively, it is important to understand how your chosen algorithm works: some algorithms require parameters (such as the number of clusters) to be specified beforehand, while others determine them automatically, and a misconfigured algorithm can lead to invalid interpretations. Also familiarize yourself with metrics such as the silhouette score, which measures how well separated the clusters are, and the Rand index, which measures agreement between two clusterings (for example, predicted clusters and known labels), so you can judge whether your results are acceptable. Good luck!
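To make the Rand index concrete, here is a small plain-Python sketch. It counts the fraction of point pairs on which two labelings agree, meaning both place the pair in the same cluster or both place it in different clusters. The function name `rand_index` and the toy labels are illustrative assumptions, not part of the dataset.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same grouping under different label names scores 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because only pair membership matters, the index is unaffected by how cluster IDs are numbered, which is what makes it usable for comparing a clustering against the Species column.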
- Utilizing the sepal and petal lengths and widths to perform flower recognition, possibly as part of a larger image recognition pipeline.
- Grouping the data points in each dataset by their X-Y coordinates using clustering algorithms, for example to analyze galaxy locations or overall formation patterns for stars, planets, or galaxies.
- Exploring correlations between species of flowers in terms of sepal/petal lengths by performing supervised learning tasks such as classification with this dataset.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: moon.csv

| Column name | Description |
|:------------|:------------|
| X | X coordinate of the data point. (Numeric) |
| Y | Y coordinate of the data point. (Numeric) |

File: iris.csv

| Column name | Description |
|:------------|:------------|
| Sepal.Length | Length of the sepal of the flower. (Numeric) |
| Petal.Length | Length of the petal of the flower. (Numeric) |
| Species | Species of the flower. (Categorical) |
This dataset was created by Syed Touqeer
This is a text document classification dataset containing 2225 text documents in five categories: politics, sport, tech, entertainment, and business. It can be used for document classification and document clustering.

About Dataset
- The dataset contains two features: text and label.
- No. of rows: 2225
- No. of columns: 2

Text: contains text data from the different categories.
Label: contains the labels for the five categories: 0, 1, 2, 3, 4.
https://creativecommons.org/publicdomain/zero/1.0/
There are close to 200 000 datasets on Kaggle.
This dataset gathers information on all of them to help navigate the data. The file contains upvotes, links, usability, and so on for every dataset hosted on Kaggle.
An example of usage is to study the ties between the dataset's features and its popularity.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset was created by Nakshatra Goswami
Released under Apache 2.0
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Project Overview: Customer Segmentation Using K-Means Clustering
Introduction
In this project, I analysed customer data from a retail store to identify distinct customer segments. The dataset includes key attributes such as age, city, and total sales of the customers. By leveraging K-Means clustering, an unsupervised machine learning technique, I aim to group customers based on their age and sales metrics. These insights will enable the creation of targeted marketing campaigns tailored to the specific needs and behaviours of each customer segment.

Objectives
- Cluster Customers: Use K-Means clustering to group customers based on age and total sales.
- Analyse Segments: Examine the characteristics of each customer segment.
- Targeted Marketing: Develop strategies for personalized marketing campaigns targeting each identified customer group.

Data Description
The dataset comprises:

Methodology
- Data Preprocessing: Clean and preprocess the data to handle any missing or inconsistent entries.
- Feature Selection: Focus on age and total sales as primary features for clustering.
- K-Means Clustering: Apply the K-Means algorithm to identify distinct customer segments.
- Cluster Analysis: Analyse the resulting clusters to understand the demographic and sales characteristics of each group.
- Marketing Strategy Development: Create targeted marketing strategies for each customer segment to enhance engagement and sales.

Expected Outcomes
- Customer Segments: Clear identification of customer groups based on age and purchasing behaviour.
- Insights for Marketing: Detailed understanding of each segment to inform targeted marketing efforts.
- Business Impact: Enhanced ability to tailor marketing campaigns, potentially leading to increased customer satisfaction and sales.
By clustering customers based on age and total sales, this project aims to provide actionable insights for personalized marketing, ultimately driving better customer engagement and higher sales for the retail store.
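The clustering step of the methodology above can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's K-Means algorithm, not the project's actual implementation: the function names `standardize` and `kmeans`, the z-score scaling, the fixed random seed, and the sample (age, total sales) rows are all assumptions made for this example.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so age and total sales contribute equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sds)) for row in rows]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, total sales)
customers = [(22, 150), (25, 200), (24, 180), (58, 900), (61, 950), (60, 1000)]
labels, _ = kmeans(standardize(customers), k=2)
print(labels)
```

Standardizing first is a design choice: without it, total sales (in the hundreds) would dominate the Euclidean distance and age would barely influence the segments.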
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides 50 three-dimensional Cartesian coordinates. The goal is to group these coordinates effectively into clusters. Any suitable machine learning model is welcome. Happy Learning!
https://creativecommons.org/publicdomain/zero/1.0/
Small two-dimensional clustering dataset for examples and case studies.
Created using https://www.joonas.io/cluster-paint/
I used this in my introduction to k-Means clustering notebook here: https://www.kaggle.com/samuelcortinhas/k-means-from-scratch
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Click Mintaka
Released under CC0: Public Domain
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Beginner-friendly dataset for clustering.
You can train a model to cluster customers into segments (High, Medium, Low) based on 'Avg_Order_Value' and 'Total_Spending'.
The actual segment is also provided.
This dataset was created by Tek Bahadur Kshetri