A retail bank would like to hire you to build a credit default model for its credit card portfolio. The bank expects the model to identify consumers who are likely to default on their credit card payments over the next 12 months, and will use it to reduce future losses. The bank is willing to provide you with some sample data that it can currently extract from its systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.
Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.
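Since only the target and the derogatory-reports feature are documented, a concrete sketch has to invent stand-in data. The following is a minimal, hypothetical Python example: it simulates a portfolio in which derogatory reports raise default risk, then fits a logistic regression of the kind such a model might start from (the column names, distributions, and coefficients are assumptions for illustration, not properties of credit_data.csv).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for credit_data.csv: the real file has 13,444 rows and
# 14 variables, but only the derogatory-reports feature is documented.
rng = np.random.default_rng(0)
n = 13_444
derogatory_reports = rng.poisson(0.5, size=n)
other_features = rng.normal(size=(n, 2))  # placeholders for undocumented columns

# Simulate the bank's observation: more derogatory reports -> higher default risk.
logit = -2.0 + 1.2 * derogatory_reports + 0.3 * other_features[:, 0]
default = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([derogatory_reports, other_features])
X_train, X_test, y_train, y_test = train_test_split(
    X, default, test_size=0.25, stratify=default, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```

With real data the same skeleton applies: fit on a held-out split and rank consumers by predicted default probability over the next 12 months.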
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, competition within the industry has intensified. At the same time, the rapid development of information and Internet technology has diversified customers’ choice of financial products and weakened their dependence on and loyalty to banking institutions, making customer churn an increasingly prominent problem for commercial banks. How to predict customer behavior and retain existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares multiple sampling methods for balancing the data, constructs a bank customer churn prediction model for churn identification with GA-XGBoost, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of the banking data. (2) The F1 and AUC values of the XGBoost model improved and optimized by the genetic algorithm reach 90% and 99%, respectively, the best performance compared with the other six machine learning models; the GA-XGBoost classifier was therefore identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model’s results and analyze the features with the highest impact on its predictions, such as the total number of transactions in the past year, the transaction amount in the past year, the number of products owned by customers, and the total sales balance.
The contribution of this paper is twofold: (1) based on the accurate identification of churned customers, this study extracts useful information from the black-box model, which commercial banks can use as a reference to improve service quality and retain customers; (2) it can serve as a reference for customer churn early-warning models in other related industries, helping the banking industry maintain customer stability, preserve market position, and reduce corporate losses.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.
The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
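Because the brief asks to minimize both false positives and false negatives, evaluation should look beyond accuracy. A minimal Python sketch of that check, using toy predictions for the binary failure column (the labels below are invented for illustration):

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy labels and predictions for `failure` (0 = non-failure, 1 = failure).
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]

# confusion_matrix returns [[tn, fp], [fn, tp]] for binary labels.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positives={fp}, false negatives={fn}, "
      f"F1={f1_score(y_true, y_pred):.2f}")
```

Reporting false positives and false negatives separately, plus an aggregate such as F1, makes the trade-off explicit; accuracy alone is misleading when failures are rare.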
Open Database License (ODbL): https://choosealicense.com/licenses/odbl/
Date: 2022-07-10. Files: ner_dataset.csv. Source: Kaggle entity annotated corpus. Notes: the dataset only contains the tokens and NER tag labels. Labels are uppercase.
About Dataset
from Kaggle Datasets
Context
Annotated Corpus for Named Entity Recognition using the GMB (Groningen Meaning Bank) corpus for entity classification, with enhanced and popular features produced by natural language processing applied to the data set. Tip: use a Pandas DataFrame to load the dataset if using Python for… See the full description on the dataset page: https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset.
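As a minimal sketch of the pandas tip, assuming the file is a simple CSV of token/tag columns (the column names and the tiny in-memory sample below are assumptions; for the real file you would pass its path, and possibly an encoding argument, to pd.read_csv):

```python
import io
import pandas as pd

# Tiny stand-in for ner_dataset.csv; the real column names may differ,
# so check the header of your copy of the file.
sample = io.StringIO("Word,Tag\nLondon,B-GEO\ncalling,O\n")
df = pd.read_csv(sample)  # real file: pd.read_csv("ner_dataset.csv", encoding=...)
print(df["Tag"].value_counts().to_dict())
```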
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow its business: by suggesting itemsets to customers, we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most often used when you are planning to find associations between different objects in a set, in particular frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.80
- lift = confidence / P(mat) = 0.80/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
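This arithmetic can be checked in a few lines of Python; note that confidence divides the rule's support by the probability of the antecedent (the mouse), and lift divides confidence by the probability of the consequent (the mat):

```python
# Check the worked mouse / mouse-mat example:
# rule "bought computer mouse => bought mouse mat".
customers = 100
bought_mouse = 10
bought_mat = 9
bought_both = 8

support = bought_both / customers                   # P(mouse & mat)
confidence = support / (bought_mouse / customers)   # P(mat | mouse)
lift = confidence / (bought_mat / customers)        # confidence / P(mat)

print(round(support, 2), round(confidence, 2), round(lift, 2))
```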
Number of Attributes: 7
First, we need to load the required libraries; I will briefly describe each of them.
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we will clean our data frame by removing missing values.
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice are grouped into a single transaction.
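A minimal Python sketch of this conversion step, using a toy invoice table (the column names are assumptions standing in for the real spreadsheet): all rows sharing an invoice number are grouped into a single basket of items.

```python
import pandas as pd

# Toy invoice data standing in for Assignment-1_Data.xlsx (columns assumed).
df = pd.DataFrame({
    "BillNo": [1, 1, 2, 2, 2, 3],
    "Itemname": ["bread", "butter", "bread", "milk", "butter", "milk"],
})

# Group items by invoice so each transaction is one basket of items.
transactions = df.groupby("BillNo")["Itemname"].apply(list).tolist()
print(transactions)
# [['bread', 'butter'], ['bread', 'milk', 'butter'], ['milk']]
```

These per-invoice baskets are the transaction format that Apriori-style frequent-itemset miners expect as input.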
This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.
Metadata includes:
- product IDs
- bounding boxes
Basic Statistics:
- Scenes: 47,739
- Products: 38,111
- Scene-Product Pairs: 93,274
KaggleDBQA is a challenging cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions.
It expands upon contemporary cross-domain text-to-SQL datasets in three key aspects: (1) Its databases are pulled from real-world data sources and not normalized. (2) Its questions are authored in environments that mimic natural question answering. (3) It also provides database documentation that contains rich in-domain knowledge.
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
- Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
- Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
- Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
- Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
- Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
- Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
- Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
- AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
This database was prepared using a combination of materials that include aerial photographs, topographic maps (1:24,000 and 1:250,000), field notes, and a sample catalog. Our goal was to translate sample collection site locations at Yellowstone National Park and surrounding areas into a GIS database. This was achieved by transferring site locations from aerial photographs and topographic maps into layers in ArcMap. Each field site is located based on field notes describing where a sample was collected. Locations were marked on the photograph or topographic map by a pinhole or dot, respectively, with the corresponding station or site numbers. Station and site numbers were then referenced in the notes to determine the appropriate prefix for the station. Each point on the aerial photograph or topographic map was relocated on the screen in ArcMap, on a digital topographic map or an aerial photograph. Several samples are present in the field notes and in the catalog but do not correspond to an aerial photograph or could not be found on the topographic maps. These samples are marked with “No” under the LocationFound field and do not have a corresponding point in the SampleSites feature class. Each point represents a field station or collection site with information that was entered into an attributes table (explained in detail in the entity and attribute metadata sections). Tabular information on hand samples, thin sections, and mineral separates was entered by hand. The Samples table includes everything transferred from the paper records and relates to the other tables using the SampleID and to the SampleSites feature class using the SampleSite field.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version.
https://creativecommons.org/publicdomain/zero/1.0/
If you find this data set useful, an upvote is appreciated. These data describe student achievement in secondary education at two Portuguese schools. The data attributes include student grades, demographic, social, and school-related features, and the data was collected using school reports and questionnaires. Two datasets are provided covering performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st- and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such a prediction is much more useful (see the source paper for more details).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🍐 FDIC Failed Bank List’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/fdic-failed-bank-liste on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The FDIC is often appointed as receiver for failed banks. This list includes banks which have failed since October 1, 2000.
Source: https://catalog.data.gov/dataset/fdic-failed-bank-list
This dataset was created by Finance and contains around 500 samples along with Acquiring Institution, Bank Name, technical information, and other features such as:
- Updated Date
- St
- and more
- Analyze Closing Date in relation to City
- Study the influence of Acquiring Institution on Bank Name
- More datasets
If you use this dataset in your research, please credit Finance
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Samples drive classification algorithms and are thus a prerequisite for accurate classification. Coastal areas are located in the transitional zone between land and sea, requiring more samples to describe their diverse land covers. However, few studies share their sample datasets, so the time-consuming and laborious sampling procedure is often repeated. To alleviate this problem, we share a sample set with a total of 16,444 sample points derived from a study mapping the mangroves of China. The sample set contains 10 categories, described as follows. 1) The mangrove category refers to “true mangroves” (excluding associate mangrove species). In sampling mangroves, we used data from the China Mangrove Conservation Network (CMCN, http://www.china-mangrove.org/), a non-governmental organization aiming to promote mangrove ecosystems. The CMCN provides an interactive map that volunteers can annotate with text or photos to record mangrove status at a location. Although the locations were shifted by coordinate-system differences and positioning errors, mangroves could be found around the mangrove locations depicted on the CMCN’s map in Google Earth images. There are a total of 1887 mangrove samples. 2) The cropland category is dominated by paddy rice; we collected a total of 1383 points, identified by their neat arrangement in Google Earth images. 3) Coastal forests neighboring mangroves are mostly salt-tolerant, such as Cocos nucifera Linn., Hibiscus tiliaceus Linn., and Cerbera manghas Linn.; we collected a total of 1158 samples according to their distance from the shoreline in Google Earth images. 4) Terrestrial forests are forests far from the shoreline and intolerant to salt; by visual inspection on Google Earth, we sampled 1269 points based on their appearance and distance from the shoreline. 5) For the grass category, we collected 1282 samples by visual judgement on Google Earth.
6) Saltmarsh, dominated by Spartina alterniflora, covers large areas of tidal flats in China; we collected 2065 samples based on Google Earth images. 7) The tidal flats category is represented by 1517 samples, which were sampled using the most recent global tidal flat map for 2014–2016 and were visually corrected. 8) The “sand or rock” category refers to sandy and pebble beaches or rocky coasts exposed to the air, which are not mangrove habitats; we collected 1622 samples on Google Earth by visual inspection. 9) For the permanent water category, samples were first randomly drawn from a threshold result of NDWI (> 0.2) and then visually corrected; a total of 2056 samples were obtained. 10) For the artificial impervious surfaces category, we randomly sampled from a threshold result of the normalized difference built-up index (NDBI) (> 0.1) and corrected the samples based on Google Earth; this category is represented by 2205 samples. This sample dataset covers the low-altitude coastal area of five provinces (Hainan, Guangdong, Fujian, Zhejiang, and Taiwan), one autonomous region (Guangxi), and two special administrative regions (Macau and Hong Kong) (see “study_area.shp” in the zip for details). It can be used to train models for coastal land cover classification and to evaluate classification results. In addition to mangroves, it can also be used for identifying tidal flats, mapping salt marsh, extracting water bodies, and other related applications.
Compared with the V1 version, we added a validation dataset for mangrove maps (Mangrove map validation dataset.rar), so mangrove maps can be evaluated against the same dataset, which benefits the comparison of different mangrove maps. The validation dataset contains 10 shp files, each containing 600 mangrove samples (cls_new field = 1) and 600 non-mangrove samples (cls_new field = 0). Compared with the V2 version, we added two classes, forest near water and grass near water, to suppress the prevalent misclassified patches caused by the spectral similarity between mangroves and those classes.
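As a hypothetical sketch of the water-sampling step described above: NDWI in McFeeters' formulation, (green - NIR) / (green + NIR), is computed per pixel and thresholded at 0.2 to propose water pixels (the reflectance values below are invented for illustration; the study's actual imagery and band layout are not part of this sample set).

```python
import numpy as np

# Toy 2x2 reflectance grids for the green and near-infrared bands.
green = np.array([[0.30, 0.10], [0.25, 0.05]])
nir = np.array([[0.10, 0.40], [0.05, 0.30]])

# NDWI = (green - NIR) / (green + NIR); water tends toward positive values.
ndwi = (green - nir) / (green + nir)
water_mask = ndwi > 0.2  # candidate permanent-water pixels, to be visually corrected
print(water_mask)
```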
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here you can describe your dataset (3000 characters)
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur. Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur.
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat
The Biological Sampling Database (BSD) is an Oracle relational database that is maintained at the NMFS Panama City Laboratory and the NOAA NMFS Beaufort Laboratory. The data set includes port samples of reef fish species collected from commercial and recreational fishery landings in the U.S. South Atlantic (NC - FL Keys). The data set serves as an inventory of samples stored at the NMFS Beaufort Laboratory as well as final processed data. Information that may be included for each sample is trip-level information, species, size measurements, age, sex, and reproductive data.
This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data included/extracted from the examples includes the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and where possible external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Language(s) (NLP): [More Information Needed]
License: [More Information Needed]
Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/templates/dataset-card-example.
Dataset composed of online banking queries annotated with their corresponding intents.
BANKING77 dataset provides a very fine-grained set of intents in a banking domain. It comprises 13,083 customer service queries labeled with 77 intents. It focuses on fine-grained single-domain intent detection.
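As a purely illustrative sketch of fine-grained intent detection on queries like these (the queries and intent labels below are invented stand-ins, not drawn from BANKING77), a TF-IDF plus logistic-regression baseline can be set up as:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented stand-in for BANKING77-style data;
# the real dataset has 13,083 queries across 77 intents.
queries = [
    "I lost my card", "my card was stolen", "card not arrived yet",
    "what is the exchange rate", "how do you convert currencies",
]
intents = ["lost_card", "lost_card", "card_arrival",
           "exchange_rate", "exchange_rate"]

# Bag-of-words TF-IDF features feeding a multiclass logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(queries, intents)
print(clf.predict(["someone stole my card"]))
```

With the real corpus, the same pipeline trained on all 77 intents gives a reasonable baseline to compare stronger encoders against.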
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version.