100+ datasets found
  1. Retail Credit Bank Data

    • kaggle.com
    Updated Sep 10, 2021
    Cite
    SR (2021). Retail Credit Bank Data [Dataset]. https://www.kaggle.com/datasets/surekharamireddy/credit-data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 10, 2021
    Dataset provided by
    Kaggle
    Authors
    SR
    Description

    Context

    A retail bank would like to hire you to build a credit default model for their credit card portfolio. The bank expects the model to identify the consumers who are likely to default on their credit card payments over the next 12 months. This model will be used to reduce the bank’s future losses. The bank is willing to provide you with some sample data that they can currently extract from their systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.

    Content

    Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.

  2. Comparison results of different model.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    + more versions
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison results of different model. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t006
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.

  3. Predictive Maintenance Dataset

    • kaggle.com
    Updated Nov 7, 2022
    Cite
    Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
    Dataset updated
    Nov 7, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Himanshu Agarwal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time based preventive maintenance, because tasks are performed only when warranted.

    The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
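
    Because the task asks for both false positives and false negatives to be kept low, it can help to count them separately instead of relying on accuracy alone. The following is a minimal Python sketch, assuming the true failure labels and a model's 0/1 predictions are available as plain lists. The function names and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = failure, 0 = non-failure)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],
        "fp": counts[(0, 1)],  # predicted failure, device was fine
        "tn": counts[(0, 0)],
        "fn": counts[(1, 0)],  # missed failure
    }

def precision_recall(c):
    """Precision drops with false positives; recall drops with false negatives."""
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall

# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]
c = confusion_counts(y_true, y_pred)
print(c, precision_recall(c))
```

    Reporting precision and recall side by side makes the trade-off between the two error types visible, which a single accuracy figure would hide when failures are rare.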

  4. kaggle-entity-annotated-corpus-ner-dataset

    • huggingface.co
    Updated Jul 10, 2022
    Cite
    Rafael Arias Calles (2022). kaggle-entity-annotated-corpus-ner-dataset [Dataset]. https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset
    Dataset updated
    Jul 10, 2022
    Authors
    Rafael Arias Calles
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Date: 2022-07-10
    Files: ner_dataset.csv
    Source: Kaggle entity annotated corpus
    Notes: The dataset only contains the tokens and NER tag labels. Labels are uppercase.

    About Dataset

    from Kaggle Datasets

    Context

    Annotated Corpus for Named Entity Recognition using the GMB (Groningen Meaning Bank) corpus for entity classification with enhanced and popular features by Natural Language Processing applied to the data set. Tip: Use Pandas Dataframe to load dataset if using Python for… See the full description on the dataset page: https://huggingface.co/datasets/rjac/kaggle-entity-annotated-corpus-ner-dataset.

  5. Market Basket Analysis

    • kaggle.com
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Dataset updated
    Dec 9, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions on the itemset that a customer is most likely to purchase. I was given a dataset containing a retailer's data; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in its industry and to offer customers itemset suggestions, so that we are able to increase customer engagement, improve customer experience, and identify customer behavior. I will solve this problem using association rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most used when you are planning to find associations between different objects in a set. They work when you are planning to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a Computer Mouse, 9 bought a Mat for Mouse, and 8 bought both. For the rule "bought Computer Mouse => bought Mat for Mouse":

    • support = P(Mouse & Mat) = 8/100 = 0.08
    • confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80
    • lift = confidence/P(Mat for Mouse) = 0.80/0.09 ≈ 8.9

    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
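
    The three measures can also be computed directly from the counts. The sketch below uses the standard definitions for a rule antecedent => consequent: support = P(both), confidence = support / P(antecedent), and lift = confidence / P(consequent). The function name rule_metrics is a hypothetical helper, not part of any library.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total
    # Confidence is P(consequent | antecedent).
    confidence = n_both / n_antecedent
    # Lift above 1 means the two items co-occur more often than by chance.
    lift = confidence / (n_consequent / n_total)
    return support, confidence, lift

# Counts from the example: 100 customers, 10 mice, 9 mats, 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    Lift is symmetric in the two items, so the reverse rule (mat => mouse) has the same lift but a different confidence (8/9).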

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

    Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png

    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

    Image: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png
    Image: https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png

    After that, we clean our data frame and remove missing values.

    Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png

    To apply association rule mining, we need to convert the data frame into transaction data, so that all items that are bought together in one invoice will be in ...
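
    The conversion from rows to transactions can be sketched as follows. The write-up itself works in R; this is a Python illustration using only the standard library, not the author's code. It assumes each row has already been reduced to a (BillNo, Itemname) pair, with column names taken from the dataset description above. The function name to_transactions and the sample rows are hypothetical.

```python
from collections import defaultdict

def to_transactions(rows):
    """Group (BillNo, Itemname) rows into one set of items per invoice."""
    baskets = defaultdict(set)
    for bill_no, item_name in rows:
        if bill_no is None or item_name is None:
            continue  # skip rows with missing values
        baskets[bill_no].add(item_name)
    return dict(baskets)

# Hypothetical rows, for illustration only (not from Assignment-1_Data.xlsx).
rows = [
    ("536365", "mug"),
    ("536365", "teapot"),
    ("536366", "mug"),
    ("536366", None),
    ("536365", "mug"),
]
transactions = to_transactions(rows)
print(transactions)
```

    Using a set per invoice drops repeated items within the same bill, which matches the usual basket representation where only the presence of an item matters.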

  6. Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.

  7. Pinterest Fashion Compatibility

    • cseweb.ucsd.edu
    • beta.data.urbandatacentre.ca
    json
    + more versions
    Cite
    UCSD CSE Research Project, Pinterest Fashion Compatibility [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets.html
    Available download formats: json
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    This dataset contains images (scenes) containing fashion products, which are labeled with bounding boxes and links to the corresponding products.

    Metadata includes

    • product IDs

    • bounding boxes

    Basic Statistics:

    • Scenes: 47,739

    • Products: 38,111

    • Scene-Product Pairs: 93,274

  8. KaggleDBQA Dataset

    • paperswithcode.com
    Updated Jan 20, 2025
    Cite
    Chia-Hsuan Lee; Oleksandr Polozov; Matthew Richardson (2025). KaggleDBQA Dataset [Dataset]. https://paperswithcode.com/dataset/kaggledbqa
    Dataset updated
    Jan 20, 2025
    Authors
    Chia-Hsuan Lee; Oleksandr Polozov; Matthew Richardson
    Description

    KaggleDBQA is a challenging cross-domain and complex evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions.

    It expands upon contemporary cross-domain text-to-SQL datasets in three key aspects: (1) Its databases are pulled from real-world data sources and not normalized. (2) Its questions are authored in environments that mimic natural question answering. (3) It also provides database documentation that contains rich in-domain knowledge.

  9. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

    Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

    Customizable Subsets for Specific Needs

    Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  10. Yellowstone Sample Collection - database

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Yellowstone Sample Collection - database [Dataset]. https://catalog.data.gov/dataset/yellowstone-sample-collection-database
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This database was prepared using a combination of materials that include aerial photographs, topographic maps (1:24,000 and 1:250,000), field notes, and a sample catalog. Our goal was to translate sample collection site locations at Yellowstone National Park and surrounding areas into a GIS database. This was achieved by transferring site locations from aerial photographs and topographic maps into layers in ArcMap. Each field site is located based on field notes describing where a sample was collected. Locations were marked on the photograph or topographic map by a pinhole or dot, respectively, with the corresponding station or site numbers. Station and site numbers were then referenced in the notes to determine the appropriate prefix for the station. Each point on the aerial photograph or topographic map was relocated on the screen in ArcMap, on a digital topographic map, or an aerial photograph. Several samples are present in the field notes and in the catalog but do not correspond to an aerial photograph or could not be found on the topographic maps. These samples are marked with “No” under the LocationFound field and do not have a corresponding point in the SampleSites feature class. Each point represents a field station or collection site with information that was entered into an attributes table (explained in detail in the entity and attribute metadata sections). Tabular information on hand samples, thin sections, and mineral separates were entered by hand. The Samples table includes everything transferred from the paper records and relates to the other tables using the SampleID and to the SampleSites feature class using the SampleSite field.

  11. SAMPLE DATASET

    • staging-elsevier.digitalcommonsdata.com
    Updated Jul 10, 2019
    + more versions
    Cite
    FirstName+36125284 LastName+36125284 (2019). SAMPLE DATASET [Dataset]. http://doi.org/10.1234/tgpfnk7zyt.19
    Dataset updated
    Jul 10, 2019
    Authors
    FirstName+36125284 LastName+36125284
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version.

  12. Student Performance Data Set

    • kaggle.com
    Updated Mar 27, 2020
    Cite
    Data-Science Sean (2020). Student Performance Data Set [Dataset]. https://www.kaggle.com/datasets/larsen0966/student-performance-data-set
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Data-Science Sean
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    If this data set is useful, an upvote is appreciated. This data approaches student achievement in secondary education of two Portuguese schools. The data attributes include student grades and demographic, social and school-related features, and the data was collected by using school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), while G1 and G2 correspond to the 1st and 2nd-period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details).

  13. ‘🍐 FDIC Failed Bank List’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🍐 FDIC Failed Bank List’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-fdic-failed-bank-list-3aaf/9b764d5e/?iid=004-707&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘🍐 FDIC Failed Bank List’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/fdic-failed-bank-liste on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    The FDIC is often appointed as receiver for failed banks. This list includes banks which have failed since October 1, 2000.

    Source: https://catalog.data.gov/dataset/fdic-failed-bank-list

    This dataset was created by Finance and contains around 500 samples along with Acquiring Institution, Bank Name, technical information and other features such as: - Updated Date - St - and more.

    How to use this dataset

    • Analyze Closing Date in relation to City
    • Study the influence of Acquiring Institution on Bank Name
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Finance

    --- Original source retains full ownership of the source dataset ---

  14. A sample dataset of coastal land cover including mangroves in southern China...

    • scidb.cn
    Updated Nov 9, 2020
    Cite
    Zhao Chuanpeng; Qin Chengzhi (2020). A sample dataset of coastal land cover including mangroves in southern China [Dataset]. http://doi.org/10.11922/sciencedb.00279
    Dataset updated
    Nov 9, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Zhao Chuanpeng; Qin Chengzhi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Samples drive classification algorithms and are therefore a prerequisite for accurate classification. Coastal areas lie in the transitional zone between land and sea, so more samples are needed to describe their diverse land covers. However, few studies share their sample datasets, which forces others to repeat the time-consuming and laborious sampling procedure. To alleviate the problem, we share a sample set with a total of 16,444 sample points derived from a study of mapping mangroves of China. The sample set contains a total of 10 categories, which are described as follows.

    1) Mangroves refer to “true mangroves” (excluding the associate mangrove species). In sampling mangroves, we used the data from the China Mangrove Conservation Network (CMCN, http://www.china-mangrove.org/), a non-governmental organization aiming to promote mangrove ecosystems. The CMCN provides an interactive map that can be annotated by volunteers with text or photos to record mangrove status at a location. Although the locations were shifted due to coordinate system differences and positioning errors, mangroves could be found on Google Earth images around the mangrove locations depicted by the CMCN’s map. There is a total of 1887 mangrove samples.
    2) Cropland is dominated by paddy rice. We collected a total of 1383 points according to its neat arrangement on Google Earth images.
    3) Coastal forests neighboring mangroves are mostly salt-tolerant, such as Cocos nucifera Linn., Hibiscus tiliaceus Linn., and Cerbera manghas Linn. We collected a total of 1158 samples according to their distance to the shoreline on Google Earth images.
    4) Terrestrial forests are forests far from the shoreline that are intolerant to salt. By visual inspection on Google Earth, we sampled 1269 points based on their appearance and distance to the shoreline.
    5) For the grass category, we collected 1282 samples by visual judgement on Google Earth.
    6) Saltmarsh is dominated by Spartina alterniflora, which covers large areas of tidal flats in China. We collected 2065 samples according to Google Earth images.
    7) The tidal flats category is represented by 1517 samples, which were sampled using the most recent global tidal flat map for 2014–2016 and were visually corrected.
    8) The “sand or rock” category refers to sandy and pebble beaches or rocky coasts exposed to air, which are not habitats of mangroves. We collected 1622 samples on Google Earth based on visual inspection.
    9) For the permanent water category, samples were first randomly drawn from a thresholded NDWI result (> 0.2) and then visually corrected. A total of 2056 samples were obtained.
    10) For the artificial impervious surfaces category, we randomly sampled from a thresholded normalized difference built-up index (NDBI) result (> 0.1) and corrected the samples based on Google Earth. This category is represented by 2205 samples.

    This sample dataset covers the low-altitude coastal area of five Provinces (Hainan, Guangdong, Fujian, Zhejiang, and Taiwan), one Autonomous Region (Guangxi), and two Special Administrative Regions (Macau and Hong Kong) (see “study_area.shp” in the zip for details). It can be used to train models for coastal land cover classification and to evaluate classification results. In addition to mangroves, it can also be used for identifying tidal flats, mapping salt marsh, extracting water bodies, and other related applications.

    Compared with the V1 version, we added a validation dataset for mangrove maps (Mangrove map validation dataset.rar), so that mangrove maps can be evaluated against the same dataset, which benefits the comparison of different mangrove maps. The validation dataset contains 10 shp files, each of which contains 600 mangrove samples (cls_new field = 1) and 600 non-mangrove samples (cls_new field = 0).

    Compared with the V2 version, we added two classes, forest near water and grass near water, to suppress the prevalent misclassified patches caused by the spectral similarity between mangroves and those classes.
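The threshold-then-sample step used for the permanent water and impervious surface categories can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the common definitions NDWI = (Green - NIR) / (Green + NIR) and NDBI = (SWIR - NIR) / (SWIR + NIR), which the description does not spell out, and the function names `normalized_difference` and `sample_candidates` are hypothetical. The visual correction described above is a manual step and is not modeled.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) per pixel, with 0 where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def sample_candidates(index, threshold, n, seed=0):
    """Randomly pick up to n (row, col) pixels whose index exceeds threshold."""
    rows, cols = np.nonzero(np.asarray(index) > threshold)
    if rows.size == 0:
        return []
    rng = np.random.default_rng(seed)
    k = min(n, rows.size)
    pick = rng.choice(rows.size, size=k, replace=False)
    return list(zip(rows[pick].tolist(), cols[pick].tolist()))


# Toy reflectance bands (assumed values, for illustration only).
green = [[0.3, 0.1], [0.2, 0.0]]
nir = [[0.1, 0.3], [0.1, 0.0]]

ndwi = normalized_difference(green, nir)           # assumed NDWI definition
water_candidates = sample_candidates(ndwi, 0.2, n=5)
print(water_candidates)
```

The same two functions would apply to the impervious surface category by passing SWIR and NIR bands and a threshold of 0.1; the resulting candidate points would still need the visual check on Google Earth that the description mentions.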

  15. Dataset Sample - for User Testing

    • staging-elsevier.digitalcommonsdata.com
    Updated Jun 26, 2020
    Cite
    Patricia Sawamura (2020). Dataset Sample - for User Testing [Dataset]. http://doi.org/10.1234/785hy79b49.1
    Dataset updated
    Jun 26, 2020
    Authors
    Patricia Sawamura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Here you can describe your dataset (3000 characters)

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

    Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur. Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur.

    At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat

  16. Biological Samples Database (BSD)

    • catalog.data.gov
    • fisheries.noaa.gov
    Updated Jun 1, 2025
    Cite
    (Point of Contact, Custodian) (2025). Biological Samples Database (BSD) [Dataset]. https://catalog.data.gov/dataset/biological-samples-database-bsd
    Dataset updated
    Jun 1, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Description

    The Biological Sampling Database (BSD) is an Oracle relational database that is maintained at the NMFS Panama City Laboratory and NOAA NMFS Beaufort Laboratory. The data set includes port samples of reef fish species collected from commercial and recreational fishery landings in the U.S. South Atlantic (NC - FL Keys). The data set serves as an inventory of samples stored at the NMFS Beaufort Laboratory as well as final processed data. Information that may be included for each sample is trip-level information, species, size measurements, age, sex, and reproductive data.

  17. Data Management Plan Examples Database

    • search.dataone.org
    • borealisdata.ca
    Updated Sep 4, 2024
    Cite
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak (2024). Data Management Plan Examples Database [Dataset]. http://doi.org/10.5683/SP3/SDITUG
    Dataset updated
    Sep 4, 2024
    Dataset provided by
    Borealis
    Authors
    Evering, Danica; Acharya, Shrey; Pratt, Isaac; Behal, Sarthak
    Time period covered
    Jan 1, 2011 - Jan 1, 2023
    Description

    This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data included in or extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data-type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for a McMaster Data Management Plan (DMP) Database as part of the Research Data Management (RDM) Services website, located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned as such. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.

  18. dataset-card-example

    • huggingface.co
    Updated Sep 28, 2023
    Cite
    Templates (2023). dataset-card-example [Dataset]. https://huggingface.co/datasets/templates/dataset-card-example
    Dataset updated
    Sep 28, 2023
    Dataset authored and provided by
    Templates
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

      Dataset Details

      Dataset Description

    Curated by: [More Information Needed]
    Funded by [optional]: [More Information Needed]
    Shared by [optional]: [More Information Needed]
    Language(s) (NLP): [More Information Needed]
    License: [More Information Needed]

      Dataset Sources [optional]… See the full description on the dataset page: https://huggingface.co/datasets/templates/dataset-card-example.
    
  19. BANKING77 Dataset

    • paperswithcode.com
    • library.toponeai.link
    Updated Oct 6, 2024
    Cite
    Iñigo Casanueva; Tadas Temčinas; Daniela Gerz; Matthew Henderson; Ivan Vulić (2024). BANKING77 Dataset [Dataset]. https://paperswithcode.com/dataset/banking77
    Dataset updated
    Oct 6, 2024
    Authors
    Iñigo Casanueva; Tadas Temčinas; Daniela Gerz; Matthew Henderson; Ivan Vulić
    Description

    Dataset composed of online banking queries annotated with their corresponding intents.

    The BANKING77 dataset provides a very fine-grained set of intents in the banking domain. It comprises 13,083 customer service queries labeled with 77 intents and focuses on fine-grained single-domain intent detection.

  20. SAMPLE DATASET - staging

    • staging-elsevier.digitalcommonsdata.com
    Updated Sep 30, 2019
    Cite
    FirstName+36125284 LastName+36125284 (2019). SAMPLE DATASET - staging [Dataset]. http://doi.org/10.1234/tgpfnk7zyt.36
    Dataset updated
    Sep 30, 2019
    Authors
    FirstName+36125284 LastName+36125284
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version. .. This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version.

    This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version. This is the description of a dataset. The description can be quite long and this can look strange in the public dataset page. In the drafts page there is a scrollbar in the scrollbar, why not in the public page? Well, the public page needs to support viewing on a mobile phone and this can make scroll bars within scrollbars within scrollbars a little difficult. So maybe it’ll be better to try using ellipses. Additionally only adding a description does not make it a new version.

Cite
SR (2021). Retail Credit Bank Data [Dataset]. https://www.kaggle.com/datasets/surekharamireddy/credit-data

Retail Credit Bank Data

Identifying risk in retail credit data

Dataset updated
Sep 10, 2021
Dataset provided by
Kaggle
Authors
SR
Description

Context

A retail bank would like to hire you to build a credit default model for their credit card portfolio. The bank expects the model to identify the consumers who are likely to default on their credit card payments over the next 12 months. This model will be used to reduce the bank’s future losses. The bank is willing to provide you with some sample data that they can currently extract from their systems. This data set (credit_data.csv) consists of 13,444 observations with 14 variables.

Content

Based on the bank’s experience, the number of derogatory reports is a strong indicator of default. This is all the information you are able to get from the bank at the moment. Currently, they do not have the expertise to provide any clarification on this data and are also unsure about other variables captured by their systems.
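One way to check the bank's claim about derogatory reports is to compare observed default rates across report counts. The sketch below is only an illustration under stated assumptions: the description does not list the 14 variable names, so the keys `derogatory_reports` and `default` are hypothetical placeholders, and the three rows are made-up toy values rather than records from credit_data.csv.

```python
from collections import defaultdict


def default_rate_by_derogatory(rows, derog_key="derogatory_reports",
                               default_key="default"):
    """Map each derogatory-report count to the share of rows that defaulted.

    Both key names are hypothetical; replace them with the actual column
    names found in the file.
    """
    counts = defaultdict(lambda: [0, 0])  # count -> [defaults, total rows]
    for row in rows:
        n_reports = int(row[derog_key])
        counts[n_reports][0] += int(row[default_key])
        counts[n_reports][1] += 1
    return {n: defaults / total
            for n, (defaults, total) in sorted(counts.items())}


# Toy rows shaped like csv.DictReader output (values are strings).
toy_rows = [
    {"derogatory_reports": "0", "default": "0"},
    {"derogatory_reports": "0", "default": "1"},
    {"derogatory_reports": "2", "default": "1"},
]
print(default_rate_by_derogatory(toy_rows))
```

With the real file, the rows could come from `csv.DictReader` in the standard library; a default rate that rises with the report count would be consistent with the bank's experience, though it would not by itself establish a usable model.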
