25 datasets found
  1. Details of feature variables of the data set.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Details of feature variables of the data set. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t002
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, as the financial system has matured and the banking industry has developed rapidly, competition within the industry has intensified. At the same time, the rapid development of information and Internet technology has diversified customers’ choice of financial products, weakened their dependence on and loyalty to banking institutions, and made customer churn in commercial banks increasingly prominent. Predicting customer behavior and retaining existing customers has become a major challenge for banks. This study therefore takes a bank’s business data from the Kaggle platform as its research object, compares multiple sampling methods for balancing the data, constructs a GA-XGBoost model for bank customer churn prediction, and conducts interpretability analysis on that model to provide decision support and suggestions for preventing customer churn. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the XGBoost model optimized with a genetic algorithm reach 90% and 99%, respectively, the best results when compared with six other machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Shapley values are used to explain how each feature affects the model results and to analyze the features with a high impact on the prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance.
The contribution of this paper is mainly in two aspects: (1) building on accurate identification of churned customers, the study extracts useful information from the black-box model, which can serve as a reference for commercial banks seeking to improve their service quality and retain customers; (2) it can serve as a reference for customer churn early-warning models in other related industries, and can help the banking industry maintain customer stability, maintain market position, and reduce corporate losses.
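The abstract above reports F1 and AUC as its headline metrics. As a minimal sketch of what those two numbers measure, the following computes both from scratch in plain Python. The labels and scores are invented toy values, the function names are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = churned, 0 = retained).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen decision threshold, while AUC is computed from the raw scores, which is why a model can report a higher AUC than F1 on imbalanced churn data.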

  2. Churn Prediction

    • kaggle.com
    Updated Sep 13, 2022
    Cite
    mohamed ali salama (2022). Churn Prediction [Dataset]. https://www.kaggle.com/datasets/mohamedalisalama/churn-detection
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    mohamed ali salama
    Description

    Dataset

    This dataset was created by mohamed ali salama


  3. Confusion matrix.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Confusion matrix. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t004
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  4. Comparison of GA-XGBoost with XGBoost and LightGBM test results.

    • figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison of GA-XGBoost with XGBoost and LightGBM test results. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t008
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of GA-XGBoost with XGBoost and LightGBM test results.

  5. Data from: A Proposed Churn Prediction Model

    • figshare.com
    pdf
    Updated Feb 24, 2019
    Cite
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr (2019). A Proposed Churn Prediction Model [Dataset]. http://doi.org/10.6084/m9.figshare.7763183.v2
    Available download formats: pdf
    Dataset updated
    Feb 24, 2019
    Dataset provided by
    figshare
    Authors
    Mona Nasr; Essam Shaaban; Yehia Helmy; Dr. Ayman Khedr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Churn prediction aims to detect customers who intend to leave a service provider. Retaining an existing customer costs an organization 5 to 10 times less than gaining a new one. Predictive models can correctly identify likely churners in the near future so that a retention solution can be provided. This paper presents a new prediction model based on Data Mining (DM) techniques. The proposed model is composed of six steps: identify the problem domain, data selection, investigate the data set, classification, clustering, and knowledge usage. A data set with 23 attributes and 5000 instances is used; 4000 instances are used for training the model and 1000 instances as a testing set. The predicted churners are clustered into 3 categories for use in a retention strategy. The data mining techniques used in this paper are Decision Tree, Support Vector Machine, and Neural Network, run through the open-source software WEKA.
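The 4000/1000 partition described above can be sketched as a seeded shuffle of row indices. This is an illustrative Python sketch only: the paper reports using WEKA, and the function name, the seed, and the choice of a random (rather than ordered or stratified) split are assumptions.

```python
import random
from typing import List, Tuple


def train_test_split_indices(n_rows: int, n_test: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices reproducibly and hold out the first n_test of them."""
    if not 0 < n_test < n_rows:
        raise ValueError("n_test must be between 1 and n_rows - 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, so global random state is untouched
    return indices[n_test:], indices[:n_test]


# 5000 instances, 1000 held out for testing, as stated in the description.
train_idx, test_idx = train_test_split_indices(n_rows=5000, n_test=1000)
```

Keeping the split as index lists makes it easy to confirm that no instance appears in both the training and the testing set.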

  6. Global Logistic Regression Models Market Research Report: By Deployment Mode...

    • wiseguyreports.com
    Updated Jul 23, 2024
    Cite
    Wiseguy Research Consultants Pvt Ltd (2024). Global Logistic Regression Models Market Research Report: By Deployment Mode (Cloud-based, On-premises), By Application (Fraud Detection, Risk Assessment, Predictive Analytics, Customer Churn Prediction, Medical Diagnosis), By Industry (Financial Services, Healthcare, Retail and eCommerce, Manufacturing, Transportation and Logistics), By Model Complexity (Simple Models, Complex Models, Deep Learning Models), By Data Type (Structured Data, Unstructured Data, Semi-structured Data) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2032. [Dataset]. https://www.wiseguyreports.com/reports/logistic-regression-models-market
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Wiseguy Research Consultants Pvt Ltd
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Jan 7, 2024
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2024
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2023: 5.01 (USD Billion)
    MARKET SIZE 2024: 5.64 (USD Billion)
    MARKET SIZE 2032: 14.52 (USD Billion)
    SEGMENTS COVERED: Deployment Mode, Application, Industry, Model Complexity, Data Type, Regional
    COUNTRIES COVERED: North America, Europe, APAC, South America, MEA
    KEY MARKET DYNAMICS: Cloudbased Deployment Integration of Machine Learning Big Data Analytics Increase in Demand for Predictive Analytics Rising Prevalence of Chronic Diseases
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Qlik Technologies, Oracle, Tableau Software, Alteryx, Teradata, SAS Institute, Dell Technologies, KNIME, H2O.ai, DataRobot, HP Enterprise, SAP SE, Microsoft, IBM, RapidMiner
    MARKET FORECAST PERIOD: 2025 - 2032
    KEY MARKET OPPORTUNITIES: 1. Expanding healthcare applications; 2. Growing demand in pharmaceuticals; 3. Rise of ecommerce and logistics; 4. Increasing focus on predictive analytics; 5. Advancements in machine learning algorithms
    COMPOUND ANNUAL GROWTH RATE (CAGR): 12.56% (2025 - 2032)
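The quoted growth rate can be cross-checked against the two market-size figures in this listing. The sketch below assumes that the 2024 value (5.64) is the starting point and that growth compounds over eight annual periods to 2032; the listing does not state this explicitly, and the function name is hypothetical.

```python
def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate that turns start_value into end_value over `periods` years."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Figures from the listing: USD 5.64 billion (2024) and USD 14.52 billion (2032).
rate = implied_cagr(start_value=5.64, end_value=14.52, periods=2032 - 2024)
print(f"Implied CAGR: {rate:.2%}")
```

Under that assumption the implied rate comes out within about a hundredth of a percentage point of the stated 12.56%, so the listed figures are internally consistent.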
  7. Predictive Analytics in Banking Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 17, 2025
    Cite
    Data Insights Market (2025). Predictive Analytics in Banking Report [Dataset]. https://www.datainsightsmarket.com/reports/predictive-analytics-in-banking-1448930
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Predictive analytics is rapidly transforming the banking sector, offering institutions the ability to enhance decision-making across various operations. The market, currently valued at approximately $15 billion in 2025, is projected to experience robust growth, driven by several key factors. Increasing regulatory scrutiny demanding improved risk management necessitates advanced analytical tools. The need for personalized customer experiences, coupled with the rising adoption of digital banking channels, fuels demand for predictive modeling in areas such as fraud detection, customer churn prediction, and targeted marketing. Furthermore, the availability of vast amounts of data, combined with advancements in machine learning and artificial intelligence, empowers banks to derive actionable insights with unprecedented accuracy. The market's expansion is further accelerated by the growing adoption of cloud-based solutions, offering scalability and cost-effectiveness. However, challenges remain. Data security and privacy concerns are paramount, requiring robust data governance frameworks. The need for skilled professionals to develop, implement, and interpret predictive models presents another hurdle. Additionally, the integration of predictive analytics solutions with existing legacy systems within banking institutions can prove complex and time-consuming. Despite these challenges, the long-term outlook for predictive analytics in banking remains positive, with a projected Compound Annual Growth Rate (CAGR) of approximately 15% from 2025 to 2033. This growth is anticipated to be driven by continuous technological innovation, increasing data availability, and the growing recognition of the substantial return on investment associated with predictive modeling within the financial industry. 
The competitive landscape includes established players like FICO, IBM, and Oracle, as well as specialized providers such as Accretive Technologies and Angoss Software, vying for market share through innovative solutions and strategic partnerships.

  8. Big Data & Machine Learning in Telecom Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Cite
    Archive Market Research (2025). Big Data & Machine Learning in Telecom Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-machine-learning-in-telecom-57186
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 14, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data and Machine Learning (BDML) in Telecom market is experiencing robust growth, driven by the explosive increase in mobile data traffic, the rise of 5G networks, and the increasing need for personalized customer experiences. The market, valued at approximately $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching an estimated $60 billion by 2033. This expansion is fueled by several key factors. Telecom operators are leveraging BDML for network optimization, predictive maintenance, fraud detection, customer churn prediction, and personalized service offerings. The adoption of descriptive, predictive, and prescriptive analytics across various applications, including processing, storage, and analysis of vast datasets, is a significant driver. Furthermore, advancements in machine learning algorithms and feature engineering techniques are empowering telecom companies to extract deeper insights from their data, leading to significant efficiency gains and improved revenue streams. The increasing availability of cloud-based BDML solutions is also fostering wider adoption, particularly among smaller operators. However, challenges remain. Data security and privacy concerns, the need for skilled data scientists and engineers, and the high initial investment costs associated with implementing BDML solutions can hinder market growth. Despite these restraints, the strategic advantages offered by BDML are undeniable, making its adoption crucial for telecom companies aiming to stay competitive in a rapidly evolving landscape. Segments like predictive analytics and machine learning for network optimization are expected to experience the most significant growth during the forecast period, driven by the increasing complexity of telecom networks and the demand for proactive network management. 
Geographic regions such as North America and Asia Pacific, with their advanced technological infrastructure and substantial investments in 5G, are anticipated to lead the market, followed by Europe and other regions.
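To see how the headline numbers in this description relate, the following sketch compounds the stated 2025 value forward at the stated rate. It assumes the $15 billion figure applies to 2025 and that growth compounds once per year through 2033; the function name is hypothetical.

```python
from typing import Dict


def project(start_value: float, rate: float, start_year: int, end_year: int) -> Dict[int, float]:
    """Year-by-year values obtained by compounding start_value at a fixed annual rate."""
    return {
        year: start_value * (1 + rate) ** (year - start_year)
        for year in range(start_year, end_year + 1)
    }


# Figures from the description: about $15 billion in 2025, 18% CAGR through 2033.
path = project(start_value=15.0, rate=0.18, start_year=2025, end_year=2033)
for year, value in path.items():
    print(year, round(value, 1))
```

Under these assumptions the 2033 value lands near $56 billion, a little below the "estimated $60 billion" quoted in the description, which suggests that figure is rounded or based on a slightly different base year.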

  9. KDDCup09-Appetency

    • openml.org
    Updated Dec 18, 2020
    Cite
    Orange Labs (2020). KDDCup09-Appetency [Dataset]. https://www.openml.org/search?type=data&id=42774
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 18, 2020
    Authors
    Orange Labs
    Description

    This is the full version of the KDD Cup 2009 dataset

    This Year's Challenge

    Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn), buy new products or services (appetency), or buy upgrades or add-ons proposed to them to make the sale more profitable (up-selling).

    The most practical way to build knowledge about customers in a CRM system is to produce scores. A score (the output of a model) is an evaluation, for all instances, of a target variable to explain (i.e. churn, appetency or up-selling). Tools that produce scores make it possible to project quantifiable information onto a given population. The score is computed using input variables which describe instances. Scores are then used by the information system (IS), for example, to personalize the customer relationship. An industrial customer analysis platform able to build prediction models with a very large number of input variables has been developed by Orange Labs. This platform implements several processing methods for instance and variable selection, prediction, and indexation, based on an efficient model combined with variable selection regularization and a model averaging method. The main characteristic of this platform is its ability to scale to very large datasets with hundreds of thousands of instances and thousands of variables. The rapid and robust detection of the variables that have contributed most to the output prediction can be a key factor in a marketing application.

    The challenge is to beat the in-house system developed by Orange Labs. It is an opportunity to prove that you can deal with a very large database, including heterogeneous noisy data (numerical and categorical variables), and unbalanced class distributions. Time efficiency is often a crucial point. Therefore part of the competition will be time-constrained to test the ability of the participants to deliver solutions quickly.

    Task Description

    The task is to estimate the churn, appetency and up-selling probability of customers, hence there are three target values to be predicted. The challenge is staged in phases to test the rapidity with which each team is able to produce results. A large number of variables (15,000) is made available for prediction. However, to engage participants having access to less computing power, a smaller version of the dataset with only 230 variables will be made available in the second part of the challenge.

    • Churn (wikipedia definition): Churn rate is also sometimes called attrition rate. It is one of two primary factors that determine the steady-state level of customers a business will support. In its broadest sense, churn rate is a measure of the number of individuals or items moving into or out of a collection over a specific period of time. The term is used in many contexts, but is most widely applied in business with respect to a contractual customer base. For instance, it is an important factor for any business with a subscriber-based service model, including mobile telephone networks and pay TV operators. The term is also used to refer to participant turnover in peer-to-peer networks.

    • Appetency: In our context, the appetency is the propensity to buy a service or a product.

    • Up-selling (wikipedia definition): Up-selling is a sales technique whereby a salesman attempts to have the customer purchase more expensive items, upgrades, or other add-ons in an attempt to make a more profitable sale. Up-selling usually involves marketing more profitable services or products, but up-selling can also be simply exposing the customer to other options he or she may not have considered previously. Up-selling can imply selling something additional, or selling something that is more profitable or otherwise preferable for the seller instead of the original sale.

    The training set contains 50,000 examples. The first 14,740 predictive variables are numerical and the last 260 predictive variables are categorical. The last target variable is binary (-1, 1).
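The column layout described above can be sketched as a pair of small helpers. This assumes each row arrives as a flat list of the 15,000 predictive values in file order, with the target label held separately; the function names are hypothetical, and remapping -1/1 to 0/1 is a common convenience rather than something the dataset requires.

```python
from typing import List, Sequence, Tuple

N_NUMERICAL = 14_740   # first block of predictive variables, per the description
N_CATEGORICAL = 260    # last block of predictive variables, per the description


def split_row(row: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split one row of predictive values into its numerical and categorical parts."""
    if len(row) != N_NUMERICAL + N_CATEGORICAL:
        raise ValueError(f"expected {N_NUMERICAL + N_CATEGORICAL} values, got {len(row)}")
    return list(row[:N_NUMERICAL]), list(row[N_NUMERICAL:])


def to_binary(label: int) -> int:
    """Map the (-1, 1) target encoding to (0, 1)."""
    if label not in (-1, 1):
        raise ValueError(f"unexpected label: {label}")
    return (label + 1) // 2


# Synthetic row, invented for illustration only.
row = ["0"] * N_NUMERICAL + ["a"] * N_CATEGORICAL
numerical, categorical = split_row(row)
```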

  10. f

    Comparison results of different model.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Comparison results of different model. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t006
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically


  11. Big Data in Telecom Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 13, 2025
    Cite
    Data Insights Market (2025). Big Data in Telecom Report [Dataset]. https://www.datainsightsmarket.com/reports/big-data-in-telecom-462791
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jul 13, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Big Data in Telecom market is experiencing robust growth, driven by the exponential increase in mobile data traffic, the proliferation of IoT devices, and the rising demand for personalized customer experiences. The market, estimated at $50 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $150 billion by 2033. This expansion is fueled by the need for telecom operators to leverage big data analytics for network optimization, fraud detection, customer churn prediction, and the development of innovative value-added services. Key trends include the increasing adoption of cloud-based big data solutions, the rise of AI and machine learning for data analysis, and the growing importance of data security and privacy. Leading technology providers such as Accenture, Amazon, Cisco, IBM, Microsoft, and Oracle are actively investing in developing advanced big data solutions tailored to the telecom industry. The market is segmented by deployment type (on-premise, cloud), data type (structured, unstructured), application (network optimization, customer relationship management, security), and region. While the market faces restraints such as high implementation costs and the need for skilled data scientists, the overall outlook remains highly positive. The competitive landscape is characterized by a mix of established technology vendors and specialized telecom solutions providers. Companies like Accenture, Amazon, and IBM offer comprehensive big data platforms and consulting services, while others focus on specific niche areas within the telecom sector. The Asia-Pacific region is expected to witness the highest growth rate due to increasing smartphone penetration and rapid digitalization. However, North America and Europe continue to hold significant market shares due to the early adoption of big data technologies and the presence of mature telecom infrastructure. 
Future growth will depend on factors such as 5G network rollout, the evolution of edge computing, and the continued development of advanced analytics capabilities. The successful implementation of big data strategies will be crucial for telecom operators to maintain competitiveness and enhance operational efficiency in an increasingly data-driven environment.

  12. Credit Card Fraud Dataset

    • kaggle.com
    Updated Jan 28, 2025
    Cite
    Vishal Painjane (2025). Credit Card Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/vishalpainjane/dataset101
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishal Painjane
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Credit risk assessment remains a critical function within financial services, influencing lending decisions, portfolio risk management, and regulatory compliance. It integrates multiple categories of financial, transactional, and behavioral data to enable advanced machine learning applications in the domain of financial risk modeling.

    Data Composition and Structure

    The dataset comprises a total of 1,212 distinct features, systematically grouped into four principal categories, alongside a binary target variable. Each feature category represents a specific dimension of credit risk assessment, reflecting both internal transactional data and externally sourced credit bureau information.

    Target Variable

    The dependent variable, denoted as bad_flag, represents a binary risk classification outcome associated with each customer account. The variable takes the following values:

    • 0: Denotes a low-risk, creditworthy customer
    • 1: Denotes a high-risk, default-prone customer

    This variable serves as the target for binary classification models aimed at predicting credit risk propensity.

    Feature Groups

    Category                 Number of Features   Description
    Transaction Attributes   664                  Customer-level transaction behavior, repayment patterns, financial habits
    Bureau Credit Data       452                  Credit scores, external bureau records, delinquency flags, historical credit data
    Bureau Enquiries         50                   Credit inquiry history, frequency and type of external credit applications
    ONUS Attributes          48                   Internal bank relationship metrics, account engagement indicators

    Each feature within a category follows a systematic sequential naming convention (e.g., transaction_attribute_1, bureau_1), facilitating programmatic identification and group-level analysis.
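The sequential naming convention lends itself to grouping columns by prefix. The sketch below assumes every feature name ends in an underscore followed by a number; only the `transaction_attribute_` and `bureau_` prefixes and the `bad_flag` target appear in the description, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def group_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group names of the form <prefix>_<number> by prefix; other names are skipped."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in columns:
        match = re.fullmatch(r"(.+)_(\d+)", name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


# Small invented column list; the real dataset has more than 1,200 features.
columns = [
    "transaction_attribute_1",
    "transaction_attribute_2",
    "bureau_1",
    "bureau_2",
    "bureau_3",
    "bad_flag",  # target variable, has no numeric suffix and is left out
]
groups = group_columns(columns)
```

Grouping this way makes group-level steps such as per-category sparsity checks or scaling straightforward without hard-coding more than 1,200 column names.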

    Data Characteristics

    The dataset exhibits several characteristics that mirror operational credit risk data environments:

    • High Dimensionality: The feature space exceeds 1,200 variables
    • Mixed Data Types: Numerical values (continuous and discrete), binary indicators
    • High Sparsity: A substantial proportion of features contain zero values or missing entries
    • Value Range Disparity: Feature values exhibit significant variance, with magnitudes ranging from small ratios (0.001) to large transaction amounts (288,500)

    Methodological Rationale

    The dataset was constructed by simulating data generation processes typical within financial services institutions. Transactional behaviors, bureau records, and inquiry histories were aggregated and engineered into derivative features.

  13. AI In Telecommunication Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated May 4, 2025
    Cite
    Archive Market Research (2025). AI In Telecommunication Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-in-telecommunication-361820
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 4, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI in Telecommunications market is experiencing explosive growth, projected to reach $1772.9 million in 2025 and exhibiting a remarkable Compound Annual Growth Rate (CAGR) of 38.9% from 2019 to 2033. This surge is driven by the increasing need for network optimization, enhanced security measures, and sophisticated customer analytics within the telecommunications sector. The adoption of AI-powered solutions enables telecom providers to improve network efficiency, reduce operational costs, personalize customer experiences, and proactively address potential network issues. Key applications driving this growth include network optimization (predictive maintenance, resource allocation), network security (fraud detection, intrusion prevention), and customer analytics (churn prediction, personalized offers). The market is segmented by solutions (software, hardware) and services (consulting, implementation, support), reflecting the diverse needs of telecom companies. Major players like IBM, Microsoft, Google, and Cisco Systems are actively investing in and developing AI-powered solutions for this market, fueling competition and innovation. The geographic distribution reveals strong growth across North America and Europe, although the Asia-Pacific region shows immense potential for future expansion, driven by increasing digitalization and investments in advanced telecommunications infrastructure. The robust CAGR underscores the transformative power of AI in reshaping the telecommunications landscape. Continued advancements in AI algorithms and increasing data availability are expected to further propel market expansion throughout the forecast period. The competitive landscape is characterized by a blend of established technology giants and specialized AI companies. This dynamic mix fosters innovation and competition, leading to the development of sophisticated and increasingly affordable AI-powered solutions. 
While challenges such as data privacy concerns and the need for skilled professionals exist, the overall market trajectory remains strongly positive. The significant investments from major players and the clear business benefits of AI in telecom suggest that this growth trajectory will likely persist, potentially exceeding even the current projections. Furthermore, the integration of AI with emerging technologies like 5G and edge computing is poised to further unlock new opportunities and accelerate the market's expansion.

  14. Predictive Analytics Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 25, 2025
    Cite
    Data Insights Market (2025). Predictive Analytics Report [Dataset]. https://www.datainsightsmarket.com/reports/predictive-analytics-1436910
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 25, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The predictive analytics market, currently valued at $6498.2 million in 2025, is experiencing robust growth, projected to expand significantly over the forecast period (2025-2033) at a Compound Annual Growth Rate (CAGR) of 12.5%. This rapid expansion is driven by several key factors. The increasing availability of large datasets, coupled with advancements in machine learning and artificial intelligence, is enabling businesses across various sectors to leverage predictive analytics for enhanced decision-making. Furthermore, the growing need for improved operational efficiency, risk management, and customer experience is fueling the demand for sophisticated predictive modeling solutions. The adoption of cloud-based predictive analytics platforms is also accelerating market growth, offering scalability and cost-effectiveness compared to traditional on-premise solutions. Major players like IBM, Oracle, SAP, Microsoft, and SAS Institute are actively contributing to market expansion through continuous innovation and strategic partnerships. The market segmentation, while not explicitly provided, can be reasonably inferred to include industry verticals like healthcare, finance, retail, and manufacturing. Within these sectors, predictive analytics is applied to diverse use cases, such as fraud detection, customer churn prediction, supply chain optimization, and personalized medicine. While challenges exist, such as data security concerns and the need for skilled professionals, the overall market outlook remains extremely positive, indicating substantial growth opportunities for both established players and emerging companies in the predictive analytics space. The competitive landscape is dynamic, with established vendors continuously innovating and newer entrants leveraging niche technologies to gain market share. Continued advancements in algorithms and the increasing accessibility of advanced analytics tools will further propel market expansion in the coming years.

  15. Global Telecom Analytics Market By Application (Sales and Marketing...

    • verifiedmarketresearch.com
    Updated Sep 15, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Telecom Analytics Market By Application (Sales and Marketing Management, Risk and Compliance Management, Network Management, Customer Management), Component (Software, Services), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/global-telecom-analytics-market-size-and-forecast/
    Explore at:
    Dataset updated
    Sep 15, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Telecom Analytics Market size was valued at USD 5.06 Billion in 2024 and is projected to reach USD 14.64 Billion by 2031, growing at a CAGR of 14.20% from 2024 to 2031.

    The telecom analytics market is driven by the growing demand for data-driven insights to enhance customer experience, optimize network performance, and improve operational efficiency in an increasingly competitive telecom landscape. The surge in mobile data usage, fueled by the proliferation of smartphones and high-speed internet, has created vast amounts of data, prompting telecom operators to adopt advanced analytics solutions. Telecom analytics help in fraud detection, churn prediction, and revenue assurance, enabling companies to make more informed decisions. The integration of AI, machine learning, and big data technologies further enhances the capabilities of analytics tools, allowing for real-time decision-making and predictive analysis. Additionally, regulatory requirements for compliance and the increasing need to monetize network infrastructure drive the adoption of telecom analytics solutions. The shift toward 5G and IoT also presents new opportunities for telecom analytics in managing complex and data-intensive networks.

  16. Logistic Regression Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 28, 2025
    Cite
    Data Insights Market (2025). Logistic Regression Software Report [Dataset]. https://www.datainsightsmarket.com/reports/logistic-regression-software-1402414
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Apr 28, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global logistic regression software market is experiencing robust growth, driven by the increasing adoption of advanced analytics and machine learning across diverse sectors. The market, estimated at $2.5 billion in 2025, is projected to exhibit a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated value exceeding $7 billion by 2033. This expansion is fueled by several key factors. Firstly, the rising need for predictive modeling in industries like healthcare (predicting patient risk), finance (fraud detection), and marketing (customer churn prediction) is significantly boosting demand. Secondly, the proliferation of large datasets and the growing availability of cloud-based logistic regression tools are lowering the barrier to entry for businesses of all sizes. Finally, ongoing advancements in the software itself, including the development of more sophisticated algorithms and user-friendly interfaces, are further driving market growth. The market is segmented by application (Manufacturing, Healthcare, Finance, Marketing, Others) and by type of logistic regression (Binary, Multinomial, Ordinal), each exhibiting unique growth trajectories reflecting specific industry needs. While data privacy concerns and the complexity of implementing and interpreting logistic regression models pose some challenges, the overall market outlook remains positive, indicating substantial opportunities for software vendors and technology providers. The competitive landscape is characterized by a mix of established players like IBM and AWS, alongside specialized firms like Lumivero and RegressIt, and smaller niche players focusing on specific applications, such as AAT Bioquest in healthcare. Geographic distribution of market share shows North America currently dominating, followed by Europe and Asia Pacific. 
However, emerging economies in Asia Pacific are expected to witness significant growth in the forecast period, driven by increasing digitalization and adoption of advanced analytical techniques. The continued development of integrated platforms combining logistic regression with other analytical tools, along with increased focus on user training and support, will be crucial for sustaining market momentum and broadening adoption across various user segments.
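
The growth figures quoted above follow the standard compound-growth relation, value_end = value_start × (1 + CAGR)^years. The sketch below is only a consistency check on the quoted numbers; the function names are hypothetical, and treating 2025 to 2033 as 8 compounding periods is an assumption, since the report does not state its base year explicitly.

```python
def project_value(base_value, cagr, years):
    """Compound a base value forward at a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Recover the constant annual growth rate linking two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


base_2025 = 2.5          # USD billion, as quoted in the description
rate = 0.15              # 15% CAGR, as quoted in the description
periods = 2033 - 2025    # assumption: 8 annual compounding periods

projected_2033 = project_value(base_2025, rate, periods)
recovered_rate = implied_cagr(base_2025, projected_2033, periods)

print(f"Projected 2033 value: ${projected_2033:.2f} billion")
print(f"Implied CAGR: {recovered_rate:.1%}")
```

Under these assumptions the projection comes to about $7.65 billion, which is consistent with the description's "exceeding $7 billion by 2033".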

  17. Performance comparison of different adoption algorithms in XGBoost model.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). Performance comparison of different adoption algorithms in XGBoost model. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison of different adoption algorithms in XGBoost model.

  18. The summary of the literature review.

    • plos.figshare.com
    xls
    Updated Dec 8, 2023
    Cite
    Ke Peng; Yan Peng; Wenguang Li (2023). The summary of the literature review. [Dataset]. http://doi.org/10.1371/journal.pone.0289724.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ke Peng; Yan Peng; Wenguang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, the competition of the banking industry itself has intensified. At the same time, with the rapid development of information technology and Internet technology, customers’ choice of financial products is becoming more and more diversified, and customers’ dependence and loyalty to banking institutions is becoming less and less, and the problem of customer churn in commercial banks is becoming more and more prominent. How to predict customer behavior and retain existing customers has become a major challenge for banks to solve. Therefore, this study takes a bank’s business data on Kaggle platform as the research object, uses multiple sampling methods to compare the data for balancing, constructs a bank customer churn prediction model for churn identification by GA-XGBoost, and conducts interpretability analysis on the GA-XGBoost model to provide decision support and suggestions for the banking industry to prevent customer churn. The results show that: (1) The applied SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of banking data. (2) The F1 and AUC values of the model improved and optimized by XGBoost using genetic algorithm can reach 90% and 99%, respectively, which are optimal compared to other six machine learning models. The GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results, and analyze the features that have a high impact on the model prediction, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. 
The contribution of this paper is mainly in two aspects: (1) this study can provide useful information from the black box model based on the accurate identification of churned customers, which can provide reference for commercial banks to improve their service quality and retain customers; (2) it can provide reference for customer churn early warning models of other related industries, which can help the banking industry to maintain customer stability, maintain market position and reduce corporate losses.

  19. Big Data for Telecommunications and Media & Entertainment Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 12, 2025
    Cite
    Archive Market Research (2025). Big Data for Telecommunications and Media & Entertainment Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-for-telecommunications-and-media-entertainment-56328
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data market for Telecommunications and Media & Entertainment is experiencing robust growth, driven by the increasing volume of data generated by these sectors and the need for advanced analytics to extract valuable insights. The market, currently estimated at $50 billion in 2025, is projected to experience a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033. This growth is fueled by several key factors. Firstly, the proliferation of connected devices and the rise of 5G networks are generating an unprecedented amount of data that needs to be stored, processed, and analyzed. Secondly, the need for personalized content and targeted advertising in the media & entertainment industry is driving demand for sophisticated analytics solutions that leverage big data. Thirdly, the telecommunications industry is utilizing big data for network optimization, fraud detection, and customer churn prediction, leading to significant operational efficiencies and improved customer experience. However, challenges remain, including data security concerns, the complexity of implementing big data solutions, and the need for skilled professionals to manage and analyze the vast datasets. Despite these challenges, the market’s growth trajectory is expected to remain positive, driven by continued technological advancements and the ever-increasing reliance on data-driven decision-making within these sectors. The segment analysis reveals strong growth across both software and hardware components, with software solutions leading the charge due to their adaptability and scalability. Deployment models are shifting towards cloud-based solutions, offering improved cost efficiency and accessibility. While North America and Europe currently hold the largest market share, rapid adoption in Asia Pacific, particularly in countries like China and India, is expected to fuel substantial regional growth in the coming years. 
Leading technology providers like Microsoft, Google, AWS, and others are actively investing in developing and deploying innovative big data solutions tailored to the specific needs of the telecommunications and media & entertainment industries. The competitive landscape is highly dynamic, characterized by both established players and emerging startups vying for market share through technological innovation and strategic partnerships. The continued expansion of data volume and the demand for advanced analytics ensures a robust outlook for this market through 2033.

  20. Big Data And Analytics Market In Telecom Industry Analysis, Size, and...

    • technavio.com
    Updated Mar 15, 2025
    Cite
    Technavio (2025). Big Data And Analytics Market In Telecom Industry Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan, South Korea), Middle East and Africa , and South America (Brazil) [Dataset]. https://www.technavio.com/report/big-data-and-analytics-in-telecom-industry-market-analysis
    Explore at:
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Global, United States
    Description

    Big Data And Analytics Market In Telecom Industry Size 2025-2029

    The big data and analytics market in telecom industry size is forecast to increase by USD 9.03 billion, at a CAGR of 14.7% between 2024 and 2029.

    The Big Data and Analytics market in the Telecom industry is experiencing significant growth, driven primarily by the surge in data volumes generated by an increasing number of connected devices and the adoption of 5G technology. Telecom companies are capitalizing on this trend by introducing new data analytics solutions to gain insights from the vast amounts of data they collect. However, this growth comes with challenges. Data privacy and regulatory compliance are becoming increasingly important, with stricter regulations being implemented to protect customer data. Telecom companies must invest in robust data security measures and ensure they are in compliance with these regulations to maintain customer trust and avoid costly fines. Additionally, the complexity of managing and analyzing large data sets can be a challenge, requiring significant IT resources and expertise. To remain competitive, telecom companies must effectively navigate these challenges and continue to innovate in the realm of data analytics to provide value-added services to their customers.

    What will be the Size of the Big Data And Analytics Market In Telecom Industry during the forecast period?

    In the telecom industry, big data and analytics continue to play a pivotal role in driving innovation and enhancing network performance. The application of advanced technologies such as cloud computing, artificial intelligence, network forensics, and sentiment analysis, among others, is transforming the way telecom infrastructure is managed and optimized. Network dynamics are constantly evolving, with new challenges and opportunities arising in areas like network availability, data transformation, customer relationship management, and network security. Telecom companies are leveraging data integration, network modeling, and data cleansing to gain insights into network behavior and customer preferences. Satellite communications, wireless networks, and fiber optic networks are being optimized using network optimization algorithms and predictive analytics to improve network reliability and performance. Telecom network optimization is also a key focus area, with 5G network analytics and network virtualization gaining traction. Data privacy, fraud detection, and compliance regulations are critical concerns for telecom companies, and data security is a top priority. Machine learning algorithms and network security analytics are being used to enhance network intrusion detection and prevent data breaches. Customer segmentation and targeted marketing are other areas where big data and analytics are making a significant impact. Real-time analytics and data visualization tools are enabling telecom companies to gain actionable insights and make data-driven decisions. Telecom infrastructure is being transformed through big data and analytics, with network management systems and network orchestration playing a crucial role in ensuring seamless integration and optimization of various network components. The ongoing unfolding of market activities and evolving patterns in the telecom industry underscore the importance of staying abreast of the latest trends and technologies.

    How is this Big Data And Analytics In Telecom Industry segmented?

    The big data and analytics in telecom industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Component: Hardware, Services, Software
    • Application: Network optimization, CEE, FD and P, Operational efficiency, Revenue assurance
    • Analytics Type: Customer Analytics, Network Analytics, Marketing Analytics
    • Deployment Model: Cloud-Based, On-Premises
    • Geography: North America (US, Canada), Europe (France, Germany, UK), APAC (China, India, Japan, South Korea), South America (Brazil), Rest of World (ROW)

    By Component Insights

    The hardware segment is estimated to witness significant growth during the forecast period. In the telecom industry, the integration of cloud computing and artificial intelligence (AI) is revolutionizing big data and analytics. Telecom companies leverage AI for network forensics, sentiment analysis, fraud detection, customer churn prediction, and network optimization. Network modeling utilizes satellite communications and wireless networks to analyze customer behavior and optimize network performance. Data integration is crucial for merging data from various sources, ensuring data transformation and data quality assurance. 5G network analytics necessitates robust data processing capabilities. Telecom companies invest in big data infrastructure, including network optimization algorithms, data

