33 datasets found
  1. Customer_Financial_Data

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    Prashob Narendran (2025). Customer_Financial_Data [Dataset]. https://www.kaggle.com/datasets/prashobnarendran/customer-financial-data
    Available download formats: zip (62099 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    Prashob Narendran
    Description

    Context: This dataset contains detailed, anonymized information about a bank's customers. It includes demographic data such as age, income, and family size, as well as financial information such as mortgage value, credit card ownership, and average monthly credit card spending. The data is well suited to a variety of machine learning tasks, particularly in financial services and marketing.

    Content: The dataset consists of 5000 customer records with 14 attributes:

    • Customer_ID: A unique identifier for each customer.
    • Age: The customer's age in completed years.
    • Years_Experience: Years of professional experience.
    • Annual_Income: Annual income of the customer (in thousands of dollars).
    • ZIP_Code: The customer's home address ZIP code.
    • Family_size: The number of individuals in the customer's family.
    • Avg_Spending: Average monthly spending on credit cards (in thousands of dollars).
    • Education_Level: A categorical variable for education level (1: Undergraduate, 2: Graduate, 3: Advanced/Professional).
    • Mortgage: The value of the customer's house mortgage if any (in thousands of dollars).
    • Has_Consumer_Loan: Binary variable indicating if the customer accepted a personal loan in the last campaign (1: Yes, 0: No). This is a potential target variable.
    • Has_Securities_Account: Binary variable indicating if the customer has a securities account with the bank.
    • Has_CD_Account: Binary variable indicating if the customer has a certificate of deposit (CD) account with the bank.
    • Uses_Online_Banking: Binary variable indicating if the customer uses online banking services.
    • Has_CreditCard: Binary variable indicating if the customer uses a credit card issued by this bank.
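    Because CSV readers return every field as text, a loader that converts and validates these 14 attributes catches coding errors early. The following is a minimal standard-library sketch; it assumes the CSV header uses exactly the attribute names listed above, and the function names `parse_row` and `load_customers` are hypothetical.

```python
import csv
import io

# Attribute names come from the list above; the exact header spelling in the
# downloaded CSV is an assumption.
NUMERIC_COLUMNS = ["Age", "Years_Experience", "Annual_Income",
                   "Family_size", "Avg_Spending", "Mortgage"]
BINARY_COLUMNS = ["Has_Consumer_Loan", "Has_Securities_Account",
                  "Has_CD_Account", "Uses_Online_Banking", "Has_CreditCard"]
EDUCATION_LEVELS = {1: "Undergraduate", 2: "Graduate", 3: "Advanced/Professional"}


def parse_row(raw: dict) -> dict:
    """Convert one CSV row (all strings) to typed values and validate codes."""
    row = dict(raw)
    for col in NUMERIC_COLUMNS:
        row[col] = float(raw[col])
    for col in BINARY_COLUMNS:
        value = int(raw[col])
        if value not in (0, 1):
            raise ValueError(f"{col} must be 0 or 1, got {value}")
        row[col] = value
    level = int(raw["Education_Level"])
    if level not in EDUCATION_LEVELS:
        raise ValueError(f"unknown Education_Level code: {level}")
    row["Education_Level"] = level
    # ZIP codes are identifiers, not quantities, so they stay as strings.
    row["ZIP_Code"] = raw["ZIP_Code"].strip()
    return row


def load_customers(csv_text: str) -> list:
    """Parse the full CSV text into a list of typed customer dicts."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

    Keeping ZIP_Code as a string avoids losing leading zeros and stops it from being treated as a numeric feature by accident.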

    Data Quality Note: Some rows contain negative values in the Years_Experience column. This is a data quality issue that may require preprocessing, e.g., replacing negative values with their absolute values or imputing the average experience of customers in a similar age group.
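    Both preprocessing options in the note can be sketched with the standard library. This is an illustrative sketch, assuming rows are dicts with numeric `Age` and `Years_Experience` values; the 10-year age bands, the fallback to 0 when a band has no valid values, and the name `fix_experience` are assumptions, not part of the dataset documentation.

```python
from collections import defaultdict
from statistics import mean


def age_band(age: float, width: int = 10) -> int:
    """Bucket an age into a band of `width` years (e.g. 20-29 -> 20)."""
    return int(age // width) * width


def fix_experience(rows: list, strategy: str = "abs") -> list:
    """Return copies of rows with negative Years_Experience corrected.

    strategy="abs":      replace a negative value with its absolute value.
    strategy="age_mean": replace it with the mean non-negative experience of
                         customers in the same age band.
    """
    valid_by_band = defaultdict(list)
    for r in rows:
        if r["Years_Experience"] >= 0:
            valid_by_band[age_band(r["Age"])].append(r["Years_Experience"])

    fixed = []
    for r in rows:
        r = dict(r)  # never mutate the caller's data
        if r["Years_Experience"] < 0:
            if strategy == "abs":
                r["Years_Experience"] = abs(r["Years_Experience"])
            elif strategy == "age_mean":
                peers = valid_by_band.get(age_band(r["Age"]))
                # Assumed fallback: 0 when the band has no valid values.
                r["Years_Experience"] = mean(peers) if peers else 0.0
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        fixed.append(r)
    return fixed
```

    The absolute-value fix assumes the minus sign is a simple entry error; the age-band mean is the safer choice if the magnitudes themselves look unreliable.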

    Potential Use Cases: This dataset is well suited to both educational and practical purposes. You can use it to:

    1. Predict Loan Acceptance: Build a classification model to predict which customers are most likely to accept a personal loan (Has_Consumer_Loan).
    2. Customer Segmentation: Use clustering algorithms (like K-Means) to identify distinct customer segments for targeted marketing campaigns.
    3. Credit Card Adoption: Analyze the factors that influence a customer's decision to get a bank-issued credit card.
    4. Exploratory Data Analysis (EDA): Practice your data analysis and visualization skills to uncover insights about customer behavior.
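    For the loan-acceptance use case, logistic regression is a common baseline for a binary target such as Has_Consumer_Loan. The following is a minimal pure-Python sketch trained by batch gradient descent on standardized features; the choice of features (for example Annual_Income and Avg_Spending), the learning rate, and the epoch count are illustrative assumptions. In practice a library implementation such as scikit-learn's `LogisticRegression` would usually be preferred.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_scaler(X):
    """Per-feature mean and population std (1.0 for a constant feature)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Probability of the positive class for an already-scaled feature row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

    New customers must be scaled with the means and standard deviations fitted on the training rows, not on their own values.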
  2. Predicting Credit Card Customer Segmentation

    • kaggle.com
    zip
    Updated Mar 10, 2024
    Cite
    The Devastator (2024). Predicting Credit Card Customer Segmentation [Dataset]. https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m/code
    Available download formats: zip (387771 bytes)
    Dataset updated
    Mar 10, 2024
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predicting Credit Card Customer Segmentation

    Exploring Key Customer Characteristics

    By [source]

    About this dataset

    This dataset contains customer information collected from a consumer credit card portfolio, with the aim of helping analysts predict customer attrition. It includes demographic details such as age, gender, marital status, dependent count, education level, and income category, as well as information about each customer’s relationship with the credit card provider, such as card category, number of months on book, months inactive, and contacts count in the last 12 months. It also records spending and account behavior leading up to a customer’s churn decision, such as total revolving balance, credit limit, average open to buy, total amount change from quarter 4 to quarter 1, and average utilization ratio, along with a Naive Bayes classifier attrition flag. Together, these variables capture up-to-date information that can indicate long-term account stability or an impending departure, which is useful when managing a portfolio or serving individual customers.

    How to use the dataset

    This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.

    Research Ideas

    • Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.
    • Analyzing customers’ spending behavior leading up to churn and using it to better predict the likelihood that a customer will churn in the future.
    • Creating a classifier that predicts which customers are more susceptible to attrition based on their credit score, credit limit, utilization ratio, and other spending behavior metrics over time; this could serve as an early warning system that flags potential attrition before it happens.
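    The first research idea, comparing churn across demographic groups, reduces to a grouped rate once the attrition flag is encoded as 0/1. A minimal sketch, assuming rows are dicts keyed by the column names in BankChurners.csv; accepting text labels such as "Attrited Customer" / "Existing Customer" as well as Boolean values is an assumption made in case the file stores the flag as text.

```python
from collections import Counter


def churn_label(value) -> int:
    """Map an Attrition_Flag value to 1 (churned) or 0 (retained).

    The column table lists the flag as Boolean; text labels are also accepted
    in case the file stores it that way (an assumption; check your copy).
    """
    text = str(value).strip().lower()
    if text in {"attrited customer", "true", "1", "yes"}:
        return 1
    if text in {"existing customer", "false", "0", "no"}:
        return 0
    raise ValueError(f"unrecognised Attrition_Flag value: {value!r}")


def churn_rate_by(rows, column):
    """Share of churned customers within each value of a demographic column."""
    totals, churned = Counter(), Counter()
    for r in rows:
        totals[r[column]] += 1
        churned[r[column]] += churn_label(r["Attrition_Flag"])
    return {k: churned[k] / totals[k] for k in sorted(totals)}
```

    For example, `churn_rate_by(rows, "Gender")` returns one churn rate per gender value; group sizes should be checked too, since rates from small groups are noisy.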

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: BankChurners.csv

    | Column name | Description |
    |:--|:--|
    | CLIENTNUM | Unique identifier for each customer. (Integer) |
    | Attrition_Flag | Flag indicating whether or not the customer has churned out. (Boolean) |
    | Customer_Age | Age of customer. (Integer) |
    | Gender | Gender of customer. (String) |
    | Dependent_count | Number of dependents that customer has. (Integer) |
    | Education_Level ...

  3. GIS Data | USA & Canada | Over 40k Demographics Variables To Inform Business...

    • datarade.ai
    .json, .csv
    Updated Aug 13, 2024
    Cite
    GapMaps (2024). GIS Data | USA & Canada | Over 40k Demographics Variables To Inform Business Decisions | Consumer Spending Data| Demographic Data [Dataset]. https://datarade.ai/data-products/gapmaps-premium-demographic-data-by-ags-usa-canada-gis-gapmaps
    Available download formats: .json, .csv
    Dataset updated
    Aug 13, 2024
    Dataset authored and provided by
    GapMaps
    Area covered
    Canada, United States
    Description

    GapMaps GIS data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.

    GIS Data attributes include:

    1. Latest Estimates and Projections: The estimates and projections database includes a wide range of core demographic data variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.

    2. Crime Risk: Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property, and total) at the block and block group level.

    3. Panorama Segmentation: AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to provide Consumer Behavior profiles associated with this segmentation system.

    4. Business Counts: Business Counts is a geographic summary database of business establishments, employment, occupation, and retail sales.

    5. Non-Resident Population: The AGS non-resident population estimates use a wide range of data sources to model the factors that drive tourists to particular locations, and to match that demand with the supply of available accommodations.

    6. Consumer Expenditures: AGS provides current-year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.

    7. Retail Potential: This tabulation uses the Census of Retail Trade tables, which cross-tabulate store type by merchandise line.

    8. Environmental Risk: The environmental suite of data consists of several separate database components, including weather risks, seismological risks, wildfire risk, climate, air quality, and elevation and terrain.
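    A current-year estimate and its 5-year projection can be combined into an implied compound annual growth rate, (projected / current)^(1/5) − 1, which makes areas of different sizes comparable. A minimal sketch, assuming each area record carries hypothetical keys such as `area_id`, `pop_cy` (current year), and `pop_5y` (5-year projection); these names do not come from the AGS data dictionary.

```python
def implied_annual_growth(current: float, projected: float, years: int = 5) -> float:
    """Compound annual growth rate implied by a current-year estimate and a
    projection `years` ahead: (projected / current) ** (1 / years) - 1."""
    if current <= 0 or projected < 0 or years < 1:
        raise ValueError("need current > 0, projected >= 0 and years >= 1")
    return (projected / current) ** (1 / years) - 1


def growth_by_area(records, current_key, projected_key, years=5):
    """Apply implied_annual_growth to each area record (keys are hypothetical)."""
    return {r["area_id"]: implied_annual_growth(r[current_key], r[projected_key], years)
            for r in records}
```

    For instance, `growth_by_area(rows, "pop_cy", "pop_5y")` yields one annualized rate per area, which can then be ranked or mapped.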

    Primary Use Cases for GapMaps GIS Data:

    1. Retail (e.g., Fast Food/QSR, Cafe, Fitness, Supermarket/Grocery)
       • Customer Profiling: get a detailed understanding of the demographic and segmentation profile of your customers, where they work, and their spending potential
       • Analyse your trade areas at a granular census block level using all the key metrics
       • Site Selection: Identify optimal locations for future expansion and benchmark performance across existing locations
       • Target Marketing: Develop effective marketing strategies to acquire more customers
       • Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations
    2. Finance / Insurance (e.g., Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)
       • Network Planning
       • Customer (Risk) Profiling for insurance/loan approvals
       • Target Marketing
       • Competitive Analysis
       • Market Optimization
    3. Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)
       • Tenant Recruitment
       • Target Marketing
       • Market Potential / Gap Analysis
    4. Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)
       • Customer Profiling
       • Target Marketing
       • Market Share Analysis
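    Trade-area analysis at the census block level often starts by totalling a metric over the blocks within a radius of a site. A minimal sketch, assuming each block record has hypothetical `lat`/`lon` centroid fields plus the metric to total; it uses the haversine great-circle distance and counts a block only if its centroid falls inside the radius, which simplifies true areal apportionment.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def trade_area_total(blocks, site_lat, site_lon, radius_km, field):
    """Sum `field` over blocks whose centroid lies within radius_km of the site."""
    return sum(
        b[field]
        for b in blocks
        if haversine_km(site_lat, site_lon, b["lat"], b["lon"]) <= radius_km
    )
```

    Drive-time or polygon-based trade areas are usually more realistic than a simple radius, but a radius is a quick first benchmark across sites.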

  4. Data from: Customer Segmentation in the Digital Marketing Using a Q-Learning...

    • data-staging.niaid.nih.gov
    Updated Jan 8, 2025
    Cite
    Wang, Guanqun (2025). Customer Segmentation in the Digital Marketing Using a Q-Learning Based Differential Evolution Algorithm Integrated with K-means clustering [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14614252
    Dataset updated
    Jan 8, 2025
    Authors
    Wang, Guanqun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was collected from Kaggle. It includes various features related to customer demographics, purchasing behavior, and other relevant metrics.

  5. Customer Segmentation Data for Marketing Analysis

    • kaggle.com
    zip
    Updated Jun 28, 2024
    Cite
    Fahmida (2024). Customer Segmentation Data for Marketing Analysis [Dataset]. https://www.kaggle.com/datasets/fahmidachowdhury/customer-segmentation-data-for-marketing-analysis/code
    Available download formats: zip (16744 bytes)
    Dataset updated
    Jun 28, 2024
    Authors
    Fahmida
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains simulated customer data that can be used for segmentation analysis. It includes demographic and behavioral information about customers, which can help in identifying distinct segments within the customer base. This can be particularly useful for targeted marketing strategies, improving customer satisfaction, and increasing sales.

    Columns:

    • id: Unique identifier for each customer.
    • age: Age of the customer.
    • gender: Gender of the customer (Male, Female, Other).
    • income: Annual income of the customer (in USD).
    • spending_score: Spending score (1-100), indicating the customer's spending behavior and loyalty.
    • membership_years: Number of years the customer has been a member.
    • purchase_frequency: Number of purchases made by the customer in the last year.
    • preferred_category: Preferred shopping category (Electronics, Clothing, Groceries, Home & Garden, Sports).
    • last_purchase_amount: Amount spent by the customer on their last purchase (in USD).

    Potential Uses:

    • Customer Segmentation: Identify different customer segments based on their demographic and behavioral characteristics.
    • Targeted Marketing: Develop targeted marketing strategies for different customer segments.
    • Customer Loyalty Programs: Design loyalty programs based on customer spending behavior and preferences.
    • Sales Analysis: Analyze sales patterns and predict future trends.
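    As a simple rule-based starting point for segmentation, before trying a clustering algorithm, customers can be split into four groups by whether their income and spending_score are above the median. The sketch uses the column names listed above; the median cut-offs and the segment labels are illustrative choices, not part of the dataset.

```python
from statistics import median


def quadrant_segments(customers, income_key="income", score_key="spending_score"):
    """Label each customer id by income and spending score relative to the median."""
    income_cut = median(c[income_key] for c in customers)
    score_cut = median(c[score_key] for c in customers)
    labels = {}
    for c in customers:
        income_band = "high-income" if c[income_key] > income_cut else "low-income"
        spend_band = "high-spend" if c[score_key] > score_cut else "low-spend"
        labels[c["id"]] = f"{income_band}/{spend_band}"
    return labels
```

    Segments such as "low-income/high-spend" can then be profiled with the other columns (e.g., preferred_category) to shape targeted campaigns.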

  6. Demographic Data | USA & Canada | Latest Estimates & Projections To Inform...

    • datarade.ai
    .json, .csv
    Updated Jun 24, 2024
    Cite
    GapMaps (2024). Demographic Data | USA & Canada | Latest Estimates & Projections To Inform Business Decisions | GIS Data | Map Data [Dataset]. https://datarade.ai/data-products/gapmaps-ags-usa-demographics-data-40k-variables-trusted-gapmaps
    Available download formats: .json, .csv
    Dataset updated
    Jun 24, 2024
    Dataset authored and provided by
    GapMaps
    Area covered
    Canada, United States
    Description

    GapMaps premium demographic data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.

    Demographic Data attributes include:

    Latest Estimates and Projections: The estimates and projections database includes a wide range of core demographic data variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.

    Crime Risk: Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property, and total) at the block and block group level.

    Panorama Segmentation: AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to provide Consumer Behavior profiles associated with this segmentation system.

    Business Counts: Business Counts is a geographic summary database of business establishments, employment, occupation, and retail sales.

    Non-Resident Population: The AGS non-resident population estimates use a wide range of data sources to model the factors that drive tourists to particular locations, and to match that demand with the supply of available accommodations.

    Consumer Expenditures: AGS provides current-year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.

    Retail Potential: This tabulation uses the Census of Retail Trade tables, which cross-tabulate store type by merchandise line.

    Environmental Risk: The environmental suite of data consists of several separate database components, including weather risks, seismological risks, wildfire risk, climate, air quality, and elevation and terrain.

    Primary Use Cases for AGS Demographic Data:

    1. Retail (e.g., Fast Food/QSR, Cafe, Fitness, Supermarket/Grocery)
       • Customer Profiling: get a detailed understanding of the demographic and segmentation profile of your customers, where they work, and their spending potential
       • Analyse your trade areas at a granular census block level using all the key metrics
       • Site Selection: Identify optimal locations for future expansion and benchmark performance across existing locations
       • Target Marketing: Develop effective marketing strategies to acquire more customers
       • Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations
    2. Finance / Insurance (e.g., Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)
       • Network Planning
       • Customer (Risk) Profiling for insurance/loan approvals
       • Target Marketing
       • Competitive Analysis
       • Market Optimization
    3. Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)
       • Tenant Recruitment
       • Target Marketing
       • Market Potential / Gap Analysis
    4. Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)
       • Customer Profiling
       • Target Marketing
       • Market Share Analysis
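    For the Site Selection use case, one simple way to compare candidate locations is a weighted sum of min-max normalized metrics. This is an illustrative sketch, not GapMaps' or AGS's method; the metric names (`population`, `median_income`), the `name` field, and the weights are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant metric maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def score_sites(sites, weights):
    """Rank candidate sites by a weighted sum of min-max normalized metrics.

    `weights` maps a metric name to its weight; use a negative weight for a
    metric where lower is better (e.g. a hypothetical competitor count).
    """
    scores = [0.0] * len(sites)
    for metric, weight in weights.items():
        for i, v in enumerate(min_max([s[metric] for s in sites])):
            scores[i] += weight * v
    ranked = zip((s["name"] for s in sites), scores)
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

    Min-max scaling keeps metrics with large units (population) from swamping those with small units (income in thousands); the weights then express business priorities explicitly.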

  7. US Consumer Demographics | Homeowners & Renters | Email & Mobile Phone |...

    • datarade.ai
    .json, .csv, .xls
    Updated Oct 18, 2024
    Cite
    CompCurve (2024). US Consumer Demographics | Homeowners & Renters | Email & Mobile Phone | Bulk & Custom | 255M People [Dataset]. https://datarade.ai/data-products/compcurve-us-consumer-demographics-homeowners-renters-compcurve
    Available download formats: .json, .csv, .xls
    Dataset updated
    Oct 18, 2024
    Dataset authored and provided by
    CompCurve
    Area covered
    United States
    Description

    Knowing who your consumers are is essential for businesses, marketers, and researchers. This detailed demographic file offers an in-depth look at American consumers, packed with insights about personal details, household information, financial status, and lifestyle choices. Let's take a closer look at the data:

    Personal Identifiers and Basic Demographics: At the heart of this dataset are the key details that make up a consumer profile:

    • Unique IDs (PID, HHID) for individuals and households
    • Full names (First, Middle, Last) and suffixes
    • Gender and age
    • Date of birth
    • Complete location details (address, city, state, ZIP)

    These identifiers are critical for accurate marketing and form the base for deeper analysis.

    Geospatial Intelligence: This file goes beyond listing addresses by including rich geospatial data such as:

    • Latitude and longitude
    • Census tract and block details
    • Codes for Metropolitan Statistical Areas (MSA) and Core-Based Statistical Areas (CBSA)
    • County size codes
    • Geocoding accuracy

    This allows for precise geographic segmentation and localized marketing.

    Housing and Property Data: The dataset covers a lot of ground when it comes to housing, providing valuable insights for real estate professionals, lenders, and home service providers:

    • Homeownership status
    • Dwelling type (single-family, multi-family, etc.)
    • Property values (market, assessed, and appraised)
    • Year built and square footage
    • Room count, amenities such as fireplaces or pools, and building quality

    This data is crucial for targeting homeowners with products and services such as refinancing or home improvement offers.

    Wealth and Financial Data: For a deeper dive into consumer wealth, the file includes:

    • Estimated household income
    • Wealth scores
    • Credit card usage
    • Mortgage info (loan amounts, rates, terms)
    • Home equity estimates and investment property ownership

    These indicators are invaluable for financial services, luxury brands, and fundraising organizations looking to reach affluent individuals.

    Lifestyle and Interests: One of the most useful features of the dataset is its extensive lifestyle segmentation:

    • Hobbies and interests (e.g., gardening, travel, sports)
    • Book preferences, magazine subscriptions
    • Outdoor activities (camping, fishing, hunting)
    • Pet ownership, tech usage, political views, and religious affiliations

    This data is well suited to crafting personalized marketing campaigns and developing products that align with specific consumer preferences.

    Consumer Behavior and Purchase Habits: The file also sheds light on how consumers behave and shop:

    • Online and catalog shopping preferences
    • Gift-giving tendencies, presence of children, vehicle ownership
    • Media consumption (TV, radio, internet)

    Retailers and e-commerce businesses will find this behavioral data especially useful for tailoring their outreach.

    Demographic Clusters and Segmentation: The file includes pre-built segments such as:

    • Household, neighborhood, family, and digital clusters
    • Generational and lifestage groups

    These segments make it easier to quickly target specific demographics, streamlining market analysis and campaign planning.

    Ethnicity and Language Preferences: In today's multicultural market, knowing your audience's cultural background is key. The file includes:

    • Ethnicity codes and language preferences
    • Flags for Hispanic/Spanish-speaking households

    This helps ensure culturally relevant and sensitive communication.

    Education and Occupation Data: The dataset also tracks education and career information:

    • Education level and occupation codes
    • Home-based business indicators

    This data is essential for B2B marketers, recruitment agencies, and education-focused campaigns.

    Digital and Social Media Habits: With everyone online, digital behavior insights are a must:

    • Internet, TV, radio, and magazine usage
    • Social media platform engagement (Facebook, Instagram, LinkedIn)
    • Streaming subscriptions (Netflix, Hulu)

    This data helps marketers, app developers, and social media managers connect with their audience in the digital space.

    Political and Charitable Tendencies: For political campaigns or non-profits, this dataset offers:

    • Political affiliations and outlook
    • Charitable donation history
    • Volunteer activities

    These insights are well suited to cause-related marketing and targeted political outreach.

    Neighborhood Characteristics: By incorporating census data, the file provides a bigger picture of the consumer's environment:

    • Population density, racial composition, and age distribution
    • Housing occupancy and ownership rates

    This offers important context for understanding the demographic landscape.

    Predictive Consumer Indexes: The dataset includes forward-looking indicators in categories such as:

    • Fashion, automotive, and beauty products
    • Health, home decor, pet products, sports, and travel

    These predictive insights help businesses anticipate consumer trends and needs.

    Contact Information: Finally, the file includes key communication details:

    • Multiple phone numbers (landline, mobile) and email addresses
    • Do Not Call (DNC) flags...
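    Before any phone outreach, numbers flagged Do Not Call should be suppressed. The sketch below assumes hypothetical field names (`PID`, `mobile_phone`, `landline_phone`) and a hypothetical `<field>_dnc` flag per number with values such as "Y"/"N"; map these to the vendor's actual layout. It treats a missing flag as Do Not Call, so a number is kept only when it is explicitly cleared.

```python
import re


def callable_numbers(records, phone_fields=("mobile_phone", "landline_phone")):
    """Return (PID, digits) pairs for numbers explicitly cleared of DNC.

    Field names and the `<field>_dnc` flag convention are hypothetical.
    Duplicate numbers (after stripping formatting) are kept only once.
    """
    seen = set()
    result = []
    for r in records:
        for field in phone_fields:
            raw = r.get(field)
            if not raw:
                continue
            # A missing flag defaults to "Y" (Do Not Call): the conservative choice.
            flag = str(r.get(f"{field}_dnc", "Y")).strip().upper()
            if flag not in {"N", "FALSE", "0"}:
                continue
            number = re.sub(r"\D", "", raw)  # keep digits only
            if number and number not in seen:
                seen.add(number)
                result.append((r["PID"], number))
    return result
```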

  8. Customer Segmentation for Targeted Campaigns

    • kaggle.com
    zip
    Updated May 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mani Devesh (2024). Customer Segmentation for Targeted Campaigns [Dataset]. https://www.kaggle.com/datasets/manidevesh/customer-sales-data
    Available download formats: zip (914292 bytes)
    Dataset updated
    May 21, 2024
    Authors
    Mani Devesh
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Project Overview: Customer Segmentation Using K-Means Clustering

    Introduction: In this project, I analyse customer data from a retail store to identify distinct customer segments. The dataset includes key attributes such as age, city, and total sales of the customers. By applying K-Means clustering, an unsupervised machine learning technique, I group customers based on their age and total sales. These insights will enable the creation of targeted marketing campaigns tailored to the specific needs and behaviours of each customer segment.

    Objectives:

    • Cluster Customers: Use K-Means clustering to group customers based on age and total sales.
    • Analyse Segments: Examine the characteristics of each customer segment.
    • Targeted Marketing: Develop strategies for personalized marketing campaigns targeting each identified customer group.

    Data Description: The dataset comprises:

    • Age: The age of the customers.
    • City: The city where the customers reside.
    • Total Sales: The total sales generated by each customer.

    Methodology:

    • Data Preprocessing: Clean and preprocess the data to handle any missing or inconsistent entries.
    • Feature Selection: Focus on age and total sales as the primary features for clustering.
    • K-Means Clustering: Apply the K-Means algorithm to identify distinct customer segments.
    • Cluster Analysis: Analyse the resulting clusters to understand the demographic and sales characteristics of each group.
    • Marketing Strategy Development: Create targeted marketing strategies for each customer segment to enhance engagement and sales.
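    The clustering step can be illustrated with a minimal standard-library implementation of Lloyd's K-Means algorithm. It assumes each customer is an `(age, total_sales)` tuple and standardizes both features first, because total sales, measured on a much larger scale, would otherwise dominate the distance calculation. The random initialization, seed, and iteration cap are illustrative choices, not details from the project.

```python
import math
import random


def standardize(points):
    """Rescale each feature to zero mean and unit variance so that age (years)
    and total sales (currency) contribute comparably to Euclidean distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and moving
    each centroid to the mean of its members, until centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(tuple(sum(d) / len(members) for d in zip(*members))
                                 if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia
```

    Running `kmeans` for several values of k and plotting the returned inertia (the elbow method) is one common way to choose the number of segments; City is left out here because it is categorical.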

    Expected Outcomes:

    • Customer Segments: Clear identification of customer groups based on age and purchasing behaviour.
    • Insights for Marketing: Detailed understanding of each segment to inform targeted marketing efforts.
    • Business Impact: Enhanced ability to tailor marketing campaigns, potentially leading to increased customer satisfaction and sales.

    By clustering customers based on age and total sales, this project aims to provide actionable insights for personalized marketing, ultimately driving better customer engagement and higher sales for the retail store.

  9. Dataset on Brand Ethics, Trust, Customer Experience, and Loyalty in Latin...

    • data.mendeley.com
    Updated Oct 10, 2025
    Cite
    Nathalie Peña García (2025). Dataset on Brand Ethics, Trust, Customer Experience, and Loyalty in Latin American Consumers: The Alpina® Case [Dataset]. http://doi.org/10.17632/5hfw5ntfck.1
    Dataset updated
    Oct 10, 2025
    Authors
    Nathalie Peña García
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Latin America
    Description

    This dataset contains the full responses from a structured survey conducted in Colombia (2024) aimed at analyzing the relationships between perceived brand ethics, trust, service quality, customer experience, perceived value, brand engagement, and loyalty. The study includes socio-demographic segmentation by educational level and focuses on the consumer perception of Alpina®, a leading brand in the Latin American food industry.
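    A first-pass look at the relationships between constructs such as trust and loyalty is to average each respondent's items into construct scores and correlate them. The sketch assumes numeric (e.g., Likert-type) responses and uses hypothetical item codes (`T1`, `T2` for trust, `L1` for loyalty); the listing does not describe the survey's item names or the study's analysis method, so this does not reproduce it.

```python
import math


def construct_score(response: dict, items: list) -> float:
    """Mean of one respondent's answers to the items of a construct."""
    return sum(response[i] for i in items) / len(items)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

    Computing these correlations separately for each educational level would mirror the socio-demographic segmentation the study mentions.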

  10. Consumer Behavior and Shopping Habits Dataset:

    • cubig.ai
    zip
    Updated May 28, 2025
    Cite
    CUBIG (2025). Consumer Behavior and Shopping Habits Dataset: [Dataset]. https://cubig.ai/store/products/352/consumer-behavior-and-shopping-habits-dataset
    Available download formats: zip
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training; privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction: The Consumer Behavior and Shopping Habits Dataset is a tabular collection of customer demographics, purchase history, product preferences, shopping frequency, and online and offline purchasing behavior.

    2) Data Utilization

    (1) Characteristics of the dataset:
    • Each row contains detailed consumer and transaction information such as customer ID, age, gender, purchased item and category, purchase amount, region, product attributes (size, color, season), review rating, subscription status, delivery method, discount/promotion usage, payment method, and purchase frequency.
    • The data covers a variety of variables and purchasing patterns, supporting customer segmentation, marketing strategy development, product preference analysis, and more.

    (2) The dataset can be used for:
    • Customer Segmentation and Target Marketing: Analyze demographics and purchasing patterns to define distinct customer groups and develop customized marketing strategies for them.
    • Product and Service Improvement: Use purchase history, review ratings, responses to discounts and promotions, and similar signals to identify popular products, manage inventory, and analyze promotion effects.
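    A first step in analyzing promotion effects is to compare the average purchase amount with and without a discount or promotion. A minimal sketch with hypothetical field names (`purchase_amount`, `discount_applied`); the flag is parsed explicitly because a text value such as "No" would otherwise count as true.

```python
from statistics import mean


def is_yes(value) -> bool:
    """Interpret flags stored as booleans or as text such as 'Yes'/'No'."""
    return str(value).strip().lower() in {"yes", "true", "1", "y"}


def promotion_difference(rows, amount_key="purchase_amount", flag_key="discount_applied"):
    """Mean purchase amount with vs. without a promotion, and their difference.

    Field names are hypothetical; map them to the dataset's actual columns.
    """
    promo = [r[amount_key] for r in rows if is_yes(r[flag_key])]
    no_promo = [r[amount_key] for r in rows if not is_yes(r[flag_key])]
    if not promo or not no_promo:
        raise ValueError("need purchases both with and without a promotion")
    return mean(promo), mean(no_promo), mean(promo) - mean(no_promo)
```

    This is a raw difference in means: customers choose whether to use discounts and the data is synthetic, so the result should not be read as a causal promotion effect.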

  11. facebook fact checking dataset

    • figshare.com
    csv
    Updated Nov 11, 2024
    Cite
    mehdi khalil (2024). facebook fact checking dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27645690.v2
    Available download formats: csv
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    mehdi khalil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.

    Dataset Composition:

    • News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
    • Facebook Posts: Each article is associated with Facebook post data, including metrics such as share counts, reaction counts, and comment counts.
    • Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
    • Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.

    Key Features:

    • Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
    • Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
    • Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
    • Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.

    Potential Applications: The BuzzFeed dataset is valuable for various research and analytical purposes:

    • News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
    • Social Media Analysis: The dataset allows for studying how news spreads on platforms such as Facebook, including engagement patterns.
    • Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
    • Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
    • Audience Analysis: The data can be used for demographic analysis and audience segmentation.

    This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.

  12. Curated, Segmented, and Deep Learning-Optimized I-SPY 2 MRI Dataset for...

    • cancerimagingarchive.net
    n/a, nifti, tsv
    Updated Oct 15, 2015
    The Cancer Imaging Archive (2015). Curated, Segmented, and Deep Learning-Optimized I-SPY 2 MRI Dataset for Prediction of pCR, HR, and HER2 Status [Dataset]. http://doi.org/10.7937/42wq-th78
    Available download formats
    nifti, tsv, n/a
    Dataset updated
    Oct 15, 2015
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Abstract

    The BreastDCEDL_ISPY2 dataset is a curated, deep learning–ready resource that integrates pretreatment 3D Dynamic Contrast-Enhanced MRI (DCE-MRI) scans from 982 breast cancer patients enrolled in the I-SPY2 TRIAL, sourced from The Cancer Imaging Archive (TCIA). Imaging data have been standardized from raw DICOM to 3D NIfTI volumes, preserving signal integrity and spatial resolution. The dataset includes extensive non-imaging supporting data, such as tumor annotations, DICOM metadata, and demographics. To facilitate reproducible research, fixed benchmark train/validation/test splits are provided, stratified by biomarker subtypes and response outcomes. This dataset enables diverse research applications, including the development of deep learning models for predicting treatment response, radiomics-based analyses, and hormone receptor (HR) and HER2 status classification. It also facilitates benchmarking of advanced architectures such as Vision Transformers and supports clinical translation efforts in the field of precision oncology.

    Introduction

    Breast cancer remains one of the most prevalent causes of cancer-related mortality worldwide, and early detection coupled with accurate treatment response monitoring is essential for improving outcomes. Dynamic Contrast-Enhanced MRI (DCE-MRI) is a cornerstone modality for breast cancer imaging, offering unique insights into tumor vascularity, morphology, and treatment response. Despite its clinical importance, progress in computational and deep learning–based analysis of DCE-MRI has been hindered by the lack of large, standardized, and publicly available datasets. The BreastDCEDL_ISPY2 dataset was created to address this gap by consolidating and harmonizing imaging and clinical data from the I-SPY2 TRIAL. With 982 patients across more than 22 institutions, it represents one of the largest publicly accessible collections of pre-treatment DCE-MRI scans for breast cancer. Importantly, the dataset includes standardized 3D NIfTI volumes, tumor annotations, voxel-based tumor volumes, and harmonized clinicopathologic metadata such as hormone receptor status, HER2 status, and pathologic complete response outcomes. What makes BreastDCEDL_ISPY2 unique is its deep learning–ready structure and benchmark design. By providing consistent preprocessing, unified annotations, and predefined training/validation/test splits, the dataset enables reproducible research and direct comparison of computational methods. It lowers the technical barriers to working with heterogeneous MRI data, facilitates the development and validation of advanced machine learning models—including transformer-based architectures—and supports clinically relevant investigations into treatment response prediction and personalized therapy planning. The dataset includes extensive non-imaging supporting data:
    • Tumor annotations include both segmentation masks and region-of-interest (ROI) delineations.
    • Accompanying DICOM metadata encompasses voxel dimensions, signal enhancement ratio (SER) time points, and contrast agent injection timestamps.
    • Clinical metadata provides comprehensive patient information, including demographic variables (age, race, menopausal status), hormone receptor (HR) and HER2 receptor status, as well as treatment outcomes, specifically pathologic complete response (pCR).

    Methods

    Subject Inclusion and Exclusion Criteria

    The BreastDCEDL_ISPY2 dataset integrates patient data from the I-SPY2 TRIAL (2010–2016), yielding 985 patients with pretreatment DCE-MRI scans. Inclusion required at least three acquisitions (pre-contrast, early post-contrast, late post-contrast). Patients with incomplete imaging or missing essential metadata were excluded (3 cases), leaving 982 patients. The cohort reflects a clinically diverse population, with a mean age of ~50 years, racial composition (majority White, ~17% Black, others underrepresented), and tumor subtypes spanning HR+/HER2−, HER2+, and triple-negative cancers. pCR status is available for the majority of patients. Treatment histories reflect standardized neoadjuvant chemotherapy protocols. While the dataset includes multicenter acquisitions (22+ institutions), potential biases include predominance of U.S.-based populations, underrepresentation of some ethnic groups, and the trial setting, which may differ from community practice.

    Data Acquisition

    • MRI Acquisition: Pretreatment 3D DCE-MRI acquired on 1.5T and 3T scanners. Protocols varied across institutions but consistently included pre-contrast, early post-contrast, and late post-contrast acquisitions after gadolinium administration. Key technical parameters (TR, TE, slice thickness, voxel size, FOV) are preserved in metadata.
    • Clinical Data: Captured through electronic trial databases. Variables include demographics (age, race, menopausal status), receptor status (HR, HER2), tumor volume, and treatment outcome (pCR).
    • Other Data: Signal Enhancement Ratio (SER) maps and voxel-based tumor volumes are provided.
    • Missing Data: 3 patients were excluded due to incomplete imaging or metadata.
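    The signal enhancement ratio mentioned above is conventionally defined per voxel from the three acquisitions as SER = (S_early - S_pre) / (S_late - S_pre). The sketch below is a minimal illustration, not the dataset's own pipeline: it assumes the three phases are already co-registered and flattened into equal-length lists (plain lists keep it self-contained; loaded NIfTI volumes would normally be NumPy arrays), and the function names and the zero-denominator convention are hypothetical choices.

```python
from typing import List


def ser_voxel(s_pre: float, s_early: float, s_late: float, eps: float = 1e-9) -> float:
    """Signal enhancement ratio for one voxel: (S_early - S_pre) / (S_late - S_pre).

    Returns 0.0 when the late enhancement is (near) zero; this is a
    hypothetical convention to avoid division by zero.
    """
    denom = s_late - s_pre
    if abs(denom) < eps:
        return 0.0
    return (s_early - s_pre) / denom


def ser_map(pre: List[float], early: List[float], late: List[float]) -> List[float]:
    """Apply ser_voxel element-wise over flattened volumes of equal length."""
    if not (len(pre) == len(early) == len(late)):
        raise ValueError("all three phases must have the same number of voxels")
    return [ser_voxel(p, e, l) for p, e, l in zip(pre, early, late)]


# Toy flattened volumes with made-up intensities (not dataset values)
pre = [100.0, 100.0, 100.0]
early = [200.0, 150.0, 100.0]
late = [150.0, 200.0, 100.0]
print(ser_map(pre, early, late))  # [2.0, 0.5, 0.0]
```

    In the breast MRI literature, SER values above 1 (early enhancement exceeding late enhancement) are commonly read as washout kinetics.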

    Data Analysis

    • File Format Conversions: Raw DICOM images were converted into standardized 3D NIfTI volumes using a custom pipeline. Conversion preserved the original 16-bit dynamic range by storing voxel values as 64-bit floating-point data.
    • Manual Annotation and Segmentation Protocols: Tumor segmentations and ROI delineations provided by I-SPY2 radiologists; converted to binary 3D masks aligned to imaging volumes. Only the primary tumor was annotated if multiple tumors were present.
    • Quality Control and Validation: Tumor annotations were reviewed for alignment with MRI volumes. Consistency checks ensured tumor masks aligned across temporal phases. Patients with fewer than three valid acquisitions were excluded.
    • Scripts, Code, and Software Versions: Pipelines and the Vision Transformer implementation are available on GitHub: https://github.com/naomifridman/BreastDCEDL

    Usage Notes

    Data Organization and Naming Conventions

    All imaging data are provided in standardized 3D NIfTI format, converted from original DICOM files while preserving full signal integrity. File names follow the structure:

    • Training set: 784 patients (32.1% pCR rate).
    • Validation set: 99 patients (32.3% pCR rate).
    • Test set: 99 patients (32.3% pCR rate).
    Partitioning was stratified by biomarker status (HR, HER2) and pCR outcomes to ensure balanced distributions across subsets. Users are encouraged to adopt these predefined splits when developing predictive models to enable fair comparisons across studies.

    Clinical Data Files

    Clinical and pathologic metadata are distributed in standardized TSV format. Variables include demographics (age, race, menopausal status), biomarker status (HR, HER2), tumor volume, and pCR outcomes. TSV files can be opened with standard spreadsheet software (e.g., Microsoft Excel, LibreOffice Calc) or accessed programmatically using Python (pandas) or R.

    Software Recommendations
    • NIfTI images: Compatible with common medical imaging platforms such as 3D Slicer, ITK-SNAP, and FSL. Python users may rely on nibabel for loading and handling imaging volumes.
    • Segmentation masks: Provided as binary 3D NIfTI volumes (1 = tumor, 0 = background), directly loadable in the same software.
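    Because the masks are binary (1 = tumor, 0 = background), a voxel-based tumor volume is the count of tumor voxels multiplied by the volume of one voxel. The sketch below assumes the mask is available as a nested z/y/x list and that the voxel spacing in millimetres has been read from the NIfTI header (for example via nibabel's `header.get_zooms()`); the function name `tumor_volume_ml` is hypothetical.

```python
from typing import Sequence


def tumor_volume_ml(mask: Sequence[Sequence[Sequence[int]]],
                    spacing_mm: Sequence[float]) -> float:
    """Tumor volume in millilitres from a binary 3D mask (1 = tumor, 0 = background).

    spacing_mm holds the voxel size along each of the three axes in millimetres.
    """
    if len(spacing_mm) != 3:
        raise ValueError("expected three voxel dimensions")
    n_tumor = sum(1 for plane in mask for row in plane for v in row if v == 1)
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return n_tumor * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Toy 2x2x2 mask with 3 tumor voxels and 1 x 1 x 2 mm voxels
mask = [[[1, 0], [1, 0]],
        [[0, 0], [1, 0]]]
print(tumor_volume_ml(mask, (1.0, 1.0, 2.0)))  # 0.006
```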
    Potential Sources of Error or Variability
    • Inter-cohort heterogeneity: Imaging protocols (field strength, TR/TE, slice thickness) varied across centers, potentially introducing site effects.
    • Only the largest lesion was annotated for multifocal disease.
    • Population bias (predominantly U.S., underrepresentation of minorities).
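    The predefined splits were stratified by HR, HER2, and pCR status, and users are encouraged to reuse them rather than re-split. For readers applying the same idea to a new cohort, the sketch below shows one simple way to build a stratified train/validation/test partition: group records by the stratification keys, shuffle each group, and cut it by fixed fractions. The record keys `hr`, `her2`, `pcr`, the 80/10/10 fractions, and the per-stratum rounding are assumptions, not the procedure used to create the official splits.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(records: List[Dict], keys: Tuple[str, ...] = ("hr", "her2", "pcr"),
                     fractions: Tuple[float, float] = (0.8, 0.1), seed: int = 0):
    """Split records into train/val/test while keeping each stratum's proportions.

    keys and fractions are illustrative assumptions; whatever remains after
    the train and validation cuts in each stratum goes to the test set.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)
    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


# Synthetic cohort: 100 patients, 40 with pCR, HR alternating (made-up data)
records = [{"id": i, "hr": i % 2, "her2": 0, "pcr": int(i < 40)} for i in range(100)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 80 10 10
print(sum(r["pcr"] for r in train) / len(train))  # 0.4
```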

    External Resources

    The source code for converting MRI data from DICOM to NIfTI format, along with usage examples, is available in the project’s GitHub repository: https://github.com/naomifridman/BreastDCEDL.

  13. Consumer Sentiment Data | Global Audience Insights | Psychographic Profiles...

    • datarade.ai
    Updated Oct 27, 2021
    Success.ai (2021). Consumer Sentiment Data | Global Audience Insights | Psychographic Profiles & Trends | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/consumer-sentiment-data-global-audience-insights-psychogr-success-ai
    Available download formats
    .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Success.ai
    Area covered
    Nigeria, Curaçao, Hungary, Uganda, South Africa, Barbados, Ecuador, Hong Kong, Italy, Macedonia (the former Yugoslav Republic of)
    Description

    Success.ai’s Consumer Sentiment Data offers businesses unparalleled insights into global audience attitudes, preferences, and emotional triggers. Sourced from continuous analysis of consumer behaviors, conversations, and feedback, this dataset includes psychographic profiles, interest data, and sentiment trends that help marketers, product teams, and strategists better understand their target customers. Whether you’re exploring a new market, refining your brand message, or enhancing product offerings, Success.ai ensures your consumer intelligence efforts are guided by timely, accurate, and context-rich data.

    Why Choose Success.ai’s Consumer Sentiment Data?

    1. Comprehensive Audience Insights

      • Access psychographic and interest-based profiles that reveal what motivates and influences your audience’s decisions.
      • Continuous updates ensure you stay aligned with shifting consumer sentiments, seasonal preferences, and emerging trends.
    2. Global Reach Across Industries and Demographics

      • Includes insights from various markets, age groups, cultural backgrounds, and income levels.
      • Identify consumer attitudes in different regions, helping you tailor campaigns, products, and messaging to diverse audiences.
    3. Continuously Updated Datasets

      • Real-time data analysis ensures that your consumer sentiment insights remain fresh, relevant, and actionable.
      • Adapt quickly to consumer feedback, market changes, and competitive pressures.
    4. Ethical and Compliant

      • Adheres to global data privacy regulations, ensuring your usage of consumer sentiment data is both legal and respectful of personal boundaries.

    Data Highlights:

    • Psychographic Profiles: Understand lifestyle preferences, values, and interests that shape consumer choices.
    • Sentiment Trends: Track evolving emotional responses to brands, products, and categories.
    • Global Audience Insights: Evaluate consumer sentiments across multiple regions, languages, and cultural contexts.
    • Continuous Updates: Receive current data that reflects the latest shifts in mood, opinion, and interest.

    Key Features of the Dataset:

    1. Granular Segmentation

      • Segment audiences by demographic, interest, buying behavior, and sentiment scores for targeted marketing efforts.
      • Focus on the attributes that matter most, from eco-conscious consumers to luxury shoppers or value seekers.
    2. Contextual Sentiment Analysis

      • Go beyond basic positive/negative sentiment to understand nuanced emotional responses.
      • Identify triggers that inspire loyalty, dissatisfaction, trust, or skepticism.
    3. AI-Driven Enrichment

      • Profiles enriched with actionable data provide deeper insights into consumer lifestyles, brand perceptions, and product affinities.
      • Leverage advanced analytics to develop personalized campaigns and product strategies.

    Strategic Use Cases:

    1. Marketing and Campaign Optimization

      • Craft campaigns that resonate emotionally by understanding what drives consumer engagement.
      • Adjust messaging, timing, and channels to align with evolving sentiment trends and seasonal shifts in consumer mood.
    2. Product Development and Innovation

      • Identify unmet consumer needs and preferences before launching new products.
      • Refine features, packaging, and pricing strategies based on real-time consumer responses.
    3. Brand Management and Positioning

      • Monitor brand perceptions to detect early signs of brand fatigue, trust erosion, or negative publicity.
      • Strengthen brand loyalty by addressing concerns, highlighting strengths, and adapting to changing market contexts.
    4. Competitive Analysis and Market Entry

      • Benchmark consumer sentiment towards competitors, industry leaders, and emerging disruptors.
      • Assess market readiness and optimize entry strategies for new regions or segments.

    Why Choose Success.ai?

    1. Best Price Guarantee

      • Access high-quality, verified data at competitive prices, ensuring efficient allocation of your marketing and research budgets.
    2. Seamless Integration

      • Integrate enriched sentiment data into your analytics, CRM, or marketing platforms via APIs or downloadable formats.
      • Simplify data management and accelerate decision-making processes.
    3. Data Accuracy with AI Validation

      • Benefit from AI-driven validation for reliable insights into consumer attitudes, leading to more confident data-driven strategies.
    4. Customizable and Scalable Solutions

      • Tailor datasets to focus on specific segments, regions, or interests, and scale as your business grows and evolves.

    APIs for Enhanced Functionality:

    1. Data Enrichment API

      • Enhance your existing consumer records with psychographic and sentiment insights, deepening your understanding of audience motivations.
    2. Lead Generation API

      • Identify audience segments receptive to your messaging, streamlini...
  14. US Auto Data | Full VIN | 127,853,223 Vehicle Details | Make Model Year |...

    • datarade.ai
    .json, .csv, .xls
    Updated Aug 1, 2010
    CompCurve (2010). US Auto Data | Full VIN | 127,853,223 Vehicle Details | Make Model Year | Ownership Signals | Consumer Demographics | Automotive Intelligence File [Dataset]. https://datarade.ai/data-products/us-auto-data-full-vin-127-853-223-vehicle-details-make-compcurve
    Available download formats
    .json, .csv, .xls
    Dataset updated
    Aug 1, 2010
    Dataset authored and provided by
    CompCurve
    Area covered
    United States
    Description

    This dataset is a national, VIN-resolved automotive file containing detailed vehicle attributes, ownership signals, and linked consumer demographics. Every row is anchored by a full 17-character VIN, allowing precise matching, decoding, and enrichment across insurance, lending, automotive analytics, marketing, and identity-resolution workflows. The file covers 387M+ U.S. vehicles across all major OEMs, model types, and price tiers.

    The dataset includes vehicles from domestic manufacturers (e.g., Ford, GM, Stellantis) as well as foreign/import brands (e.g., Toyota, Honda, BMW, Mercedes, Hyundai, Kia). The manufacturerbased field clearly identifies where the OEM is headquartered, supporting segmentation such as domestic vs foreign, mainstream vs luxury, SUV vs sedan, gas vs hybrid vs electric, and new vs used ownership patterns.

    Vehicle & VIN Attribute Coverage

    Each record contains core vehicle details:

    vin – Full 17-character Vehicle Identification Number

    year – Model year

    make / model – OEM brand and specific model name

    manufacturer / manufacturerbased – Company name and domestic/foreign origin

    fuel – Fuel type (gas, diesel, hybrid, EV, flex-fuel)

    style – Marketing style (SUV, crossover, coupe, convertible, etc.)

    bodytype / bodysubtype – Body classification such as SUV, sedan, pickup, hatchback

    class – Market class (mainstream, luxury, premium, truck, etc.)

    size – Compact, mid-size, full-size, etc.

    doors – Number of doors

    vechicletype – Passenger car, light truck, SUV, etc.

    enginecylinders – Cylinder count

    transmissiontype / transmissiongears – Automatic, manual, CVT, and gear count

    gvwrange – Gross Vehicle Weight Rating (light duty vs heavy duty)

    weight / maxpayload – Weight/payload estimates

    trim – Detailed trim level

    msrp – Original MSRP for pricing tiers and value modeling

    validated / rankorder – Internal quality indicators

    These fields support risk modeling, valuation, depreciation curves, fleet analysis, replacement cycles, and comparisons across domestic and foreign OEMs.
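    Since every record is keyed by a full 17-character VIN, a cheap sanity check before matching or decoding is the check digit in position 9, which North American VINs carry under the U.S. VIN standard (49 CFR Part 565). The sketch below implements that standard transliteration-and-weights calculation; the function name is hypothetical, and vehicles built for other markets may not use a check digit, so a failure is a flag for review rather than proof of a bad record. The listing does not say whether VINs in the file were pre-validated.

```python
# Letter values from the VIN standard; I, O and Q are not allowed in VINs.
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit_ok(vin: str) -> bool:
    """Validate the position-9 check digit of a 17-character VIN."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        return False  # wrong length or a disallowed character (e.g. I, O, Q)
    total = sum(TRANSLITERATION[ch] * w for ch, w in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(vin_check_digit_ok("1M8GDM9AXKP042788"))  # True
print(vin_check_digit_ok("1M8GDM9A1KP042788"))  # False
```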

    Ownership Signals & Lifecycle Indicators

    The dataset includes rich ownership timing and household-level automotive information:

    purchasedate – Date the vehicle was obtained, enabling:

    Tenure modeling

    Trade-in prediction

    Lease/loan lifecycle analysis

    Service interval modeling

    purchasenew – Purchased new vs used

    number_of_vehicles_in_hh – Total vehicles linked to the household

    validated – Confirmed record flag

    These attributes power auto replacement models, refinance targeting, multi-vehicle household insights, and OEM loyalty analytics.
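    Tenure modeling from `purchasedate` usually starts with the number of completed months of ownership as of a reference date. The listing does not document the date format, so the sketch below assumes ISO 8601 strings (`YYYY-MM-DD`); the function name `tenure_months` is hypothetical.

```python
from datetime import date


def tenure_months(purchasedate: str, as_of: date) -> int:
    """Completed months between an ISO 'YYYY-MM-DD' purchase date and as_of.

    The ISO format is an assumption; adapt the parsing to the file's real format.
    """
    bought = date.fromisoformat(purchasedate)
    if bought > as_of:
        raise ValueError("purchase date is after the reference date")
    months = (as_of.year - bought.year) * 12 + (as_of.month - bought.month)
    if as_of.day < bought.day:
        months -= 1  # the current month of ownership is not yet complete
    return months


print(tenure_months("2020-03-15", date(2023, 3, 14)))  # 35
print(tenure_months("2020-03-15", date(2023, 3, 15)))  # 36
```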

    Consumer Identity & Address Standardization

    Each VIN record is linked to standardized consumer and household metadata:

    consumer_first / consumer_last / consumer_suffix – Owner name fields

    consumer_std_address – USPS-style standardized address

    consumer_std_city / consumer_std_state / consumer_std_zip – Clean geographic identifiers

    consumer_county_name – County for underwriting and geo-risk segmentation

    consumer_std_status – Address quality/verification status

    consumer_latitude / consumer_longitude – Geocoded coordinates for mapping, heatmaps, and risk scoring

    This enables identity resolution, entity matching, household-level modeling, and geographic segmentation.

    Consumer Demographics & Economic Indicators

    The auto file connects vehicles to extensive demographic and lifestyle fields, including:

    consumer_income_range – Household income band

    consumer_home_owner – Homeowner vs renter

    consumer_home_value – Home value range

    consumer_networth – Net worth category

    consumer_credit_range – Modeled credit tier

    consumer_gender / consumer_age / consumer_age_range – Demographic segment fields

    consumer_birth_year – Year-of-birth

    consumer_marital_status – Single/married

    consumer_presence_of_children / consumer_number_of_children – Household composition

    consumer_dwelling_type – Housing type

    consumer_length_of_residence / range – Stability indicator

    consumer_language, religion, ethnicity – Cultural/language segments

    consumer_pool_owner – Lifestyle attribute

    consumer_occupation / consumer_education_level – Socioeconomic indicators

    consumer_donor / consumer_veteran – Contribution and service attributes

    These fields enable hyper-granular segmentation, lifestyle-based modeling, wealth indexing, market analysis, and insurance/lending underwriting.

    Phone, Email & Contact Intel

    Each record may include up to three phones and three emails:

    consumer_phone1/2/3 – Contact numbers

    consumer_linetype1/2/3 – Wireless, landline, VOIP

    consumer_dnc1/2/3 – Do-Not-Call indicators

    consumer_email1/2/3 – Email addresses

    This supports compliant outreach, multi-channel activation, CRM enrichment, and identity graph expansion.
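    For outreach, each phone slot should be paired with its Do-Not-Call indicator before a number is used. The sketch below assumes records arrive as dictionaries keyed by the column names listed above and that a cleared flag is encoded as `N`, `0`, or `FALSE`; the actual flag encoding is not documented, so these values and the function name are hypothetical. It is deliberately conservative: a number with a missing or unrecognized flag is dropped.

```python
from typing import Dict, List, Sequence


def callable_phones(record: Dict[str, str],
                    clear_values: Sequence[str] = ("N", "0", "FALSE")) -> List[str]:
    """Return numbers from consumer_phone1..3 whose DNC flag is explicitly cleared.

    clear_values is an assumed encoding; missing or unknown flags exclude the number.
    """
    phones = []
    for i in (1, 2, 3):
        phone = (record.get(f"consumer_phone{i}") or "").strip()
        flag = (record.get(f"consumer_dnc{i}") or "").strip().upper()
        if phone and flag in clear_values:
            phones.append(phone)
    return phones


row = {"consumer_phone1": "555-0100", "consumer_dnc1": "Y",
       "consumer_phone2": "555-0101", "consumer_dnc2": "N",
       "consumer_phone3": "", "consumer_dnc3": ""}
print(callable_phones(row))  # ['555-0101']
```

    The flag in the file reflects its own snapshot; it does not replace checking numbers against current Do-Not-Call registries.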

    Primary Use Cases

    Insurance & Risk Modeling

    VIN decoding, ownership tenure, household economics, and geo data support auto underwriting, pricing, rating territory analysis, and fraud screening.

    Auto Finance, Lending & Refinance

    Model trade-in window...

  15. The Río Hortega University Hospital Glioblastoma dataset: a comprehensive...

    • cancerimagingarchive.net
    csv, dicom, n/a +1
    Updated Jun 9, 2023
    The Cancer Imaging Archive (2023). The Río Hortega University Hospital Glioblastoma dataset: a comprehensive collection of preoperative, early postoperative and recurrence MRI scans [Dataset]. http://doi.org/10.7937/4545-c905
    Available download formats
    csv, n/a, dicom, nifti
    Dataset updated
    Jun 9, 2023
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Jun 9, 2023
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This data collection consists of multiparametric MRI scans of 40 adult patients with histopathologically confirmed WHO grade 4 astrocytoma, who underwent surgery at the Río Hortega University Hospital in Valladolid, Spain, between January 2018 and December 2022. The dataset encompasses 600 MRI series, covering three time points: preoperative, early post-operative (less than 72 hours after surgery), and the follow-up scan, at which recurrence is diagnosed. Patients included in the sample underwent gross total resection (GTR) or near total resection (NTR), defined as having no residual tumor enhancement and an extent of resection of more than 95% of the initial enhancing volume, respectively. The modified Response Assessment in Neuro-Oncology criteria (RANO) were used to define tumor progression.
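    The inclusion rule above can be expressed with the usual extent-of-resection formula, EOR = (V_pre - V_residual) / V_pre × 100, applied to enhancing tumor volumes. The sketch below assumes both volumes are in the same unit (e.g. cm³) and uses the definitions from the description: GTR when no residual enhancement remains, NTR when EOR exceeds 95%. The function names and the `subtotal` label for everything else are hypothetical; such patients are not part of this collection.

```python
def extent_of_resection(pre_volume: float, residual_volume: float) -> float:
    """Percentage of the preoperative enhancing volume that was removed."""
    if pre_volume <= 0:
        raise ValueError("preoperative enhancing volume must be positive")
    return (pre_volume - residual_volume) / pre_volume * 100.0


def resection_category(pre_volume: float, residual_volume: float) -> str:
    """'GTR' for no residual enhancement, 'NTR' for EOR > 95%, otherwise 'subtotal'."""
    if residual_volume == 0:
        return "GTR"
    if extent_of_resection(pre_volume, residual_volume) > 95.0:
        return "NTR"
    return "subtotal"  # hypothetical label; these patients were not included


print(resection_category(30.0, 0.0))  # GTR
print(resection_category(30.0, 1.0))  # NTR (EOR about 96.7%)
print(resection_category(30.0, 3.0))  # subtotal (EOR about 90%)
```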

    The dataset contains T1-weighted (T1w), T2-weighted (T2w), Fluid Attenuated Inversion Recovery (FLAIR), T1w contrast-enhanced (T1ce) sequences, and diffusion-weighted imaging-derived apparent diffusion coefficient (ADC) maps. It also includes clinical and demographic data, IDH status, treatment information, and volumetric assessment of the extent of the resection. Moreover, the dataset comprises expert-validated segmentations of tumor subregions (e.g., enhancing tumor, necrosis, peritumoral region), generated through computer-aided methods from preoperative, postoperative, and follow-up scans.

    This dataset is unique in its inclusion of patients who underwent extensive resection of > 95% of the enhancing tumor. It also stands out from other publicly available datasets by providing early postoperative studies and segmentations, filling a gap left by datasets that focus on preoperative imaging. Making these data publicly available allows the scientific community to analyze recurrence patterns in patients who underwent total or near-total resection and to develop new registration and segmentation algorithms focused on post-surgical and follow-up studies.

  16. Customer Segmentation Data

    • kaggle.com
    zip
    Updated Mar 11, 2024
    Smit Raval (2024). Customer Segmentation Data [Dataset]. https://www.kaggle.com/datasets/ravalsmit/customer-segmentation-data/discussion
    Available download formats
    zip (1842344 bytes)
    Dataset updated
    Mar 11, 2024
    Authors
    Smit Raval
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.

    Key Features:

    • Anonymized demographic, transactional, and behavioral data.
    • Suitable for customer segmentation analysis.
    • Opportunities to optimize targeted marketing strategies.
    • Valuable insights for improving campaign effectiveness.
    • Ideal for marketers, data scientists, and business analysts.

    Usage Examples:

    • Segmenting customers based on demographic attributes.
    • Analyzing purchase behavior to identify high-value customer segments.
    • Optimizing marketing campaigns for targeted engagement.
    • Understanding customer preferences and tailoring product offerings accordingly.
    • Evaluating the effectiveness of marketing strategies and iterating for improvement.

    Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!

  17. Demographic and clinical characteristics of the younger cohort....

    • figshare.com
    • plos.figshare.com
    xls
    Updated Nov 14, 2025
    Fahad Salman; Niels Bergsland; Michael G. Dwyer; Jack A. Reeves; Abhisri Ramesh; Dejan Jakimovski; Bianca Weinstock-Guttman; Robert Zivadinov; Ferdinand Schweser (2025). Demographic and clinical characteristics of the younger cohort. M:F = Male:Female; CIS = Clinically Isolated Syndrome; RMS = Relapsing-Remitting Multiple Sclerosis; EDSS = Expanded Disability Status Scale. [Dataset]. http://doi.org/10.1371/journal.pone.0332478.t001
    Available download formats
    xls
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Fahad Salman; Niels Bergsland; Michael G. Dwyer; Jack A. Reeves; Abhisri Ramesh; Dejan Jakimovski; Bianca Weinstock-Guttman; Robert Zivadinov; Ferdinand Schweser
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic and clinical characteristics of the younger cohort. M:F = Male:Female; CIS = Clinically Isolated Syndrome; RMS = Relapsing-Remitting Multiple Sclerosis; EDSS = Expanded Disability Status Scale.

  18. Segmentation and socio-demographic variables.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Mauricio Carvache-Franco; Tahani Hassan; Orly Carvache-Franco; Wilmer Carvache-Franco; Olga Martin-Moreno (2023). Segmentation and socio-demographic variables. [Dataset]. http://doi.org/10.1371/journal.pone.0287113.t004
    Available download formats
    xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Mauricio Carvache-Franco; Tahani Hassan; Orly Carvache-Franco; Wilmer Carvache-Franco; Olga Martin-Moreno
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Food festivals have become a growing tourism sector in recent years because of their contributions to a region’s economic, marketing, brand, and social growth. This study analyses the demand for the Bahrain food festival. The stated objectives were: (i) to identify the motivational dimensions of the demand for the food festival, (ii) to determine the segments of the demand for the food festival, and (iii) to establish the relationship between the demand segments and socio-demographic aspects. The festival investigated was the Bahrain Food Festival, held in Bahrain, an island country in the Persian Gulf off the east coast of Saudi Arabia. The sample consisted of 380 valid questionnaires collected through social networks from people attending the event. The statistical techniques used were factor analysis and K-means clustering. The results show five motivational dimensions: Local food, Art, Entertainment, Socialization, and Escape and novelty. In addition, two segments were found. The first, Entertainment and novelties, comprises attendees who seek to enjoy the festive atmosphere and discover new restaurants. The second, Multiple motives, comprises attendees with several simultaneous motivations. This segment has the highest income and expenditure, making it the most important group for developing plans and strategies. The results contribute to the academic literature and will be useful to food festival organizers.
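    The segmentation step above used K-means. The sketch below is a minimal pure-Python version of Lloyd's algorithm (alternate nearest-centroid assignment and centroid update until assignments stop changing), not the authors' software. The two-dimensional points stand in for per-respondent factor scores and are made up; initializing the centroids from the first k points is a simplification for determinism, whereas k-means++ or multiple random restarts are more common in practice.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    Centroids start at the first k points (a simplification for determinism).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Made-up two-factor scores for six respondents
points = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (1.0, 0.0), (5.0, 6.0), (6.0, 5.0)]
centroids, labels = kmeans(points, 2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```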

  19. Assessing the validity of a data driven segmentation approach: A 4 year...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo (2023). Assessing the validity of a data driven segmentation approach: A 4 year longitudinal study of healthcare utilization and mortality [Dataset]. http://doi.org/10.1371/journal.pone.0195243
    Available download formats
    docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and the development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.

    Methods
    We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013 to 2016.

    Results
    Among 146,999 subjects, five distinct patient segments were identified: “Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”. Healthcare utilization patterns in 2012, morbidity patterns, and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. Within this segment, 11.5% of patients had end-stage renal failure and 31.2% had malignancy. The validity of the cluster-analysis-derived segments was supported by their ability to discriminate longitudinal healthcare utilization and mortality from 2013 to 2016. Incidence rate ratios for healthcare utilization and Cox hazard ratios for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for a disproportionate share of healthcare utilization and had an 8.16 times higher mortality rate.

    Conclusion
    Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk. This provides an evidence-based segmentation of a regional health system’s healthcare needs.
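    The abstract names Ward's hierarchical clustering and K-means but does not say how the two were combined. One common pattern uses Ward's criterion to find starting centroids and then refines them with K-means on standardized features. The sketch below shows that pattern purely for illustration. The pure-Python helpers (`standardize`, `ward`, `kmeans`) and the assumed feature layout (age plus annual counts of the four visit types) are hypothetical. They are not the authors' code.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so age and visit counts contribute on a comparable scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(mean(c) for c in zip(*points))


def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion; returns k centroids.

    Each step merges the pair of clusters whose union gives the smallest increase
    in within-cluster sum of squares: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                na, nb = len(clusters[i]), len(clusters[j])
                cost = na * nb / (na + nb) * sqdist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means starting from the given centers; returns (labels, centers)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda c: sqdist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:  # converged: assignments no longer move centers
            break
        centers = new_centers
    return labels, centers
```

    Usage: with hypothetical rows of (age, inpatient, specialist outpatient, emergency department, primary care) counts, run `z = standardize(rows)` and then `labels, centers = kmeans(z, ward(z, 5))`. The naive Ward loop recomputes centroids for every candidate pair, so it is only practical for small samples. For a population of 146,999 patients, library implementations would be the practical choice: SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")`, and scikit-learn's `KMeans`, which accepts an array of initial centers through its `init` parameter.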

  20. Café Rewards

    • kaggle.com
    zip
    Updated Feb 4, 2025
    willian oliveira (2025). Café Rewards [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/caf-rewards/code
    Available download formats: zip (447,957 bytes)
    Dataset updated: Feb 4, 2025
    Authors: willian oliveira
    License: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a structured collection of customer data intended for machine learning applications. It consists of three primary folders (offers, customers, and events). Together they support analysis of customer behavior, responses to promotional offers, and overall engagement over a 30-day period.

    Offers
    The offers folder contains details of the promotional offers sent to customers within the 30-day timeframe. Each offer is uniquely identified by an offer_id, which serves as the primary key. Offers fall into three types:

    • BOGO (Buy One, Get One): A customer must purchase a specific product to receive another for free.
    • Discount: A direct discount applied to purchases, incentivizing spending.
    • Informational: Provides details about a promotion without requiring any spending or offering a direct reward.

    Each offer has specific requirements and rewards:

    • difficulty: The minimum amount a customer must spend to qualify for the offer.
    • reward: The monetary reward (in USD) received upon successful completion of the offer.
    • duration: The number of days a customer has to complete the offer after receiving it.
    • channels: The marketing channels used to send the offer, which may include email, mobile app notifications, social media, or direct mail.

    By analyzing the offers dataset, businesses can assess the effectiveness of different promotional strategies and optimize future campaigns.
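    The offer fields are easier to see in a concrete record. Here is a minimal sketch of one offer, with a helper that checks completion against the events time scale. Several details are assumptions: the field name `offer_type`, its string values, and the rule that the window closes exactly `duration * 24` hours after receipt (inclusive). The description itself only says that `duration` is in days and that `time` in the events folder is in hours.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 24


@dataclass(frozen=True)
class Offer:
    """One row of the offers folder (the offer_type field name is assumed)."""
    offer_id: str
    offer_type: str          # assumed values: "bogo", "discount", "informational"
    difficulty: float        # minimum spend required to qualify
    reward: float            # USD paid out on completion
    duration: int            # days allowed to complete the offer after receiving it
    channels: tuple[str, ...]  # e.g. ("email", "mobile")

    def deadline_hour(self, received_hour: int) -> int:
        """Last hour, on the events 'time' scale, at which the offer can still be completed."""
        return received_hour + self.duration * HOURS_PER_DAY

    def is_completed_by(self, spend: float, received_hour: int, hour: int) -> bool:
        """True if `spend` meets the difficulty at an hour inside the (inclusive) offer window.

        Informational offers carry no reward, so they are never treated as completed.
        """
        if self.offer_type == "informational":
            return False
        in_window = received_hour <= hour <= self.deadline_hour(received_hour)
        return in_window and spend >= self.difficulty
```

    Freezing the dataclass keeps offer definitions immutable, which suits a lookup table keyed by offer_id.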

    Customers
    The customers folder contains demographic information for each member in the dataset. Each customer is uniquely identified by customer_id, which acts as the primary key. The dataset includes the following attributes:

    • became_member_on: The date (formatted as YYYYMMDD) when the customer created their account. This information helps track customer loyalty and tenure.
    • gender: The customer's gender, categorized as (M)ale, (F)emale, or (O)ther. This allows for demographic segmentation and targeted marketing analysis.
    • age: The customer's age, useful for analyzing purchasing patterns and offer preferences across different age groups.
    • income: The estimated annual income of the customer (in USD), enabling insights into spending behavior based on economic status.

    With this dataset, machine learning models can predict customer preferences, segment users into meaningful groups, and tailor offers based on demographic factors.
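    A small sketch for the customer attributes does three things. It parses `became_member_on` from its YYYYMMDD form into a `date`, computes tenure against a reference date you choose, and normalizes the gender code. The helper names, the choice of reference date, and mapping unexpected or missing gender codes to `None` are assumptions for illustration.

```python
from datetime import date, datetime

VALID_GENDERS = {"M", "F", "O"}


def parse_member_date(raw) -> date:
    """Parse a became_member_on value such as 20170812 (int or str) into a date."""
    return datetime.strptime(str(raw), "%Y%m%d").date()


def tenure_days(raw, as_of: date) -> int:
    """Days between account creation and a caller-chosen reference date."""
    return (as_of - parse_member_date(raw)).days


def normalize_gender(raw):
    """Return 'M', 'F' or 'O'; anything else (including None) becomes None (assumed policy)."""
    if raw is None:
        return None
    value = str(raw).strip().upper()
    return value if value in VALID_GENDERS else None
```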

    Events
    The events folder logs customer activity throughout the 30-day period, capturing interactions with offers and transactions. Each record is associated with a specific customer_id, serving as a foreign key to link activities to individual users. The dataset includes:

    event: A categorical description of the customer's interaction. The possible events include:

    • Transaction: A recorded purchase made by the customer.
    • Offer Received: A notification that an offer was sent to the customer.
    • Offer Viewed: The customer actively opened and engaged with the offer.
    • Offer Completed: The customer fulfilled the necessary conditions to claim the offer's reward.

    value: A dictionary of values linked to the event, which varies depending on the type of activity:

    • For transactions, value represents the amount spent by the customer.
    • For offers received, viewed, or completed, value contains the corresponding offer_id.

    time: A numerical indicator representing the number of hours passed in the 30-day observation window (starting from 0). This allows for tracking customer engagement over time and understanding behavioral trends.
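    Because `value` holds different things depending on `event`, a parser has to branch on the event type. This sketch makes two assumptions: the dictionary keys are literally `amount` and `offer_id`, and event names match the four labels above case-insensitively. Check both against the actual files before relying on it. The `day_of_window` helper simply converts the hour-based `time` column into a 0-based day index.

```python
OFFER_EVENTS = {"offer received", "offer viewed", "offer completed"}


def parse_event_value(event: str, value: dict):
    """Return ("amount", float) for transactions or ("offer_id", str) for offer events.

    The key names "amount" and "offer_id" are assumptions about the file layout.
    """
    kind = event.strip().lower()
    if kind == "transaction":
        return "amount", float(value["amount"])
    if kind in OFFER_EVENTS:
        return "offer_id", str(value["offer_id"])
    raise ValueError(f"unknown event type: {event!r}")


def day_of_window(hour: int) -> int:
    """Convert the 'time' column (hours since the window started at 0) into a 0-based day."""
    if hour < 0:
        raise ValueError("time cannot be negative")
    return hour // 24
```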

    By analyzing the events dataset, businesses can gain insights into customer interactions, measure the success of promotional offers, and identify patterns in spending behavior. Machine learning models can leverage this data to predict which offers will be most effective for different customer segments.
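    One way to measure offer success from the events log is sketched below. It counts received, viewed, and completed events per `offer_id` and reports view and completion rates relative to the number of times each offer was sent. The input shape, a list of `(customer_id, event, offer_id)` tuples with `offer_id` set to `None` for transactions, is a hypothetical intermediate form.

```python
from collections import defaultdict

OFFER_STAGES = ("offer received", "offer viewed", "offer completed")


def offer_funnel(events):
    """Per-offer view and completion rates from (customer_id, event, offer_id) tuples.

    Non-offer events (e.g. transactions) are skipped before any offer_id lookup.
    """
    counts = defaultdict(lambda: dict.fromkeys(OFFER_STAGES, 0))
    for _customer_id, event, offer_id in events:
        stage = event.strip().lower()
        if stage not in OFFER_STAGES:
            continue
        counts[offer_id][stage] += 1

    funnel = {}
    for offer_id, c in counts.items():
        received = c["offer received"]
        funnel[offer_id] = {
            "view_rate": c["offer viewed"] / received if received else 0.0,
            "completion_rate": c["offer completed"] / received if received else 0.0,
        }
    return funnel
```

    This counts events rather than unique customers. It also does not check that a completion was preceded by a view or fell inside the offer's duration, so the rates are only a rough first look. A more careful analysis would match each completion to its corresponding receipt within the offer's duration. It would also join on customer_id to the customers folder for segment-level results.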

Prashob Narendran (2025). Customer_Financial_Data [Dataset]. https://www.kaggle.com/datasets/prashobnarendran/customer-financial-data

Customer_Financial_Data

A comprehensive dataset of bank customer demographics and financial behavior

Available download formats: zip (62,099 bytes)
Dataset updated: Nov 12, 2025
Authors: Prashob Narendran
Description

Context
This dataset contains detailed, anonymized information about a bank's customers. It includes demographic data such as age, income, and family size, as well as financial information like mortgage value, credit card ownership, and average spending habits. The data is well-suited for a variety of machine learning tasks, particularly in the domain of financial services and marketing.

Content
The dataset consists of 5,000 customer records with 14 attributes:

  • Customer_ID: A unique identifier for each customer.
  • Age: The customer's age in completed years.
  • Years_Experience: Years of professional experience.
  • Annual_Income: Annual income of the customer (in thousands of dollars).
  • ZIP_Code: The customer's home address ZIP code.
  • Family_size: The number of individuals in the customer's family.
  • Avg_Spending: Average monthly spending on credit cards (in thousands of dollars).
  • Education_Level: A categorical variable for education level (1: Undergraduate, 2: Graduate, 3: Advanced/Professional).
  • Mortgage: The value of the customer's house mortgage, if any (in thousands of dollars).
  • Has_Consumer_Loan: Binary variable indicating if the customer accepted a personal loan in the last campaign (1: Yes, 0: No). This is a potential target variable.
  • Has_Securities_Account: Binary variable indicating if the customer has a securities account with the bank.
  • Has_CD_Account: Binary variable indicating if the customer has a certificate of deposit (CD) account with the bank.
  • Uses_Online_Banking: Binary variable indicating if the customer uses online banking services.
  • Has_CreditCard: Binary variable indicating if the customer uses a credit card issued by this bank.
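The attribute list maps naturally onto a typed record. This sketch makes three assumptions: the data ships as a CSV file whose header uses exactly the column names above, flags are stored as 0/1, and integer columns are written without decimals. The `Customer` class and helper names are hypothetical. ZIP codes are kept as strings because numeric parsing would drop leading zeros.

```python
import csv
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    age: int
    years_experience: int     # may be negative in the raw data (see the data quality note)
    annual_income_k: float    # thousands of dollars
    zip_code: str             # text, so leading zeros are preserved
    family_size: int
    avg_spending_k: float     # average monthly credit-card spending, thousands of dollars
    education_level: int      # 1 = undergraduate, 2 = graduate, 3 = advanced/professional
    mortgage_k: float         # thousands of dollars
    has_consumer_loan: bool   # candidate target variable
    has_securities_account: bool
    has_cd_account: bool
    uses_online_banking: bool
    has_credit_card: bool


def flag(raw: str) -> bool:
    """Interpret a 0/1 column as a bool (assumes flags are stored as 0/1)."""
    return raw.strip() == "1"


def parse_row(row: dict) -> Customer:
    """Convert one csv.DictReader row into a typed Customer."""
    return Customer(
        customer_id=int(row["Customer_ID"]),
        age=int(row["Age"]),
        years_experience=int(row["Years_Experience"]),
        annual_income_k=float(row["Annual_Income"]),
        zip_code=row["ZIP_Code"].strip(),
        family_size=int(row["Family_size"]),
        avg_spending_k=float(row["Avg_Spending"]),
        education_level=int(row["Education_Level"]),
        mortgage_k=float(row["Mortgage"]),
        has_consumer_loan=flag(row["Has_Consumer_Loan"]),
        has_securities_account=flag(row["Has_Securities_Account"]),
        has_cd_account=flag(row["Has_CD_Account"]),
        uses_online_banking=flag(row["Uses_Online_Banking"]),
        has_credit_card=flag(row["Has_CreditCard"]),
    )


def load_customers(lines) -> list[Customer]:
    """Parse CSV text (any iterable of lines, e.g. an open file) into Customer records."""
    return [parse_row(row) for row in csv.DictReader(lines)]
```

Usage: `with open(path, newline="") as f: customers = load_customers(f)`, where `path` points at the extracted CSV (the file name inside the zip is not given here).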

Data Quality Note
Some rows contain negative values in the Years_Experience column. This data quality issue may require preprocessing, for example taking the absolute value or imputing the average experience of customers in a similar age group.
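Both repairs suggested in the note are easy to express. In this sketch, `strategy="abs"` flips the sign. `strategy="age_mean"` instead replaces a negative value with the rounded mean of non-negative values among customers in the same age band. The 5-year band width is an assumption, as is the fallback to the absolute value when a band has no valid peers. Neither is part of the dataset description.

```python
from collections import defaultdict
from statistics import mean


def fix_experience(records, strategy="abs", band_width=5):
    """Return copies of `records` (dicts with "Age" and "Years_Experience") with negatives repaired."""
    out = [dict(r) for r in records]  # never mutate the caller's data
    if strategy == "abs":
        for r in out:
            r["Years_Experience"] = abs(r["Years_Experience"])
        return out
    if strategy != "age_mean":
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Collect valid (non-negative) experience values per age band.
    by_band = defaultdict(list)
    for r in records:
        if r["Years_Experience"] >= 0:
            by_band[r["Age"] // band_width].append(r["Years_Experience"])

    for r in out:
        if r["Years_Experience"] < 0:
            peers = by_band.get(r["Age"] // band_width)
            # Fall back to the absolute value when the band has no valid peers (assumed policy).
            r["Years_Experience"] = round(mean(peers)) if peers else abs(r["Years_Experience"])
    return out
```

Taking the absolute value assumes the sign alone is a data-entry error. The age-band mean treats the magnitude as unreliable too. Which repair is appropriate depends on what caused the negatives, and the description does not say.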

Potential Use Cases
This dataset is suitable for both educational and practical purposes. You can use it to:

  1. Predict Loan Acceptance: Build a classification model to predict which customers are most likely to accept a personal loan (Has_Consumer_Loan).
  2. Customer Segmentation: Use clustering algorithms (like K-Means) to identify distinct customer segments for targeted marketing campaigns.
  3. Credit Card Adoption: Analyze the factors that influence a customer's decision to get a bank-issued credit card.
  4. Exploratory Data Analysis (EDA): Practice your data analysis and visualization skills to uncover insights about customer behavior.
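For use case 1, plain logistic regression is a reasonable first baseline. The sketch below is a dependency-free version trained by full-batch gradient descent on standardized features. The choice of features (for example Annual_Income, Avg_Spending, Education_Level), the learning rate, and the epoch count are assumptions for illustration. The dataset description does not prescribe any particular model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LoanModel:
    """Logistic regression fitted by full-batch gradient descent on z-scored features."""

    def __init__(self, lr=0.1, epochs=2000, l2=0.0):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.mus, self.sds, self.w, self.b = [], [], [], 0.0

    def _scale(self, x):
        return [(v - m) / s for v, m, s in zip(x, self.mus, self.sds)]

    def _prob(self, xs):
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, xs)))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        cols = list(zip(*X))
        self.mus = [sum(c) / n for c in cols]
        self.sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, self.mus)]
        Xs = [self._scale(x) for x in X]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(Xs, y):
                err = self._prob(xi) - yi  # gradient of log-loss with respect to the logit
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [w - self.lr * (g / n + self.l2 * w) for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x):
        """Estimated probability that a customer with raw features x accepts the loan."""
        return self._prob(self._scale(x))
```

Usage: `model = LoanModel().fit(X, y)`, with `X` as lists of feature values and `y` as the 0/1 Has_Consumer_Loan labels. Then call `model.predict_proba([income, spending, education])`. In practice, a library implementation such as scikit-learn's `LogisticRegression`, evaluated on a held-out split, is the more practical route. If acceptances turn out to be rare, report precision and recall rather than accuracy alone.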