33 datasets found

Customer_Financial_Data
kaggle.com
zip
Updated Nov 12, 2025
Prashob Narendran (2025). Customer_Financial_Data [Dataset]. https://www.kaggle.com/datasets/prashobnarendran/customer-financial-data
Available download formats: zip (62,099 bytes)
Dataset updated
Nov 12, 2025
Authors
Prashob Narendran
Description
Context
This dataset contains detailed, anonymized information about a bank's customers. It includes demographic data such as age, income, and family size, as well as financial information such as mortgage value, credit card ownership, and average spending habits. The data is well suited to a variety of machine learning tasks, particularly in financial services and marketing.

Content
The dataset consists of 5,000 customer records with 14 attributes:

Customer_ID: A unique identifier for each customer.

Age: The customer's age in completed years.

Years_Experience: Years of professional experience.

Annual_Income: Annual income of the customer (in thousands of dollars).

ZIP_Code: The customer's home address ZIP code.

Family_size: The number of individuals in the customer's family.

Avg_Spending: Average monthly spending on credit cards (in thousands of dollars).

Education_Level: A categorical variable for education level (1: Undergraduate, 2: Graduate, 3: Advanced/Professional).

Mortgage: The value of the customer's house mortgage if any (in thousands of dollars).

Has_Consumer_Loan: Binary variable indicating if the customer accepted a personal loan in the last campaign (1: Yes, 0: No). This is a potential target variable.

Has_Securities_Account: Binary variable indicating if the customer has a securities account with the bank.

Has_CD_Account: Binary variable indicating if the customer has a certificate of deposit (CD) account with the bank.

Uses_Online_Banking: Binary variable indicating if the customer uses online banking services.

Has_CreditCard: Binary variable indicating if the customer uses a credit card issued by this bank.

Data Quality Note
Some rows contain negative values in the Years_Experience column. This is a data quality issue that may require preprocessing, for example taking the absolute value, or imputing the average experience of customers in a similar age group.
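Both corrections suggested above can be sketched in plain Python. This is a minimal illustration, not part of the dataset: it assumes each record is a dict keyed by the column names listed above, and the 5-year age band used for the "similar age group" strategy is an arbitrary, hypothetical choice.

```python
from statistics import mean

def clean_experience(rows, strategy="abs", band_width=5):
    """Return copies of rows with negative Years_Experience corrected.

    strategy="abs":       take the absolute value.
    strategy="age_group": use the mean of valid (non-negative) values among
                          customers in the same age band (hypothetical band
                          width); fall back to the absolute value if the band
                          has no valid values.
    """
    def band(age):
        return age // band_width  # e.g. ages 25-29 share one band when band_width=5

    # Collect valid experience values per age band
    valid = {}
    for r in rows:
        if r["Years_Experience"] >= 0:
            valid.setdefault(band(r["Age"]), []).append(r["Years_Experience"])

    cleaned = []
    for r in rows:
        exp = r["Years_Experience"]
        if exp < 0:
            if strategy == "abs":
                exp = abs(exp)
            elif strategy == "age_group":
                peers = valid.get(band(r["Age"]))
                exp = mean(peers) if peers else abs(exp)
            else:
                raise ValueError(f"unknown strategy: {strategy}")
        cleaned.append({**r, "Years_Experience": exp})  # original rows stay untouched
    return cleaned

# Hypothetical rows, not taken from the dataset
rows = [
    {"Age": 25, "Years_Experience": -1},
    {"Age": 26, "Years_Experience": 2},
    {"Age": 27, "Years_Experience": 4},
    {"Age": 40, "Years_Experience": -3},
]
print([r["Years_Experience"] for r in clean_experience(rows, "age_group")])
```

The absolute-value fix is simpler, but it assumes the sign is a data-entry error; the age-group fix discards the recorded magnitude entirely.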

Potential Use Cases
This dataset is excellent for both educational and practical purposes. You can use it to:

Predict Loan Acceptance: Build a classification model to predict which customers are most likely to accept a personal loan (Has_Consumer_Loan).

Customer Segmentation: Use clustering algorithms (like K-Means) to identify distinct customer segments for targeted marketing campaigns.

Credit Card Adoption: Analyze the factors that influence a customer's decision to get a bank-issued credit card.

Exploratory Data Analysis (EDA): Practice your data analysis and visualization skills to uncover insights about customer behavior.
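For the loan-acceptance use case, logistic regression is a common first baseline. The sketch below fits one by batch gradient descent using only the standard library; in practice a library implementation such as scikit-learn's `LogisticRegression` would usually be used instead. The feature choice (Annual_Income and Avg_Spending, rescaled to roughly unit range) and the toy rows are assumptions for illustration, not values from the dataset.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights w and bias b by batch gradient descent on the log loss.
    X: list of pre-scaled feature vectors; y: list of 0/1 labels (Has_Consumer_Loan)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over all rows and take one step
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a customer with features x accepts the loan."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy rows: [Annual_Income / 100, Avg_Spending / 10]
X = [[0.3, 0.1], [0.4, 0.2], [0.5, 0.3], [1.6, 0.6], [1.8, 0.7], [2.0, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(round(predict_proba(w, b, [1.9, 0.8]), 3))
```

On the real data, acceptances are likely to be a minority class, so accuracy alone would be a misleading metric.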
Predicting Credit Card Customer Segmentation
kaggle.com
zip
Updated Mar 10, 2024
The Devastator (2024). Predicting Credit Card Customer Segmentation [Dataset]. https://www.kaggle.com/datasets/thedevastator/predicting-credit-card-customer-attrition-with-m/code
Available download formats: zip (387,771 bytes)
Dataset updated
Mar 10, 2024
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Predicting Credit Card Customer Segmentation

Exploring Key Customer Characteristics

By [source]

About this dataset

This dataset contains customer information from a consumer credit card portfolio, collected to help analysts predict customer attrition. It includes demographic details such as age, gender, marital status, and income category, as well as details of each customer's relationship with the card provider, such as card type, number of months on book, and months inactive. It also records spending behavior in the period leading up to a churn decision, including total revolving balance, credit limit, average open-to-buy, the total amount change from Q4 to Q1, average utilization ratio, and a Naive Bayes classifier attrition flag. Further fields include card category, number of contacts in the last 12 months, dependent count, and education level. Together these variables capture up-to-date information that can help distinguish long-term account stability from an impending departure, which is useful both for managing a portfolio and for serving individual customers.

How to use the dataset

This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.

Research Ideas

Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.

Analyzing customers' spending behavior leading up to churn, and using this data to better predict the likelihood that a customer will churn in the future.

Creating a classifier that predicts which customers are more susceptible to attrition based on their credit score, credit limit, utilization ratio, and other spending-behavior metrics over time. Such a classifier could serve as an early-warning system that flags potential attrition before it happens.
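The first research idea, comparing churn across demographic groups, reduces to a grouped rate. A minimal sketch, assuming the rows have been loaded as dicts and that Attrition_Flag holds a boolean, as the column table for BankChurners.csv describes (if it is stored as text, it would need to be mapped to True/False first). The sample rows are hypothetical.

```python
from collections import defaultdict

def churn_rate_by(rows, group_col, churn_col="Attrition_Flag"):
    """Return {group value: share of customers in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if row[churn_col]:  # truthy flag is taken to mean the customer churned
            churned[group] += 1
    return {g: churned[g] / totals[g] for g in totals}

# Hypothetical rows; only the column names come from BankChurners.csv
rows = [
    {"Gender": "F", "Attrition_Flag": True},
    {"Gender": "F", "Attrition_Flag": False},
    {"Gender": "M", "Attrition_Flag": False},
    {"Gender": "M", "Attrition_Flag": False},
]
print(churn_rate_by(rows, "Gender"))  # {'F': 0.5, 'M': 0.0}
```

The same function works for Marital_Status, Education_Level, or an income category column by changing `group_col`.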

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: BankChurners.csv

| Column name | Description |
|:--|:--|
| CLIENTNUM | Unique identifier for each customer. (Integer) |
| Attrition_Flag | Flag indicating whether or not the customer has churned out. (Boolean) |
| Customer_Age | Age of customer. (Integer) |
| Gender | Gender of customer. (String) |
| Dependent_count | Number of dependents that customer has. (Integer) |
| Education_Level ...
GIS Data | USA & Canada | Over 40k Demographics Variables To Inform Business...
datarade.ai
.json, .csv
Updated Aug 13, 2024
GapMaps (2024). GIS Data | USA & Canada | Over 40k Demographics Variables To Inform Business Decisions | Consumer Spending Data| Demographic Data [Dataset]. https://datarade.ai/data-products/gapmaps-premium-demographic-data-by-ags-usa-canada-gis-gapmaps
Available download formats: .json, .csv
Dataset updated
Aug 13, 2024
Dataset authored and provided by
GapMaps
Area covered
Canada, United States
Description
GapMaps GIS data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.

GIS Data attributes include:

Latest Estimates and Projections
The estimates and projections database includes a wide range of core demographic variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.

Crime Risk
Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property and total) at the block and block group level.

Panorama Segmentation
AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to provide Consumer Behavior profiles associated with this segmentation system.

Business Counts
Business Counts is a geographic summary database of business establishments, employment, occupation and retail sales.

Non-Resident Population
The AGS non-resident population estimates use a wide range of data sources to model the factors that drive tourists to particular locations, and to match that demand with the supply of available accommodations.

Consumer Expenditures
AGS provides current-year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.

Retail Potential
This tabulation uses the Census of Retail Trade tables, which cross-tabulate store type by merchandise line.

Environmental Risk
The environmental suite of data consists of several separate database components, including:
Weather Risks
Seismological Risks
Wildfire Risk
Climate
Air Quality
Elevation and terrain

Primary Use Cases for GapMaps GIS Data:

Retail (e.g. Fast Food/QSR, Cafe, Fitness, Supermarket/Grocery)

Customer Profiling: get a detailed understanding of the demographic & segmentation profile of your customers, where they work and their spending potential

Analyse your trade areas at a granular census block level using all the key metrics

Site Selection: Identify optimal locations for future expansion and benchmark performance across existing locations.

Target Marketing: Develop effective marketing strategies to acquire more customers.

Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations.

Finance / Insurance (e.g. Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)

Network Planning

Customer (Risk) Profiling for insurance/loan approvals

Target Marketing

Competitive Analysis

Market Optimization

Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)

Tenant Recruitment

Target Marketing

Market Potential / Gap Analysis

Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)

Customer Profiling

Target Marketing

Market Share Analysis
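Trade-area analysis at block level is often approximated with a simple radius around a site. The sketch below sums one block-level metric over all blocks whose centroid lies within a given great-circle distance of the site. The record layout (`lat`, `lon`, and a metric column such as `population`) is hypothetical, since the GapMaps/AGS field names are not given here, and a real trade area may instead be defined by drive time rather than a circle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def trade_area_total(site, blocks, radius_km, metric):
    """Sum `metric` over blocks whose centroid is within radius_km of site=(lat, lon)."""
    lat, lon = site
    return sum(
        blk[metric]
        for blk in blocks
        if haversine_km(lat, lon, blk["lat"], blk["lon"]) <= radius_km
    )

# Hypothetical block centroids and populations
blocks = [
    {"lat": 40.00, "lon": -75.00, "population": 100},
    {"lat": 40.00, "lon": -75.01, "population": 50},    # about 0.85 km from the site
    {"lat": 41.00, "lon": -75.00, "population": 1000},  # about 111 km from the site
]
print(trade_area_total((40.0, -75.0), blocks, radius_km=5, metric="population"))  # 150
```

Swapping `metric` lets the same function total households, income, or expenditure columns for a candidate site.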
Data from: Customer Segmentation in the Digital Marketing Using a Q-Learning...
data-staging.niaid.nih.gov
Updated Jan 8, 2025
Wang, Guanqun (2025). Customer Segmentation in the Digital Marketing Using a Q-Learning Based Differential Evolution Algorithm Integrated with K-means clustering [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14614252
Dataset updated
Jan 8, 2025
Authors
Wang, Guanqun
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset was collected from Kaggle. It includes various features related to customer demographics, purchasing behavior, and other relevant metrics.
Customer Segmentation Data for Marketing Analysis
kaggle.com
zip
Updated Jun 28, 2024
Fahmida (2024). Customer Segmentation Data for Marketing Analysis [Dataset]. https://www.kaggle.com/datasets/fahmidachowdhury/customer-segmentation-data-for-marketing-analysis/code
Available download formats: zip (16,744 bytes)
Dataset updated
Jun 28, 2024
Authors
Fahmida
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains simulated customer data that can be used for segmentation analysis. It includes demographic and behavioral information about customers, which can help in identifying distinct segments within the customer base. This can be particularly useful for targeted marketing strategies, improving customer satisfaction, and increasing sales.

Columns:
id: Unique identifier for each customer.
age: Age of the customer.
gender: Gender of the customer (Male, Female, Other).
income: Annual income of the customer (in USD).
spending_score: Spending score (1-100), indicating the customer's spending behavior and loyalty.
membership_years: Number of years the customer has been a member.
purchase_frequency: Number of purchases made by the customer in the last year.
preferred_category: Preferred shopping category (Electronics, Clothing, Groceries, Home & Garden, Sports).
last_purchase_amount: Amount spent by the customer on their last purchase (in USD).

Potential Uses:
Customer Segmentation: Identify different customer segments based on their demographic and behavioral characteristics.
Targeted Marketing: Develop targeted marketing strategies for different customer segments.
Customer Loyalty Programs: Design loyalty programs based on customer spending behavior and preferences.
Sales Analysis: Analyze sales patterns and predict future trends.
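Because income is measured in USD while spending_score runs from 1 to 100, a distance-based segmentation method such as K-Means would be dominated by income unless the features are put on a common scale. A minimal z-score sketch in plain Python; z-scoring is one common choice among several scalers, and the sample rows are made up.

```python
from statistics import mean, pstdev

def standardize(rows, cols):
    """Return copies of rows with each column in `cols` z-scored: (x - mean) / std.
    A column with zero spread is set to 0.0 to avoid division by zero."""
    stats = {c: (mean([r[c] for r in rows]), pstdev([r[c] for r in rows])) for c in cols}
    scaled = []
    for r in rows:
        new = dict(r)  # leave the input rows unchanged
        for c in cols:
            mu, sd = stats[c]
            new[c] = (r[c] - mu) / sd if sd else 0.0
        scaled.append(new)
    return scaled

# Hypothetical customers
rows = [
    {"id": 1, "income": 30000, "spending_score": 20},
    {"id": 2, "income": 50000, "spending_score": 50},
    {"id": 3, "income": 70000, "spending_score": 80},
]
scaled = standardize(rows, ["income", "spending_score"])
print([round(r["income"], 3) for r in scaled])
```

After scaling, a 20,000 USD income gap and a 30-point spending-score gap carry the same weight in these rows, which is the point of the transformation.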
Demographic Data | USA & Canada | Latest Estimates & Projections To Inform...
datarade.ai
.json, .csv
Updated Jun 24, 2024
GapMaps (2024). Demographic Data | USA & Canada | Latest Estimates & Projections To Inform Business Decisions | GIS Data | Map Data [Dataset]. https://datarade.ai/data-products/gapmaps-ags-usa-demographics-data-40k-variables-trusted-gapmaps
Available download formats: .json, .csv
Dataset updated
Jun 24, 2024
Dataset authored and provided by
GapMaps
Area covered
Canada, United States
Description
GapMaps premium demographic data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.

Demographic Data attributes include:

Latest Estimates and Projections
The estimates and projections database includes a wide range of core demographic variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.

Crime Risk
Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property and total) at the block and block group level.

Panorama Segmentation
AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to provide Consumer Behavior profiles associated with this segmentation system.

Business Counts
Business Counts is a geographic summary database of business establishments, employment, occupation and retail sales.

Non-Resident Population
The AGS non-resident population estimates use a wide range of data sources to model the factors that drive tourists to particular locations, and to match that demand with the supply of available accommodations.

Consumer Expenditures
AGS provides current-year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.

Retail Potential
This tabulation uses the Census of Retail Trade tables, which cross-tabulate store type by merchandise line.

Environmental Risk
The environmental suite of data consists of several separate database components, including:
Weather Risks
Seismological Risks
Wildfire Risk
Climate
Air Quality
Elevation and terrain

Primary Use Cases for AGS Demographic Data:

Retail (e.g. Fast Food/QSR, Cafe, Fitness, Supermarket/Grocery)

Customer Profiling: get a detailed understanding of the demographic & segmentation profile of your customers, where they work and their spending potential

Analyse your trade areas at a granular census block level using all the key metrics

Site Selection: Identify optimal locations for future expansion and benchmark performance across existing locations.

Target Marketing: Develop effective marketing strategies to acquire more customers.

Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations.

Finance / Insurance (e.g. Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)

Network Planning

Customer (Risk) Profiling for insurance/loan approvals

Target Marketing

Competitive Analysis

Market Optimization

Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)

Tenant Recruitment

Target Marketing

Market Potential / Gap Analysis

Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)

Customer Profiling

Target Marketing

Market Share Analysis
US Consumer Demographics | Homeowners & Renters | Email & Mobile Phone |...
datarade.ai
.json, .csv, .xls
Updated Oct 18, 2024
CompCurve (2024). US Consumer Demographics | Homeowners & Renters | Email & Mobile Phone | Bulk & Custom | 255M People [Dataset]. https://datarade.ai/data-products/compcurve-us-consumer-demographics-homeowners-renters-compcurve
Available download formats: .json, .csv, .xls
Dataset updated
Oct 18, 2024
Dataset authored and provided by
CompCurve
Area covered
United States
Description
Knowing who your consumers are is essential for businesses, marketers, and researchers. This detailed demographic file offers an in-depth look at American consumers, packed with insights about personal details, household information, financial status, and lifestyle choices. Let's take a closer look at the data:

Personal Identifiers and Basic Demographics
At the heart of this dataset are the key details that make up a consumer profile:

Unique IDs (PID, HHID) for individuals and households
Full names (First, Middle, Last) and suffixes
Gender and age
Date of birth
Complete location details (address, city, state, ZIP)

These identifiers are critical for accurate marketing and form the base for deeper analysis.

Geospatial Intelligence
This file goes beyond just listing addresses by including rich geospatial data like:

Latitude and longitude
Census tract and block details
Codes for Metropolitan Statistical Areas (MSA) and Core-Based Statistical Areas (CBSA)
County size codes
Geocoding accuracy

This allows for precise geographic segmentation and localized marketing.

Housing and Property Data
The dataset covers a lot of ground when it comes to housing, providing valuable insights for real estate professionals, lenders, and home service providers:

Homeownership status
Dwelling type (single-family, multi-family, etc.)
Property values (market, assessed, and appraised)
Year built and square footage
Room count, amenities like fireplaces or pools, and building quality

This data is crucial for targeting homeowners with products and services like refinancing or home improvement offers.

Wealth and Financial Data
For a deeper dive into consumer wealth, the file includes:

Estimated household income
Wealth scores
Credit card usage
Mortgage info (loan amounts, rates, terms)
Home equity estimates and investment property ownership

These indicators are invaluable for financial services, luxury brands, and fundraising organizations looking to reach affluent individuals.

Lifestyle and Interests
One of the most useful features of the dataset is its extensive lifestyle segmentation:

Hobbies and interests (e.g., gardening, travel, sports)
Book preferences, magazine subscriptions
Outdoor activities (camping, fishing, hunting)
Pet ownership, tech usage, political views, and religious affiliations

This data is perfect for crafting personalized marketing campaigns and developing products that align with specific consumer preferences.

Consumer Behavior and Purchase Habits
The file also sheds light on how consumers behave and shop:

Online and catalog shopping preferences
Gift-giving tendencies, presence of children, vehicle ownership
Media consumption (TV, radio, internet)

Retailers and e-commerce businesses will find this behavioral data especially useful for tailoring their outreach.

Demographic Clusters and Segmentation
Pre-built segments such as the following make it easier to quickly target specific demographics, streamlining the process for market analysis and campaign planning:

Household, neighborhood, family, and digital clusters
Generational and lifestage groups

Ethnicity and Language Preferences
In today's multicultural market, knowing your audience's cultural background is key. The file includes:

Ethnicity codes and language preferences
Flags for Hispanic/Spanish-speaking households

This helps ensure culturally relevant and sensitive communication.

Education and Occupation Data
The dataset also tracks education and career info:

Education level and occupation codes
Home-based business indicators

This data is essential for B2B marketers, recruitment agencies, and education-focused campaigns.

Digital and Social Media Habits
With everyone online, digital behavior insights are a must:

Internet, TV, radio, and magazine usage
Social media platform engagement (Facebook, Instagram, LinkedIn)
Streaming subscriptions (Netflix, Hulu)

This data helps marketers, app developers, and social media managers connect with their audience in the digital space.

Political and Charitable Tendencies
For political campaigns or non-profits, this dataset offers:

Political affiliations and outlook
Charitable donation history
Volunteer activities

These insights are perfect for cause-related marketing and targeted political outreach.

Neighborhood Characteristics
By incorporating census data, the file provides a bigger picture of the consumer's environment:

Population density, racial composition, and age distribution
Housing occupancy and ownership rates

This offers important context for understanding the demographic landscape.

Predictive Consumer Indexes
The dataset includes forward-looking indicators in categories like:

Fashion, automotive, and beauty products
Health, home decor, pet products, sports, and travel

These predictive insights help businesses anticipate consumer trends and needs.

Contact Information
Finally, the file includes key communication details:

Multiple phone numbers (landline, mobile) and email addresses
Do Not Call (DNC) flags...
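When a file like this is used for outreach, the Do Not Call flags listed in the contact information would typically be honored before a call list is built. A minimal sketch; the record layout (a `phones` list of `{"number", "dnc"}` entries) is a hypothetical assumption, since the file's actual field names are not shown here, and treating a missing flag as "do not call" is a deliberately conservative choice.

```python
def callable_numbers(records):
    """Return phone numbers whose DNC flag is explicitly False.

    Assumes (hypothetically) that each record has a "phones" list of
    {"number": str, "dnc": bool} dicts.
    """
    numbers = []
    for rec in records:
        for phone in rec.get("phones", []):
            # A missing flag is treated as DNC, so the number is skipped
            if phone.get("dnc", True) is False:
                numbers.append(phone["number"])
    return numbers

# Hypothetical records
records = [
    {"phones": [{"number": "555-0100", "dnc": False}, {"number": "555-0101", "dnc": True}]},
    {"phones": [{"number": "555-0102"}]},  # no flag recorded
    {},                                    # no phones at all
]
print(callable_numbers(records))  # ['555-0100']
```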
Customer Segmentation for Targeted Campaigns
kaggle.com
zip
Updated May 21, 2024
Mani Devesh (2024). Customer Segmentation for Targeted Campaigns [Dataset]. https://www.kaggle.com/datasets/manidevesh/customer-sales-data
Available download formats: zip (914,292 bytes)
Dataset updated
May 21, 2024
Authors
Mani Devesh
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Project Overview: Customer Segmentation Using K-Means Clustering

Introduction
In this project, I analyse customer data from a retail store to identify distinct customer segments. The dataset includes key attributes such as the age, city, and total sales of each customer. By leveraging K-Means clustering, an unsupervised machine learning technique, I aim to group customers based on their age and sales metrics. These insights will enable the creation of targeted marketing campaigns tailored to the specific needs and behaviours of each customer segment.

Objectives
- Cluster Customers: Use K-Means clustering to group customers based on age and total sales.
- Analyse Segments: Examine the characteristics of each customer segment.
- Targeted Marketing: Develop strategies for personalized marketing campaigns targeting each identified customer group.

Data Description
The dataset comprises:

Age: The age of the customers.

City: The city where the customers reside.

Total Sales: The total sales generated by each customer.

Methodology
- Data Preprocessing: Clean and preprocess the data to handle any missing or inconsistent entries.
- Feature Selection: Focus on age and total sales as primary features for clustering.
- K-Means Clustering: Apply the K-Means algorithm to identify distinct customer segments.
- Cluster Analysis: Analyse the resulting clusters to understand the demographic and sales characteristics of each group.
- Marketing Strategy Development: Create targeted marketing strategies for each customer segment to enhance engagement and sales.
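The feature-selection and K-Means steps of this methodology can be sketched with the standard library alone. This is a minimal Lloyd's-algorithm illustration, not the author's notebook: it assumes (age, total sales) pairs, min-max scales both features so that sales figures do not dominate the distance, and uses made-up sample values. Library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and multiple restarts.

```python
import math
import random

def min_max_scale(points):
    """Rescale every feature to [0, 1]; constant features become 0.0."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels); labels[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments unchanged, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical (age, total sales) records
customers = [(22, 150), (25, 180), (24, 160), (60, 2000), (65, 2100), (58, 1900)]
centroids, labels = kmeans(min_max_scale(customers), k=2)
print(labels)
```

Choosing k is left open here; the elbow method or silhouette scores are common ways to pick it on the real data.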

Expected Outcomes
- Customer Segments: Clear identification of customer groups based on age and purchasing behaviour.
- Insights for Marketing: Detailed understanding of each segment to inform targeted marketing efforts.
- Business Impact: Enhanced ability to tailor marketing campaigns, potentially leading to increased customer satisfaction and sales.

By clustering customers based on age and total sales, this project aims to provide actionable insights for personalized marketing, ultimately driving better customer engagement and higher sales for the retail store.
Dataset on Brand Ethics, Trust, Customer Experience, and Loyalty in Latin...
data.mendeley.com
Updated Oct 10, 2025
Nathalie Peña García (2025). Dataset on Brand Ethics, Trust, Customer Experience, and Loyalty in Latin American Consumers: The Alpina® Case [Dataset]. http://doi.org/10.17632/5hfw5ntfck.1
Unique identifier
https://doi.org/10.17632/5hfw5ntfck.1
Dataset updated
Oct 10, 2025
Authors
Nathalie Peña García
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Latin America
Description
This dataset contains the full responses from a structured survey conducted in Colombia (2024) aimed at analyzing the relationships between perceived brand ethics, trust, service quality, customer experience, perceived value, brand engagement, and loyalty. The study includes socio-demographic segmentation by educational level and focuses on the consumer perception of Alpina®, a leading brand in the Latin American food industry.
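Relationships between constructs such as trust and loyalty are often first explored with simple correlations, here split by educational level to mirror the study's segmentation. A minimal Pearson sketch; it assumes each response has already been reduced to one numeric score per construct (for example, the mean of its Likert items), and the field names and values are hypothetical rather than taken from the survey file.

```python
from math import nan, sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns nan if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else nan

def correlation_by_group(responses, x_col, y_col, group_col, min_n=3):
    """Correlate two construct scores separately within each group (groups smaller than min_n are skipped)."""
    groups = {}
    for r in responses:
        groups.setdefault(r[group_col], []).append((r[x_col], r[y_col]))
    return {
        g: pearson([x for x, _ in pairs], [y for _, y in pairs])
        for g, pairs in groups.items()
        if len(pairs) >= min_n
    }

# Hypothetical construct scores per respondent
responses = [
    {"education": "University", "trust": 1, "loyalty": 2},
    {"education": "University", "trust": 2, "loyalty": 4},
    {"education": "University", "trust": 3, "loyalty": 6},
    {"education": "University", "trust": 4, "loyalty": 8},
    {"education": "Secondary", "trust": 1, "loyalty": 3},
    {"education": "Secondary", "trust": 2, "loyalty": 2},
    {"education": "Secondary", "trust": 3, "loyalty": 1},
]
print(correlation_by_group(responses, "trust", "loyalty", "education"))
```

Correlations describe association only; testing the directional relationships the study describes would call for a structural model.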
Consumer Behavior and Shopping Habits Dataset:
cubig.ai
zip
Updated May 28, 2025
CUBIG (2025). Consumer Behavior and Shopping Habits Dataset: [Dataset]. https://cubig.ai/store/products/352/consumer-behavior-and-shopping-habits-dataset
Available download formats: zip
Dataset updated
May 28, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Consumer Behavior and Shopping Habits Dataset is a tabular collection of customer demographics, purchase history, product preferences, shopping frequency, and online and offline purchasing behavior.

2) Data Utilization
(1) The Consumer Behavior and Shopping Habits Dataset has the following characteristics:
• Each row contains detailed consumer and transaction information such as customer ID, age, gender, purchased items and categories, purchase amount, region, product attributes (size, color, season), review rating, subscription status, delivery method, discount/promotion usage, payment method, and purchase frequency.
• The data covers a variety of variables and purchasing patterns to support customer segmentation, marketing strategy, product preference analysis, and more.
(2) The Consumer Behavior and Shopping Habits Dataset can be used for:
• Customer Segmentation and Target Marketing: Analyze demographics and purchasing patterns to define customer groups, then use them to develop customized marketing strategies.
• Product and Service Improvement: Use purchase history, review ratings, and responses to discounts and promotions to identify popular products, manage inventory, and analyze promotion effects.
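As a first look at promotion effects, one can compare average purchase amounts with and without a discount. A minimal sketch with hypothetical field names (`discount_applied`, `purchase_amount`) and made-up rows; a raw difference in means is descriptive only, since customers who use discounts may differ from other customers in ways that also affect spending.

```python
from statistics import mean

def promotion_lift(rows, flag_col="discount_applied", amount_col="purchase_amount"):
    """Compare mean purchase amount for promoted vs. non-promoted purchases."""
    promoted = [r[amount_col] for r in rows if r[flag_col]]
    regular = [r[amount_col] for r in rows if not r[flag_col]]
    if not promoted or not regular:
        raise ValueError("need purchases both with and without the promotion")
    m1, m0 = mean(promoted), mean(regular)
    return {
        "mean_with": m1,
        "mean_without": m0,
        "difference": m1 - m0,
        "relative_lift": (m1 - m0) / m0,  # e.g. 0.4 means 40% higher on average
    }

# Hypothetical purchases
rows = [
    {"discount_applied": True, "purchase_amount": 60},
    {"discount_applied": True, "purchase_amount": 80},
    {"discount_applied": False, "purchase_amount": 50},
    {"discount_applied": False, "purchase_amount": 50},
]
print(promotion_lift(rows))
```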
facebook fact checking dataset
figshare.com
csv
Updated Nov 11, 2024
mehdi khalil (2024). facebook fact checking dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27645690.v2
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.27645690.v2
Dataset updated
Nov 11, 2024
Dataset provided by
Figshare: http://figshare.com/
Authors
mehdi khalil
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview
The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.

Dataset Composition
News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
Facebook Posts: Each article is associated with Facebook post data, including metrics like share counts, reaction counts, and comment counts.
Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.

Key Features
Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.

Potential Applications
The BuzzFeed dataset is valuable for various research and analytical purposes:
News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.
Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
Audience Analysis: The data can be used for demographic analysis and audience segmentation.

This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.
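For the hyperpartisan-versus-mainstream comparison, a natural first step is the share of each fact-check rating per publisher orientation. A minimal sketch; the field names (`orientation`, `rating`) and the rating labels are assumptions about how the CSV might be laid out, not its documented schema, and the articles are made up.

```python
from collections import Counter, defaultdict

def rating_shares(articles, group_key="orientation", rating_key="rating"):
    """Return {group: {rating: fraction of that group's articles with this rating}}."""
    counts = defaultdict(Counter)
    for art in articles:
        counts[art[group_key]][art[rating_key]] += 1
    return {
        group: {rating: n / sum(c.values()) for rating, n in c.items()}
        for group, c in counts.items()
    }

# Hypothetical articles
articles = [
    {"orientation": "mainstream", "rating": "mostly true"},
    {"orientation": "mainstream", "rating": "mostly true"},
    {"orientation": "right", "rating": "mostly false"},
    {"orientation": "right", "rating": "mostly true"},
]
print(rating_shares(articles))
# {'mainstream': {'mostly true': 1.0}, 'right': {'mostly false': 0.5, 'mostly true': 0.5}}
```

Because the three orientation groups differ in size (826, 256, and 545 articles), shares rather than raw counts keep the comparison fair.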
Curated, Segmented, and Deep Learning-Optimized I-SPY 2 MRI Dataset for...
cancerimagingarchive.net
n/a, nifti, tsv
Updated Oct 15, 2015
The Cancer Imaging Archive (2015). Curated, Segmented, and Deep Learning-Optimized I-SPY 2 MRI Dataset for Prediction of pCR, HR, and HER2 Status [Dataset]. http://doi.org/10.7937/42wq-th78
Available download formats: nifti, tsv, n/a
Unique identifier
https://doi.org/10.7937/42wq-th78
Dataset updated
Oct 15, 2015
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
Abstract
The BreastDCEDL_ISPY2 dataset is a curated, deep learning–ready resource that integrates pretreatment 3D Dynamic Contrast-Enhanced MRI (DCE-MRI) scans from 982 breast cancer patients enrolled in the I-SPY2 TRIAL, sourced from The Cancer Imaging Archive (TCIA). Imaging data have been standardized from raw DICOM to 3D NIfTI volumes, preserving signal integrity and spatial resolution. The dataset includes extensive non-imaging supporting data, such as tumor annotations, DICOM metadata, and demographics. To facilitate reproducible research, fixed benchmark train/validation/test splits are provided, stratified by biomarker subtypes and response outcomes. This dataset enables diverse research applications, including the development of deep learning models for predicting treatment response, radiomics-based analyses, and hormone receptor (HR) and HER2 status classification. It also facilitates benchmarking of advanced architectures such as Vision Transformers, and supports clinical translation efforts in the field of precision oncology.
Introduction
Breast cancer remains one of the most prevalent causes of cancer-related mortality worldwide, and early detection coupled with accurate treatment response monitoring is essential for improving outcomes. Dynamic Contrast-Enhanced MRI (DCE-MRI) is a cornerstone modality for breast cancer imaging, offering unique insights into tumor vascularity, morphology, and treatment response. Despite its clinical importance, progress in computational and deep learning–based analysis of DCE-MRI has been hindered by the lack of large, standardized, and publicly available datasets. The BreastDCEDL_ISPY2 dataset was created to address this gap by consolidating and harmonizing imaging and clinical data from the I-SPY2 TRIAL. With 982 patients across more than 22 institutions, it represents one of the largest publicly accessible collections of pre-treatment DCE-MRI scans for breast cancer. Importantly, the dataset includes standardized 3D NIfTI volumes, tumor annotations, voxel-based tumor volumes, and harmonized clinicopathologic metadata such as hormone receptor status, HER2 status, and pathologic complete response outcomes. What makes BreastDCEDL_ISPY2 unique is its deep learning–ready structure and benchmark design. By providing consistent preprocessing, unified annotations, and predefined training/validation/test splits, the dataset enables reproducible research and direct comparison of computational methods. It lowers the technical barriers to working with heterogeneous MRI data, facilitates the development and validation of advanced machine learning models—including transformer-based architectures—and supports clinically relevant investigations into treatment response prediction and personalized therapy planning. The dataset includes extensive non-imaging supporting data:

Tumor annotations include both segmentation masks and region-of-interest (ROI) delineations.

Accompanying DICOM metadata encompasses voxel dimensions, signal enhancement ratio (SER) time points, and contrast agent injection timestamps.

Clinical metadata provides comprehensive patient information, including demographic variables (age, race, menopausal status), hormone receptor (HR) and HER2 receptor status, as well as treatment outcomes, specifically pathologic complete response (pCR).

Methods

Subject Inclusion and Exclusion Criteria
The BreastDCEDL_ISPY2 dataset integrates patient data from the I-SPY2 TRIAL (2010–2016), yielding 985 patients with pretreatment DCE-MRI scans. Inclusion required at least three acquisitions (pre-contrast, early post-contrast, late post-contrast). Patients with incomplete imaging or missing essential metadata were excluded (3 cases), leaving 982 patients. The cohort reflects a clinically diverse population, with a mean age of ~50 years, racial composition (majority White, ~17% Black, others underrepresented), and tumor subtypes spanning HR+/HER2−, HER2+, and triple-negative cancers. pCR status is available for the majority of patients. Treatment histories reflect standardized neoadjuvant chemotherapy protocols. While the dataset includes multicenter acquisitions (22+ institutions), potential biases include predominance of U.S.-based populations, underrepresentation of some ethnic groups, and the trial setting, which may differ from community practice.
Data Acquisition

MRI Acquisition: Pretreatment 3D DCE-MRI acquired on 1.5T and 3T scanners. Protocols varied across institutions but consistently included pre-contrast, early post-contrast, and late post-contrast acquisitions after gadolinium administration. Key technical parameters (TR, TE, slice thickness, voxel size, FOV) are preserved in metadata.

Clinical Data: Captured through electronic trial databases. Variables include demographics (age, race, menopausal status), receptor status (HR, HER2), tumor volume, and treatment outcome (pCR).

Other Data: Signal Enhancement Ratio (SER) maps and voxel-based tumor volumes are provided.

Missing Data: 3 patients were excluded due to incomplete imaging or metadata.
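The SER maps mentioned above are commonly computed voxel-wise from the three acquisitions as SER = (S_early - S_pre) / (S_late - S_pre). The sketch below assumes that definition (the description does not state the exact formula or any thresholds the pipeline applies) and uses flat lists of voxel intensities so it runs without imaging libraries.

```python
def ser_map(pre, early, late, eps=1e-6):
    """Voxel-wise signal enhancement ratio.

    SER = (early - pre) / (late - pre). Voxels whose late enhancement is
    (near) zero get 0.0 to avoid division by zero; this masking rule is an
    assumption, not necessarily what the dataset's pipeline does.
    """
    ser = []
    for s0, s1, s2 in zip(pre, early, late):
        denom = s2 - s0
        ser.append((s1 - s0) / denom if abs(denom) > eps else 0.0)
    return ser


pre = [100.0, 100.0, 100.0]
early = [200.0, 150.0, 120.0]
late = [150.0, 200.0, 100.0]
print(ser_map(pre, early, late))  # [2.0, 0.5, 0.0]
```

An SER above 1 means the signal dropped between the early and late phases (washout), while a value below 1 means it kept rising.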

Data Analysis

File Format Conversions: Raw DICOM images were converted into standardized 3D NIfTI volumes using a custom pipeline. The conversion preserves the original 16-bit dynamic range by storing voxel values as 64-bit floating-point data.

Manual Annotation and Segmentation Protocols: Tumor segmentations and ROI delineations provided by I-SPY2 radiologists; converted to binary 3D masks aligned to imaging volumes. Only the primary tumor was annotated if multiple tumors were present.

Quality Control and Validation: Tumor annotations were reviewed for alignment with MRI volumes. Consistency checks ensured tumor masks aligned across temporal phases. Patients with fewer than three valid acquisitions were excluded.

Scripts, Code, and Software Versions: Pipelines and the Vision Transformer implementation are available on GitHub: https://github.com/naomifridman/BreastDCEDL

Usage Notes

Data Organization and Naming Conventions
All imaging data are provided in standardized 3D NIfTI format, converted from the original DICOM files while preserving full signal integrity. File names follow the structure:

Training set: 784 patients (32.1% pCR rate).

Validation set: 99 patients (32.3% pCR rate).

Test set: 99 patients (32.3% pCR rate).
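The usage notes state that these splits were stratified by HR, HER2, and pCR status, and they should be used as-is for comparability. To illustrate what such stratification means, here is a minimal sketch of a per-stratum split with roughly 80/10/10 proportions (close to the 784/99/99 sizes listed above). The record keys `hr`, `her2`, and `pcr` are hypothetical, and this sketch will not reproduce the official partition.

```python
import random
from collections import defaultdict


def stratified_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/val/test, preserving stratum proportions.

    `key(record)` returns the stratum label, e.g. (HR, HER2, pCR).
    Rounding is done per stratum, so overall sizes can differ slightly
    from the requested fractions.
    """
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for label in sorted(strata):  # sorted so the result is deterministic
        items = list(strata[label])
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test


# Hypothetical toy cohort with binary HR/HER2/pCR labels.
cohort = [
    {"id": i, "hr": i % 2, "her2": (i // 2) % 2, "pcr": (i // 4) % 3 == 0}
    for i in range(120)
]
train, val, test = stratified_split(
    cohort, key=lambda r: (r["hr"], r["her2"], r["pcr"])
)
print(len(train), len(val), len(test))  # 96 12 12
```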

Partitioning was stratified by biomarker status (HR, HER2) and pCR outcomes to ensure balanced distributions across subsets. Users are encouraged to adopt these predefined splits when developing predictive models to enable fair comparisons across studies.

Clinical Data Files
Clinical and pathologic metadata are distributed in standardized TSV format. Variables include demographics (age, race, menopausal status), biomarker status (HR, HER2), tumor volume, and pCR outcomes. TSV files can be opened with standard spreadsheet software (e.g., Microsoft Excel, LibreOffice Calc) or accessed programmatically using Python (pandas) or R.

Software Recommendations

NIfTI images: Compatible with common medical imaging platforms such as 3D Slicer, ITK-SNAP, and FSL. Python users may rely on nibabel for loading and handling imaging volumes.

Segmentation masks: Provided as binary 3D NIfTI volumes (1 = tumor, 0 = background), directly loadable in the same software.
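Because the masks are binary, a voxel-based tumor volume is the number of tumor voxels times the volume of one voxel. The sketch below uses a nested-list mask so it runs without dependencies. With real files, the array and voxel spacing would typically come from nibabel (`img = nibabel.load(path)`, then `img.get_fdata()` and `img.header.get_zooms()`). Whether the dataset's published volumes were computed exactly this way is an assumption.

```python
def tumor_volume_mm3(mask, spacing_mm):
    """Volume of a binary 3D mask (1 = tumor, 0 = background).

    mask: nested lists indexed [x][y][z].
    spacing_mm: (dx, dy, dz) voxel size in millimetres.
    """
    dx, dy, dz = spacing_mm
    n_tumor = sum(v == 1 for plane in mask for row in plane for v in row)
    return n_tumor * dx * dy * dz


# Toy 2x2x3 mask: a 2x2x2 block of tumor voxels plus one background slice.
mask = [
    [[1, 1, 0], [1, 1, 0]],
    [[1, 1, 0], [1, 1, 0]],
]
vol = tumor_volume_mm3(mask, (0.5, 0.5, 2.0))
print(vol, "mm^3 =", vol / 1000.0, "mL")  # 4.0 mm^3 = 0.004 mL
```

The `v == 1` test also matches `1.0`, so it works on the float arrays that `get_fdata()` returns.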

Potential Sources of Error or Variability

Inter-cohort heterogeneity: Imaging protocols (field strength, TR/TE, slice thickness) varied across centers, potentially introducing site effects.

Only the largest lesion was annotated for multifocal disease.

Population bias (predominantly U.S., underrepresentation of minorities).

External Resources
The source code for converting MRI data from DICOM to NIfTI format, along with usage examples, is available in the project’s GitHub repository: https://github.com/naomifridman/BreastDCEDL.
Consumer Sentiment Data | Global Audience Insights | Psychographic Profiles...
datarade.ai
Updated Oct 27, 2021
Success.ai (2021). Consumer Sentiment Data | Global Audience Insights | Psychographic Profiles & Trends | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/consumer-sentiment-data-global-audience-insights-psychogr-success-ai
Available download formats
.bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Oct 27, 2021
Dataset provided by
Area covered
Nigeria, Curaçao, Hungary, Uganda, South Africa, Barbados, Ecuador, Hong Kong, Italy, Macedonia (the former Yugoslav Republic of)
Description
Success.ai’s Consumer Sentiment Data offers businesses unparalleled insights into global audience attitudes, preferences, and emotional triggers. Sourced from continuous analysis of consumer behaviors, conversations, and feedback, this dataset includes psychographic profiles, interest data, and sentiment trends that help marketers, product teams, and strategists better understand their target customers. Whether you’re exploring a new market, refining your brand message, or enhancing product offerings, Success.ai ensures your consumer intelligence efforts are guided by timely, accurate, and context-rich data.

Why Choose Success.ai’s Consumer Sentiment Data?

Comprehensive Audience Insights

Access psychographic and interest-based profiles that reveal what motivates and influences your audience’s decisions.

Continuous updates ensure you stay aligned with shifting consumer sentiments, seasonal preferences, and emerging trends.

Global Reach Across Industries and Demographics

Includes insights from various markets, age groups, cultural backgrounds, and income levels.

Identify consumer attitudes in different regions, helping you tailor campaigns, products, and messaging to diverse audiences.

Continuously Updated Datasets

Real-time data analysis ensures that your consumer sentiment insights remain fresh, relevant, and actionable.

Adapt quickly to consumer feedback, market changes, and competitive pressures.

Ethical and Compliant

Adheres to global data privacy regulations, ensuring your usage of consumer sentiment data is both legal and respectful of personal boundaries.

Data Highlights:

Psychographic Profiles: Understand lifestyle preferences, values, and interests that shape consumer choices.

Sentiment Trends: Track evolving emotional responses to brands, products, and categories.

Global Audience Insights: Evaluate consumer sentiments across multiple regions, languages, and cultural contexts.

Continuous Updates: Receive current data that reflects the latest shifts in mood, opinion, and interest.

Key Features of the Dataset:

Granular Segmentation

Segment audiences by demographic, interest, buying behavior, and sentiment scores for targeted marketing efforts.

Focus on the attributes that matter most, from eco-conscious consumers to luxury shoppers or value seekers.

Contextual Sentiment Analysis

Go beyond basic positive/negative sentiment to understand nuanced emotional responses.

Identify triggers that inspire loyalty, dissatisfaction, trust, or skepticism.

AI-Driven Enrichment

Profiles enriched with actionable data provide deeper insights into consumer lifestyles, brand perceptions, and product affinities.

Leverage advanced analytics to develop personalized campaigns and product strategies.

Strategic Use Cases:

Marketing and Campaign Optimization

Craft campaigns that resonate emotionally by understanding what drives consumer engagement.

Adjust messaging, timing, and channels to align with evolving sentiment trends and seasonal shifts in consumer mood.

Product Development and Innovation

Identify unmet consumer needs and preferences before launching new products.

Refine features, packaging, and pricing strategies based on real-time consumer responses.

Brand Management and Positioning

Monitor brand perceptions to detect early signs of brand fatigue, trust erosion, or negative publicity.

Strengthen brand loyalty by addressing concerns, highlighting strengths, and adapting to changing market contexts.

Competitive Analysis and Market Entry

Benchmark consumer sentiment towards competitors, industry leaders, and emerging disruptors.

Assess market readiness and optimize entry strategies for new regions or segments.

Why Choose Success.ai?

Best Price Guarantee

Access high-quality, verified data at competitive prices, ensuring efficient allocation of your marketing and research budgets.

Seamless Integration

Integrate enriched sentiment data into your analytics, CRM, or marketing platforms via APIs or downloadable formats.

Simplify data management and accelerate decision-making processes.

Data Accuracy with AI Validation

Benefit from AI-driven validation for reliable insights into consumer attitudes, leading to more confident data-driven strategies.

Customizable and Scalable Solutions

Tailor datasets to focus on specific segments, regions, or interests, and scale as your business grows and evolves.

APIs for Enhanced Functionality:

Data Enrichment API

Enhance your existing consumer records with psychographic and sentiment insights, deepening your understanding of audience motivations.

Lead Generation API

Identify audience segments receptive to your messaging, streamlini...
US Auto Data | Full VIN | 127,853,223 Vehicle Details | Make Model Year |...
datarade.ai
.json, .csv, .xls
Updated Aug 1, 2010
CompCurve (2010). US Auto Data | Full VIN | 127,853,223 Vehicle Details | Make Model Year | Ownership Signals | Consumer Demographics | Automotive Intelligence File [Dataset]. https://datarade.ai/data-products/us-auto-data-full-vin-127-853-223-vehicle-details-make-compcurve
Available download formats
.json, .csv, .xls
Dataset updated
Aug 1, 2010
Dataset authored and provided by
CompCurve
Area covered
United States
Description
This dataset is a national, VIN-resolved automotive file containing detailed vehicle attributes, ownership signals, and linked consumer demographics. Every row is anchored by a full 17-character VIN, allowing precise matching, decoding, and enrichment across insurance, lending, automotive analytics, marketing, and identity-resolution workflows. The file covers 387M+ U.S. vehicles across all major OEMs, model types, and price tiers.
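Because every row is keyed by a full 17-character VIN, a cheap first-pass quality check is the check-digit test used for North American VINs (position 9). The sketch below implements the standard transliteration-and-weighting scheme. VINs for vehicles built for other markets are not required to carry a valid check digit, so a failure there is not conclusive.

```python
# Letter values for the VIN check-digit calculation (I, O and Q are not allowed).
TRANSLITERATION = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit(vin):
    """Compute the expected check digit ('0'-'9' or 'X') for a 17-char VIN."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLITERATION for ch in vin):
        raise ValueError(f"not a well-formed VIN: {vin!r}")
    total = sum(TRANSLITERATION[ch] * w for ch, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin):
    """True if the VIN is well formed and its 9th character matches."""
    try:
        expected = vin_check_digit(vin)
    except ValueError:
        return False
    return vin.upper()[8] == expected


print(is_valid_vin("1M8GDM9AXKP042788"))  # True
print(is_valid_vin("1M8GDM9A1KP042788"))  # False
```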

The dataset includes vehicles from domestic manufacturers (e.g., Ford, GM, Stellantis) as well as foreign/import brands (e.g., Toyota, Honda, BMW, Mercedes, Hyundai, Kia). The manufacturerbased field clearly identifies where the OEM is headquartered, supporting segmentation such as domestic vs foreign, mainstream vs luxury, SUV vs sedan, gas vs hybrid vs electric, and new vs used ownership patterns.

Vehicle & VIN Attribute Coverage

Each record contains core vehicle details:

vin – Full 17-character Vehicle Identification Number

year – Model year

make / model – OEM brand and specific model name

manufacturer / manufacturerbased – Company name and domestic/foreign origin

fuel – Fuel type (gas, diesel, hybrid, EV, flex-fuel)

style – Marketing style (SUV, crossover, coupe, convertible, etc.)

bodytype / bodysubtype – Body classification such as SUV, sedan, pickup, hatchback

class – Market class (mainstream, luxury, premium, truck, etc.)

size – Compact, mid-size, full-size, etc.

doors – Number of doors

vechicletype – Passenger car, light truck, SUV, etc.

enginecylinders – Cylinder count

transmissiontype / transmissiongears – Automatic, manual, CVT, and gear count

gvwrange – Gross Vehicle Weight Rating (light duty vs heavy duty)

weight / maxpayload – Weight/payload estimates

trim – Detailed trim level

msrp – Original MSRP for pricing tiers and value modeling

validated / rankorder – Internal quality indicators

These fields support risk modeling, valuation, depreciation curves, fleet analysis, replacement cycles, and comparisons across domestic and foreign OEMs.

Ownership Signals & Lifecycle Indicators

The dataset includes rich ownership timing and household-level automotive information:

purchasedate – Date the vehicle was obtained, enabling:

Tenure modeling

Trade-in prediction

Lease/loan lifecycle analysis

Service interval modeling

purchasenew – Purchased new vs used

number_of_vehicles_in_hh – Total vehicles linked to the household

validated – Confirmed record flag

These attributes power auto replacement models, refinance targeting, multi-vehicle household insights, and OEM loyalty analytics.
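For the tenure and trade-in use cases, ownership length can be derived from `purchasedate`. The description does not specify the date format, so this sketch assumes ISO `YYYY-MM-DD` strings. Adjust the parser to whatever format the delivered file actually uses.

```python
from datetime import date


def parse_purchasedate(text):
    # Assumption: ISO 8601 dates such as "2019-03-15".
    return date.fromisoformat(text)


def tenure_months(purchased, as_of):
    """Completed months of ownership between two datetime.date values."""
    months = (as_of.year - purchased.year) * 12 + (as_of.month - purchased.month)
    if as_of.day < purchased.day:
        months -= 1  # the current month is not yet complete
    return max(months, 0)


purchased = parse_purchasedate("2019-03-15")
print(tenure_months(purchased, date(2024, 3, 14)))  # 59
print(tenure_months(purchased, date(2024, 3, 15)))  # 60
```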

Consumer Identity & Address Standardization

Each VIN record is linked to standardized consumer and household metadata:

consumer_first / consumer_last / consumer_suffix – Owner name fields

consumer_std_address – USPS-style standardized address

consumer_std_city / consumer_std_state / consumer_std_zip – Clean geographic identifiers

consumer_county_name – County for underwriting and geo-risk segmentation

consumer_std_status – Address quality/verification status

consumer_latitude / consumer_longitude – Geocoded coordinates for mapping, heatmaps, and risk scoring

This enables identity resolution, entity matching, household-level modeling, and geographic segmentation.

Consumer Demographics & Economic Indicators

The auto file connects vehicles to extensive demographic and lifestyle fields, including:

consumer_income_range – Household income band

consumer_home_owner – Homeowner vs renter

consumer_home_value – Home value range

consumer_networth – Net worth category

consumer_credit_range – Modeled credit tier

consumer_gender / consumer_age / consumer_age_range – Demographic segment fields

consumer_birth_year – Year-of-birth

consumer_marital_status – Single/married

consumer_presence_of_children / consumer_number_of_children – Household composition

consumer_dwelling_type – Housing type

consumer_length_of_residence / range – Stability indicator

consumer_language, religion, ethnicity – Cultural/language segments

consumer_pool_owner – Lifestyle attribute

consumer_occupation / consumer_education_level – Socioeconomic indicators

consumer_donor / consumer_veteran – Contribution and service attributes

These fields enable hyper-granular segmentation, lifestyle-based modeling, wealth indexing, market analysis, and insurance/lending underwriting.

Phone, Email & Contact Intel

Each record may include up to three phones and three emails:

consumer_phone1/2/3 – Contact numbers

consumer_linetype1/2/3 – Wireless, landline, VOIP

consumer_dnc1/2/3 – Do-Not-Call indicators

consumer_email1/2/3 – Email addresses

This supports compliant outreach, multi-channel activation, CRM enrichment, and identity graph expansion.
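A minimal sketch of DNC-aware phone selection using the `consumer_phoneN` / `consumer_dncN` fields listed above. The flag encoding is not documented here, so the code assumes `"N"` means "not on a Do-Not-Call list" and treats any other or missing value as do-not-call, which errs on the side of caution.

```python
def callable_phones(record):
    """Return phone numbers whose DNC flag explicitly clears them.

    Assumption: consumer_dncN == "N" means the number is NOT on a
    Do-Not-Call list. Missing or unrecognised flags are treated as DNC.
    """
    phones = []
    for i in (1, 2, 3):
        phone = record.get(f"consumer_phone{i}")
        flag = str(record.get(f"consumer_dnc{i}", "")).strip().upper()
        if phone and flag == "N":
            phones.append(phone)
    return phones


row = {
    "consumer_phone1": "5550100", "consumer_dnc1": "Y",
    "consumer_phone2": "5550101", "consumer_dnc2": "N",
    "consumer_phone3": "5550102",  # flag missing -> excluded
}
print(callable_phones(row))  # ['5550101']
```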

Primary Use Cases

Insurance & Risk Modeling

VIN decoding, ownership tenure, household economics, and geo data support auto underwriting, pricing, rating territory analysis, and fraud screening.

Auto Finance, Lending & Refinance

Model trade-in window...
The Río Hortega University Hospital Glioblastoma dataset: a comprehensive...
cancerimagingarchive.net
csv, dicom, n/a +1
Updated Jun 9, 2023
The Cancer Imaging Archive (2023). The Río Hortega University Hospital Glioblastoma dataset: a comprehensive collection of preoperative, early postoperative and recurrence MRI scans [Dataset]. http://doi.org/10.7937/4545-c905
Available download formats
csv, n/a, dicom, nifti
Unique identifier
https://doi.org/10.7937/4545-c905
Dataset updated
Jun 9, 2023
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Jun 9, 2023
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description

This data collection consists of multiparametric MRI scans of 40 adult patients with histopathologically confirmed WHO grade 4 astrocytoma, who underwent surgery at the Río Hortega University Hospital in Valladolid, Spain, between January 2018 and December 2022. The dataset encompasses 600 MRI series, covering three time points: preoperative, early post-operative (less than 72 hours after surgery), and the follow-up scan, at which recurrence is diagnosed. Patients included in the sample underwent gross total resection (GTR) or near total resection (NTR), defined as having no residual tumor enhancement and an extent of resection of more than 95% of the initial enhancing volume, respectively. The modified Response Assessment in Neuro-Oncology criteria (RANO) were used to define tumor progression.
The dataset contains T1-weighted (T1w), T2-weighted (T2w), Fluid Attenuated Inversion Recovery (FLAIR), T1w contrast-enhanced (T1ce) sequences, and diffusion-weighted imaging-derived apparent diffusion coefficient (ADC) maps. It also includes clinical and demographic data, IDH status, treatment information, and volumetric assessment of the extent of the resection. Moreover, the dataset comprises expert-validated segmentations of tumor subregions (e.g., enhancing tumor, necrosis, peritumoral region), generated through computer-aided methods from preoperative, postoperative, and follow-up scans.
This dataset is unique in its inclusion of patients who underwent extensive resection of > 95% of the enhancing tumor. It also stands out from other publicly available datasets by providing early postoperative studies and segmentations, filling the gap in preoperative-focused datasets. By making these data publicly available, the scientific community can analyze recurrence patterns in patients who underwent total or near-total resection and develop new registration and segmentation algorithms focused on post-surgical and follow-up studies.
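The GTR/NTR inclusion rule described above can be written as a small function over enhancing-tumor volumes, for example volumes derived from the preoperative and early postoperative segmentations. Here, extent of resection is taken as the usual percentage reduction of the enhancing volume. The "STR" label for anything at or below the 95% threshold is an added assumption, since such patients are not part of this dataset.

```python
def extent_of_resection(preop_ml, residual_ml):
    """Percentage of the preoperative enhancing volume that was removed."""
    if preop_ml <= 0:
        raise ValueError("preoperative enhancing volume must be positive")
    return 100.0 * (preop_ml - residual_ml) / preop_ml


def resection_category(preop_ml, residual_ml):
    """GTR: no residual enhancement; NTR: EOR > 95%; otherwise 'STR'."""
    if residual_ml == 0:
        return "GTR"
    if extent_of_resection(preop_ml, residual_ml) > 95.0:
        return "NTR"
    return "STR"  # subtotal; outside this dataset's inclusion criteria


print(extent_of_resection(40.0, 1.0))  # 97.5
print(resection_category(40.0, 0.0))   # GTR
print(resection_category(40.0, 1.0))   # NTR
print(resection_category(40.0, 4.0))   # STR
```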
Customer Segmentation Data
kaggle.com
zip
Updated Mar 11, 2024
Smit Raval (2024). Customer Segmentation Data [Dataset]. https://www.kaggle.com/datasets/ravalsmit/customer-segmentation-data/discussion
Available download formats
zip (1,842,344 bytes)
Dataset updated
Mar 11, 2024
Authors
Smit Raval
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.

Key Features:

Anonymized demographic, transactional, and behavioral data.

Suitable for customer segmentation analysis.

Opportunities to optimize targeted marketing strategies.

Valuable insights for improving campaign effectiveness.

Ideal for marketers, data scientists, and business analysts.

Usage Examples:

Segmenting customers based on demographic attributes.

Analyzing purchase behavior to identify high-value customer segments.

Optimizing marketing campaigns for targeted engagement.

Understanding customer preferences and tailoring product offerings accordingly.

Evaluating the effectiveness of marketing strategies and iterating for improvement.

Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
Demographic and clinical characteristics of the younger cohort....
figshare.com
plos.figshare.com
xls
Updated Nov 14, 2025
Fahad Salman; Niels Bergsland; Michael G. Dwyer; Jack A. Reeves; Abhisri Ramesh; Dejan Jakimovski; Bianca Weinstock-Guttman; Robert Zivadinov; Ferdinand Schweser (2025). Demographic and clinical characteristics of the younger cohort. M:F = Male:Female; CIS = Clinically Isolated Syndrome; RMS = Relapsing-Remitting Multiple Sclerosis; EDSS = Expanded Disability Status Scale. [Dataset]. http://doi.org/10.1371/journal.pone.0332478.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0332478.t001
Dataset updated
Nov 14, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Fahad Salman; Niels Bergsland; Michael G. Dwyer; Jack A. Reeves; Abhisri Ramesh; Dejan Jakimovski; Bianca Weinstock-Guttman; Robert Zivadinov; Ferdinand Schweser
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Demographic and clinical characteristics of the younger cohort. M:F = Male:Female; CIS = Clinically Isolated Syndrome; RMS = Relapsing-Remitting Multiple Sclerosis; EDSS = Expanded Disability Status Scale.
Segmentation and socio-demographic variables.
plos.figshare.com
xls
Updated Jun 14, 2023
Mauricio Carvache-Franco; Tahani Hassan; Orly Carvache-Franco; Wilmer Carvache-Franco; Olga Martin-Moreno (2023). Segmentation and socio-demographic variables. [Dataset]. http://doi.org/10.1371/journal.pone.0287113.t004
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0287113.t004
Dataset updated
Jun 14, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Mauricio Carvache-Franco; Tahani Hassan; Orly Carvache-Franco; Wilmer Carvache-Franco; Olga Martin-Moreno
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Food festivals have been a growing tourism sector in recent years because of their contributions to a region’s economic, marketing, brand, and social growth. This study analyses the demand for the Bahrain food festival. The stated objectives were: (i) to identify the motivational dimensions of the demand for the food festival, (ii) to determine the segments of the demand for the food festival, and (iii) to establish the relationship between the demand segments and socio-demographic aspects. The food festival investigated was the Bahrain Food Festival, held in Bahrain, an island country in the Persian Gulf. The sample consisted of 380 valid questionnaires collected through social networks from people attending the event. The statistical techniques used were factor analysis and K-means clustering. The results show five motivational dimensions: Local food, Art, Entertainment, Socialization, and Escape and novelty. In addition, two segments were found. The first, Entertainment and novelties, consists of attendees who seek to enjoy the festive atmosphere and discover new restaurants. The second, Multiple motives, is formed by attendees with several simultaneous motivations. This segment has the highest income and expenses, making it the most important group for developing plans and strategies. The results will contribute to the academic literature and assist the organizers of food festivals.
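K-means, one of the techniques named in the study, can be sketched in a few lines of plain Python using Lloyd's algorithm. The two-dimensional points below are hypothetical stand-ins for per-respondent scores. In practice, features are usually standardized first, and the study's own preprocessing is not reproduced here.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids

    def nearest(p):
        # Index of the closest current centroid (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # New centroid = mean of assigned points; keep the old one if empty.
        updated = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Cluster numbering depends on the random initialisation, so compare groupings rather than raw label values. Lloyd's algorithm only finds a local optimum, and rerunning it with several seeds is a common safeguard.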
Assessing the validity of a data driven segmentation approach: A 4 year...
plos.figshare.com
docx
Updated May 31, 2023
Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo (2023). Assessing the validity of a data driven segmentation approach: A 4 year longitudinal study of healthcare utilization and mortality [Dataset]. http://doi.org/10.1371/journal.pone.0195243
Available download formats
docx
Unique identifier
https://doi.org/10.1371/journal.pone.0195243
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Lian Leng Low; Shi Yan; Yu Heng Kwan; Chuen Seng Tan; Julian Thumboo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background
Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and the development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.

Methods
We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013–2016.

Results
Among 146,999 subjects, five distinct patient segments were identified: “Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”. Healthcare utilization patterns in 2012, morbidity patterns, and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. In this segment, 11.5% had end-stage renal failure and 31.2% had malignancy. The validity of the cluster-analysis-derived segments is supported by their discriminative ability for longitudinal healthcare utilization and mortality from 2013–2016. Incidence rate ratios for healthcare utilization and Cox hazard ratios for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for disproportionate healthcare utilization and had an 8.16 times higher mortality rate.

Conclusion
Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk, providing an evidence-based segmentation of a regional health system’s healthcare needs.
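Ward's linkage, used in the study's hierarchical step, merges at each stage the pair of clusters whose union least increases the within-cluster sum of squares. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2. The naive sketch below (cubic time, fine only for small toy inputs) illustrates the criterion. It is not the study's implementation.

```python
def centroid(cluster):
    n = len(cluster)
    return [sum(xs) / n for xs in zip(*cluster)]


def sse(cluster):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in total SSE caused by merging clusters a and b."""
    na, nb = len(a), len(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return na * nb * sq_dist / (na + nb)


def ward_agglomerate(points, k):
    """Greedy bottom-up clustering with Ward's criterion until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Cheapest pair to merge; ties are broken by the lower indices.
        _, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


a, b = [(0, 0), (2, 0)], [(10, 0)]
print(ward_cost(a, b))               # 54.0
print(sse(a + b) - sse(a) - sse(b))  # 54.0, the same increase computed directly
```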
Café Rewards
kaggle.com
zip
Updated Feb 4, 2025
willian oliveira (2025). Café Rewards [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/caf-rewards/code
Available download formats
zip (447,957 bytes)
Dataset updated
Feb 4, 2025
Authors
willian oliveira
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset comprises a meticulously structured collection of customer-related information designed for efficient machine learning applications. It consists of three primary folders—offers, customers, and events—each containing valuable data that enable detailed analysis of customer behavior, response to promotional offers, and overall engagement over a 30-day period.

Offers

The offers folder contains comprehensive details on various promotional offers that were sent to customers within the 30-day timeframe. Each offer is uniquely identified by an offer_id, which serves as the primary key. Offers are categorized into three distinct types:

BOGO (Buy One, Get One): A customer must purchase a specific product to receive another for free.
Discount: A direct discount applied to purchases, incentivizing spending.
Informational: Provides details about a promotion without requiring any spending or offering a direct reward.

Each offer has specific requirements and rewards:

difficulty: The minimum amount a customer must spend to qualify for the offer.
reward: The monetary reward (in USD) received upon successful completion of the offer.
duration: The number of days a customer has to complete the offer after receiving it.
channels: The marketing channels used to send the offer, which may include email, mobile app notifications, social media, or direct mail.

By analyzing the offers dataset, businesses can assess the effectiveness of different promotional strategies and optimize future campaigns.
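The offer attributes above combine into a simple eligibility rule: spend enough (difficulty) within duration days of receipt. A minimal sketch, assuming a hypothetical `Offer` record, offer-type spellings "bogo", "discount" and "informational", and that every transaction inside the window counts toward the requirement; the dataset description does not specify the exact completion logic.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    offer_id: str
    offer_type: str       # assumed spellings: "bogo", "discount", "informational"
    difficulty: float     # minimum spend required to complete the offer
    reward: float         # reward in USD paid on completion
    duration: int         # days allowed to complete the offer after receipt
    channels: tuple = ()  # e.g. ("email", "mobile")


def meets_requirements(offer, received_hour, transactions):
    """Return True if spend inside the offer window reaches offer.difficulty.

    transactions: list of (hour, amount) pairs for one customer, with hours on the
    same scale as the events `time` field. Assumption: every transaction between
    receipt and receipt + duration days counts toward the offer.
    """
    if offer.offer_type == "informational":
        return False  # informational offers carry no spend requirement or reward
    deadline = received_hour + offer.duration * 24  # duration is in days, time in hours
    spent = sum(amount for hour, amount in transactions
                if received_hour <= hour <= deadline)
    return spent >= offer.difficulty


if __name__ == "__main__":
    o = Offer("o1", "discount", difficulty=10.0, reward=2.0, duration=7)
    print(meets_requirements(o, 0, [(5, 4.0), (30, 6.5)]))
```

Converting duration to hours keeps the comparison on the same scale as the hourly event timestamps.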

Customers

The customers folder contains demographic information for each member in the dataset. Each customer is uniquely identified by customer_id, which acts as the primary key. The folder includes the following attributes:

became_member_on: The date (formatted as YYYYMMDD) when the customer created their account. This information helps track customer loyalty and tenure.
gender: The customer's gender, categorized as (M)ale, (F)emale, or (O)ther. This allows for demographic segmentation and targeted marketing analysis.
age: The customer's age, useful for analyzing purchasing patterns and offer preferences across different age groups.
income: The estimated annual income of the customer (in USD), enabling insights into spending behavior based on economic status.

With this dataset, machine learning models can predict customer preferences, segment users into meaningful groups, and tailor offers based on demographic factors.
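Because became_member_on is stored as YYYYMMDD, tenure has to be derived by parsing it against a reference date. A minimal sketch using the standard library; the reference date `as_of` is a hypothetical parameter (the description does not give one), and the 10-year age bands are an illustrative choice for the age-group comparisons mentioned above.

```python
from datetime import date, datetime


def tenure_days(became_member_on, as_of):
    """Days between account creation (YYYYMMDD, as int or str) and a reference date."""
    joined = datetime.strptime(str(became_member_on), "%Y%m%d").date()
    return (as_of - joined).days


def age_band(age, width=10):
    """Bucket an age into a label such as '30-39' for group-level comparisons."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


if __name__ == "__main__":
    print(tenure_days(20180101, as_of=date(2018, 7, 31)), age_band(34))
```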

Events

The events folder logs customer activity throughout the 30-day period, capturing interactions with offers and transactions. Each record is associated with a specific customer_id, which serves as a foreign key linking each activity to an individual customer. The folder includes:

event: A categorical description of the customer's interaction. The possible events are:
  Transaction: A purchase made by the customer.
  Offer Received: An offer was sent to the customer.
  Offer Viewed: The customer opened and viewed the offer.
  Offer Completed: The customer fulfilled the conditions required to claim the offer's reward.
value: A dictionary of values linked to the event, whose contents depend on the type of activity:
  For transactions, value holds the amount spent by the customer.
  For offers received, viewed, or completed, value holds the corresponding offer_id.
time: The number of hours elapsed since the start of the 30-day observation window (starting at 0). This allows customer engagement to be tracked over time and behavioral trends to be identified.
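Since value changes shape with the event type, aggregations have to branch on event first. A minimal sketch that totals spend per customer and maps the hourly time field to a day of the window; it assumes each event is a dict with keys "customer_id", "event", "value" and "time", that transactions are labeled "transaction", and that the amount sits under an "amount" key. These spellings are assumptions and may differ in the actual files.

```python
from collections import defaultdict


def spend_per_customer(events):
    """Sum transaction amounts per customer_id.

    Assumed record shape: {"customer_id": ..., "event": ..., "value": {...}, "time": ...},
    with transactions labeled "transaction" and their amount stored under "amount".
    """
    totals = defaultdict(float)
    for e in events:
        if e["event"] == "transaction":  # offer events carry an offer_id, not an amount
            totals[e["customer_id"]] += e["value"]["amount"]
    return dict(totals)


def day_of_window(hour):
    """Map the hourly `time` field to a day index 0..29 of the 30-day window."""
    return int(hour) // 24


if __name__ == "__main__":
    sample = [
        {"customer_id": "c1", "event": "transaction", "value": {"amount": 10.25}, "time": 0},
        {"customer_id": "c1", "event": "offer received", "value": {"offer_id": "o1"}, "time": 0},
    ]
    print(spend_per_customer(sample), day_of_window(719))
```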

By analyzing the events dataset, businesses can gain insights into customer interactions, measure the success of promotional offers, and identify patterns in spending behavior. Machine learning models can leverage this data to predict which offers will be most effective for different customer segments.
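One simple way to measure offer success from the events log is a per-offer funnel: of the offers received, what fraction were viewed and what fraction were completed. A minimal sketch, assuming event labels "offer received", "offer viewed" and "offer completed" and an "offer_id" key inside value; these spellings are assumptions to adjust against the real files.

```python
from collections import Counter


def offer_funnel(events):
    """Per offer_id, the fraction of received offers that were viewed and completed.

    Assumes event labels "offer received", "offer viewed", "offer completed" and an
    "offer_id" key in the value dict.
    """
    counts = {}
    for e in events:
        if e["event"].startswith("offer"):  # skip transactions
            oid = e["value"]["offer_id"]
            counts.setdefault(oid, Counter())[e["event"]] += 1
    funnel = {}
    for oid, c in counts.items():
        received = c["offer received"]
        funnel[oid] = {
            "view_rate": c["offer viewed"] / received if received else 0.0,
            "completion_rate": c["offer completed"] / received if received else 0.0,
        }
    return funnel
```

These ratios do not check whether a completion was preceded by a view, so they measure completion rather than completion caused by the offer; attributing influence would require ordering each customer's events by time.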


Cite

Prashob Narendran (2025). Customer_Financial_Data [Dataset]. https://www.kaggle.com/datasets/prashobnarendran/customer-financial-data

Customer_Financial_Data

A comprehensive dataset of bank customer demographics and financial behavior


Available download formats: zip (62,099 bytes)

Dataset updated

Nov 12, 2025

Authors

Prashob Narendran

Description

Context

This dataset contains detailed, anonymized information about a bank's customers. It includes demographic data such as age, income, and family size, as well as financial information like mortgage value, credit card ownership, and average spending habits. The data is well-suited for a variety of machine learning tasks, particularly in the domain of financial services and marketing.

Content

The dataset consists of 5,000 customer records with 14 attributes:

Customer_ID: A unique identifier for each customer.
Age: The customer's age in completed years.
Years_Experience: Years of professional experience.
Annual_Income: Annual income of the customer (in thousands of dollars).
ZIP_Code: The customer's home address ZIP code.
Family_size: The number of individuals in the customer's family.
Avg_Spending: Average monthly spending on credit cards (in thousands of dollars).
Education_Level: A categorical variable for education level (1: Undergraduate, 2: Graduate, 3: Advanced/Professional).
Mortgage: The value of the customer's house mortgage if any (in thousands of dollars).
Has_Consumer_Loan: Binary variable indicating if the customer accepted a personal loan in the last campaign (1: Yes, 0: No). This is a potential target variable.
Has_Securities_Account: Binary variable indicating if the customer has a securities account with the bank.
Has_CD_Account: Binary variable indicating if the customer has a certificate of deposit (CD) account with the bank.
Uses_Online_Banking: Binary variable indicating if the customer uses online banking services.
Has_CreditCard: Binary variable indicating if the customer uses a credit card issued by this bank.
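Before modeling, the identifier columns and the coded categorical column need different treatment from the numeric ones. A minimal sketch, assuming each record is a dict keyed by the column names above and Has_Consumer_Loan is the target; dropping ZIP_Code (a location code, not a quantity) and one-hot encoding Education_Level are modeling choices made here for illustration, and the helper and new column names are hypothetical.

```python
EDUCATION_LEVELS = {1: "undergraduate", 2: "graduate", 3: "advanced"}
DROP = ("Customer_ID", "ZIP_Code")
TARGET = "Has_Consumer_Loan"


def to_features(row):
    """Split one record into (features, target).

    Customer_ID is an identifier and ZIP_Code is a code rather than a quantity, so
    both are dropped. Education_Level codes are replaced by one 0/1 column per level.
    """
    features = {k: v for k, v in row.items() if k not in DROP and k != TARGET}
    level = features.pop("Education_Level")
    for code, name in EDUCATION_LEVELS.items():
        features[f"Education_{name}"] = 1 if level == code else 0
    return features, row[TARGET]
```

Treating the three education codes as separate indicators avoids implying that the numeric gap between codes is meaningful; an ordinal encoding is an equally reasonable alternative.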

Data Quality Note

Some rows contain negative values in the Years_Experience column. This is a data quality issue that may require preprocessing, for example replacing each negative value with its absolute value, or imputing the average Years_Experience of customers in a similar age group.
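The two repair strategies named in the note can be sketched as follows, assuming records are dicts with Age and Years_Experience keys. The 5-year age band width and the fallback to the absolute value when a band has no valid values are assumptions, not part of the dataset notes.

```python
from statistics import mean


def fix_with_abs(rows):
    """Strategy 1: replace each negative Years_Experience with its absolute value."""
    return [{**r, "Years_Experience": abs(r["Years_Experience"])} for r in rows]


def fix_with_age_band_mean(rows, width=5):
    """Strategy 2: replace negatives with the mean non-negative Years_Experience of
    customers in the same age band. Falls back to the absolute value when a band
    has no valid values (an assumption).
    """
    def band(r):
        return r["Age"] // width

    valid = {}
    for r in rows:
        if r["Years_Experience"] >= 0:
            valid.setdefault(band(r), []).append(r["Years_Experience"])
    fixed = []
    for r in rows:
        x = r["Years_Experience"]
        if x < 0:
            pool = valid.get(band(r))
            x = mean(pool) if pool else abs(x)
        fixed.append({**r, "Years_Experience": x})  # copy, leaving the input unchanged
    return fixed
```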

Potential Use Cases

This dataset is suitable for both educational and practical purposes. You can use it to:

Predict Loan Acceptance: Build a classification model to predict which customers are most likely to accept a personal loan (Has_Consumer_Loan).
Customer Segmentation: Use clustering algorithms (like K-Means) to identify distinct customer segments for targeted marketing campaigns.
Credit Card Adoption: Analyze the factors that influence a customer's decision to get a bank-issued credit card.
Exploratory Data Analysis (EDA): Practice your data analysis and visualization skills to uncover insights about customer behavior.
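A natural first EDA step toward predicting loan acceptance is to compare acceptance rates across groups, which also shows how common positive cases are before a classifier is trained. A minimal sketch, assuming records are dicts and Has_Consumer_Loan is coded 0/1 as described; the helper name is hypothetical.

```python
from collections import defaultdict


def acceptance_rate_by(rows, column, target="Has_Consumer_Loan"):
    """Share of customers with target == 1 within each value of `column`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in rows:
        totals[r[column]] += 1
        positives[r[column]] += r[target]
    return {k: positives[k] / totals[k] for k in totals}
```

For example, `acceptance_rate_by(rows, "Education_Level")` groups by the three education codes, and the same helper works for any binary column such as Has_CD_Account.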


Customer_Financial_Data

Predicting Credit Card Customer Segmentation


Exploring Key Customer Characteristics


GIS Data | USA & Canada | Over 40k Demographics Variables To Inform Business...

Data from: Customer Segmentation in the Digital Marketing Using a Q-Learning...

Customer Segmentation Data for Marketing Analysis

Demographic Data | USA & Canada | Latest Estimates & Projections To Inform...

US Consumer Demographics | Homeowners & Renters | Email & Mobile Phone |...

Customer Segmentation for Targeted Campaigns

Dataset on Brand Ethics, Trust, Customer Experience, and Loyalty in Latin...

Consumer Behavior and Shopping Habits Dataset:

facebook fact checking dataset

Curated, Segmented, and Deep Learning-Optimized I-SPY 2 MRI Dataset for...


Consumer Sentiment Data | Global Audience Insights | Psychographic Profiles...

US Auto Data | Full VIN | 127,853,223 Vehicle Details | Make Model Year |...

The Río Hortega University Hospital Glioblastoma dataset: a comprehensive...

Customer Segmentation Data


Demographic and clinical characteristics of the younger cohort....

Segmentation and socio-demographic variables.

Assessing the validity of a data driven segmentation approach: A 4 year...

Café Rewards

Customer_Financial_Data
