https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive customer data suitable for segmentation analysis. It includes anonymized demographic, transactional, and behavioral attributes, allowing for detailed exploration of customer segments. Leveraging this dataset, marketers, data scientists, and business analysts can uncover valuable insights to optimize targeted marketing strategies and enhance customer engagement. Whether you're looking to understand customer behavior or improve campaign effectiveness, this dataset offers a rich resource for actionable insights and informed decision-making.
Anonymized demographic, transactional, and behavioral data. Suitable for customer segmentation analysis. Opportunities to optimize targeted marketing strategies. Valuable insights for improving campaign effectiveness. Ideal for marketers, data scientists, and business analysts.
Segmenting customers based on demographic attributes. Analyzing purchase behavior to identify high-value customer segments. Optimizing marketing campaigns for targeted engagement. Understanding customer preferences and tailoring product offerings accordingly. Evaluating the effectiveness of marketing strategies and iterating for improvement. Explore this dataset to unlock actionable insights and drive success in your marketing initiatives!
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset contains customer information collected from a consumer credit card portfolio, with the aim of helping analysts predict customer attrition. It includes demographic details such as age, gender, marital status, and income category, as well as information about each customer's relationship with the credit card provider, such as card type, number of months on book, and inactive periods. It also holds data about customers' spending behavior in the period leading up to their churn decision, such as total revolving balance, credit limit, and average open-to-buy rate, along with derived metrics such as the total change in amount from quarter 4 to quarter 1, the average utilization ratio, and a Naive Bayes classifier attrition flag (which combines card category, contacts count over a 12-month period, dependent count, education level, and months inactive). Taken together, these variables capture up-to-date information that can indicate long-term account stability or an impending departure, which is useful both for managing a portfolio and for serving individual customers.
This dataset can be used to analyze the key factors that influence customer attrition. Analysts can use this dataset to understand customer demographics, spending patterns, and relationship with the credit card provider to better predict customer attrition.
- Using the customer demographics, such as gender, marital status, education level and income category to determine which customer demographic is more likely to churn.
- Analyzing the customer's spending behavior leading up to churn and using this data to better predict the likelihood of a customer churning in the future.
- Creating a classifier that can predict potential customers who are more susceptible to attrition based on their credit score, credit limit, utilization ratio and other spending behavior metrics over time; this could be used as an early warning system for predicting potential attrition before it happens
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.
File: BankChurners.csv
| Column name | Description |
|:----------------|:-----------------------------------------------------------------------|
| CLIENTNUM | Unique identifier for each customer. (Integer) |
| Attrition_Flag | Flag indicating whether or not the customer has churned out. (Boolean) |
| Customer_Age | Age of customer. (Integer) |
| Gender | Gender of customer. (String) |
| Dependent_count | Number of dependents that customer has. (Integer) |
| Education_Level ...
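To illustrate the first use case above (finding which demographic group is more likely to churn), here is a minimal Python sketch that computes a churn rate per group from rows of BankChurners.csv. The function names `load_rows` and `churn_rate_by` are hypothetical, and the label that marks a churned customer (`"Attrited Customer"`) is an assumption: the table describes Attrition_Flag as Boolean, so adjust `churn_label` to whatever value the file actually stores.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def churn_rate_by(rows, column, churn_label="Attrited Customer"):
    """Return {group value: fraction of rows in that group flagged as churned}.

    churn_label is an assumed encoding of Attrition_Flag, not confirmed
    by the dataset description.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        group = row[column]
        totals[group] += 1
        if row["Attrition_Flag"] == churn_label:
            churned[group] += 1
    return {group: churned[group] / totals[group] for group in totals}


# Tiny made-up sample standing in for load_rows("BankChurners.csv").
sample = [
    {"Attrition_Flag": "Attrited Customer", "Gender": "F"},
    {"Attrition_Flag": "Existing Customer", "Gender": "F"},
    {"Attrition_Flag": "Existing Customer", "Gender": "M"},
    {"Attrition_Flag": "Existing Customer", "Gender": "M"},
]
rates = churn_rate_by(sample, "Gender")
print(rates)
```

The same function can be called with any categorical column listed in the table, for example `churn_rate_by(rows, "Dependent_count")`.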
GapMaps GIS data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.
GIS Data attributes include:
Latest Estimates and Projections The estimates and projections database includes a wide range of core demographic data variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.
Crime Risk Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property and total) at the block and block group level.
Panorama Segmentation AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to bring you Consumer Behavior profiles associated with this segmentation system.
Business Counts Business Counts is a geographic summary database of business establishments, employment, occupation and retail sales.
Non-Resident Population The AGS non-resident population estimates utilize a wide range of data sources to model the factors which drive tourists to particular locations, and to match that demand with the supply of available accommodations.
Consumer Expenditures AGS provides current year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.
Retail Potential This tabulation utilizes the Census of Retail Trade tables which cross-tabulate store type by merchandise line.
Environmental Risk The environmental suite of data consists of several separate database components, including weather risks, seismological risks, wildfire risk, climate, air quality, and elevation and terrain.
Primary Use Cases for GapMaps GIS Data:
Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations.
Finance / Insurance (e.g., Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)
Network Planning
Customer (Risk) Profiling for insurance/loan approvals
Target Marketing
Competitive Analysis
Market Optimization
Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)
Tenant Recruitment
Target Marketing
Market Potential / Gap Analysis
Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)
Customer Profiling
Target Marketing
Market Share Analysis
GapMaps premium demographic data for USA and Canada sourced from Applied Geographic Solutions (AGS) includes an extensive range of the highest quality demographic and lifestyle segmentation products. All databases are derived from superior source data and the most sophisticated, refined, and proven methodologies.
Demographic Data attributes include:
Latest Estimates and Projections The estimates and projections database includes a wide range of core demographic data variables for the current year and 5-year projections, covering five broad topic areas: population, households, income, labor force, and dwellings.
Crime Risk Crime Risk is the result of an extensive analysis of a rolling seven years of FBI crime statistics. Based on detailed modeling of the relationships between crime and demographics, Crime Risk provides an accurate view of the relative risk of specific crime types (personal, property and total) at the block and block group level.
Panorama Segmentation AGS has created a segmentation system for the United States called Panorama. Panorama has been coded with the MRI Survey data to bring you Consumer Behavior profiles associated with this segmentation system.
Business Counts Business Counts is a geographic summary database of business establishments, employment, occupation and retail sales.
Non-Resident Population The AGS non-resident population estimates utilize a wide range of data sources to model the factors which drive tourists to particular locations, and to match that demand with the supply of available accommodations.
Consumer Expenditures AGS provides current year and 5-year projected expenditures for over 390 individual categories that collectively cover almost 95% of household spending.
Retail Potential This tabulation utilizes the Census of Retail Trade tables which cross-tabulate store type by merchandise line.
Environmental Risk The environmental suite of data consists of several separate database components, including weather risks, seismological risks, wildfire risk, climate, air quality, and elevation and terrain.
Primary Use Cases for AGS Demographic Data:
Integrate AGS demographic data with your existing GIS or BI platform to generate powerful visualizations.
Finance / Insurance (e.g., Hedge Funds, Investment Advisors, Investment Research, REITs, Private Equity, VC)
Network Planning
Customer (Risk) Profiling for insurance/loan approvals
Target Marketing
Competitive Analysis
Market Optimization
Commercial Real-Estate (Brokers, Developers, Investors, Single & Multi-tenant O/O)
Tenant Recruitment
Target Marketing
Market Potential / Gap Analysis
Marketing / Advertising (Billboards/OOH, Marketing Agencies, Indoor Screens)
Customer Profiling
Target Marketing
Market Share Analysis
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed customer information, including demographics, purchase history, insurance details, and preferences. With over 53,000 entries, it is ideal for analyzing customer behavior, performing segmentation, and building predictive models for targeted marketing or service optimization.
Key Features: 20 columns covering customer demographics, income levels, policy details, and behavioral data. Includes segmentation groups for analysis. Suitable for classification, clustering, and pattern recognition tasks.
Inspiration: Perform clustering to segment customers into distinct groups. Build predictive models for customer churn or purchasing behavior. Analyze the correlation between demographic details and purchasing preferences.
Columns and Descriptions: 1. Customer ID: Unique identifier for each customer. 2. Age: Age of the customer. 3. Gender: Gender of the customer (Male/Female). 4. Marital Status: Marital status (Married, Single, Divorced, etc.). 5. Education Level: Customer's education level (e.g., Associate Degree, Doctorate). 6. Geographic Information: Location or state of residence. 7. Occupation: Customer's occupation (e.g., Manager, Entrepreneur). 8. Income Level: Annual income of the customer in monetary units. 9. Behavioral Data: Information on customer behavior (e.g., policy usage). 10. Purchase History: Date of the last purchase. 11. Interactions with Customer Service: Mode of interaction (e.g., Phone, Email). 12. Insurance Products Owned: Types of insurance products owned. 13. Coverage Amount: Total insurance coverage amount. 14. Premium Amount: Premium amount paid by the customer. 15. Policy Type: Type of policy (Group, Family). 16. Customer Preferences: Communication preferences (e.g., Email, Text). 17. Preferred Communication Channel: Channel preference for communication. 18. Preferred Contact Time: Time preference for contact (e.g., Morning, Evening). 19. Preferred Language: Customer's preferred language. 20. Segmentation Group: Group the customer belongs to based on segmentation (e.g., Segment1, Segment2).
Knowing who your consumers are is essential for businesses, marketers, and researchers. This detailed demographic file offers an in-depth look at American consumers, packed with insights about personal details, household information, financial status, and lifestyle choices. Let's take a closer look at the data:
Personal Identifiers and Basic Demographics At the heart of this dataset are the key details that make up a consumer profile:
Unique IDs (PID, HHID) for individuals and households Full names (First, Middle, Last) and suffixes Gender and age Date of birth Complete location details (address, city, state, ZIP) These identifiers are critical for accurate marketing and form the base for deeper analysis.
Geospatial Intelligence This file goes beyond just listing addresses by including rich geospatial data like:
Latitude and longitude Census tract and block details Codes for Metropolitan Statistical Areas (MSA) and Core-Based Statistical Areas (CBSA) County size codes Geocoding accuracy This allows for precise geographic segmentation and localized marketing.
Housing and Property Data The dataset covers a lot of ground when it comes to housing, providing valuable insights for real estate professionals, lenders, and home service providers:
Homeownership status Dwelling type (single-family, multi-family, etc.) Property values (market, assessed, and appraised) Year built and square footage Room count, amenities like fireplaces or pools, and building quality This data is crucial for targeting homeowners with products and services like refinancing or home improvement offers.
Wealth and Financial Data For a deeper dive into consumer wealth, the file includes:
Estimated household income Wealth scores Credit card usage Mortgage info (loan amounts, rates, terms) Home equity estimates and investment property ownership These indicators are invaluable for financial services, luxury brands, and fundraising organizations looking to reach affluent individuals.
Lifestyle and Interests One of the most useful features of the dataset is its extensive lifestyle segmentation:
Hobbies and interests (e.g., gardening, travel, sports) Book preferences, magazine subscriptions Outdoor activities (camping, fishing, hunting) Pet ownership, tech usage, political views, and religious affiliations This data is perfect for crafting personalized marketing campaigns and developing products that align with specific consumer preferences.
Consumer Behavior and Purchase Habits The file also sheds light on how consumers behave and shop:
Online and catalog shopping preferences Gift-giving tendencies, presence of children, vehicle ownership Media consumption (TV, radio, internet) Retailers and e-commerce businesses will find this behavioral data especially useful for tailoring their outreach.
Demographic Clusters and Segmentation Pre-built segments like:
Household, neighborhood, family, and digital clusters Generational and lifestage groups make it easier to quickly target specific demographics, streamlining the process for market analysis and campaign planning.
Ethnicity and Language Preferences In today's multicultural market, knowing your audience's cultural background is key. The file includes:
Ethnicity codes and language preferences Flags for Hispanic/Spanish-speaking households This helps ensure culturally relevant and sensitive communication.
Education and Occupation Data The dataset also tracks education and career info:
Education level and occupation codes Home-based business indicators This data is essential for B2B marketers, recruitment agencies, and education-focused campaigns.
Digital and Social Media Habits With everyone online, digital behavior insights are a must:
Internet, TV, radio, and magazine usage Social media platform engagement (Facebook, Instagram, LinkedIn) Streaming subscriptions (Netflix, Hulu) This data helps marketers, app developers, and social media managers connect with their audience in the digital space.
Political and Charitable Tendencies For political campaigns or non-profits, this dataset offers:
Political affiliations and outlook Charitable donation history Volunteer activities These insights are perfect for cause-related marketing and targeted political outreach.
Neighborhood Characteristics By incorporating census data, the file provides a bigger picture of the consumer's environment:
Population density, racial composition, and age distribution Housing occupancy and ownership rates This offers important context for understanding the demographic landscape.
Predictive Consumer Indexes The dataset includes forward-looking indicators in categories like:
Fashion, automotive, and beauty products Health, home decor, pet products, sports, and travel These predictive insights help businesses anticipate consumer trends and needs.
Contact Information Finally, the file includes ke...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was collected from [Kaggle](https://www.kaggle.com/code/fabiendaniel/customer-segmentation). It includes various features related to customer demographics, purchasing behavior, and other relevant metrics.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Consumer Behavior and Shopping Habits Dataset is a tabular collection of customer demographics, purchase history, product preferences, shopping frequency, and online and offline purchasing behavior.
2) Data Utilization (1) Consumer Behavior and Shopping Habits Dataset has characteristics that: • Each row contains detailed consumer and transaction information such as customer ID, age, gender, purchased goods and categories, purchase amount, region, product attributes (size, color, season), review rating, subscription status, delivery method, discount/promotion usage, payment method, purchase frequency, etc. • Data is organized to cover a variety of variables and purchasing patterns to help segment customers, establish marketing strategies, analyze product preferences, and more. (2) Consumer Behavior and Shopping Habits Dataset can be used to: • Customer Segmentation and Target Marketing: You can analyze demographics and purchasing patterns to define different customer groups and use them to develop customized marketing strategies. • Product and service improvement: Based on purchase history, review ratings, discount/promotional responses, etc., it can be applied to product and service improvements such as identifying popular products, managing inventory, and analyzing promotion effects.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides detailed information about customer orders and spending patterns. It includes demographic details such as age, gender, marital status, and occupation, along with geographic data such as state and zone. The dataset is ideal for exploratory data analysis, customer segmentation, and market analysis.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
MRI-based artificial intelligence (AI) research on patients with brain gliomas has grown rapidly in recent years, in part due to a growing number of publicly available MRI datasets. Notable examples include The Cancer Genome Atlas Glioblastoma dataset (TCGA-GBM), consisting of 262 subjects, and the International Brain Tumor Segmentation (BraTS) challenge dataset, consisting of 542 subjects (including 243 preoperative cases from TCGA-GBM). The public availability of these glioma MRI datasets has fostered the growth of numerous emerging AI techniques including automated tumor segmentation, radiogenomics, and MRI-based survival prediction. Despite these advances, existing publicly available glioma MRI datasets have been largely limited to only 4 MRI contrasts (T2, T2/FLAIR, and T1 pre- and post-contrast), and imaging protocols vary significantly in terms of magnetic field strength and acquisition parameters. Here we present the University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset. The UCSF-PDGM dataset includes 501 subjects with histopathologically-proven diffuse gliomas who were imaged with a standardized 3 Tesla preoperative brain tumor MRI protocol featuring predominantly 3D imaging, as well as advanced diffusion and perfusion imaging techniques. The dataset also includes isocitrate dehydrogenase (IDH) mutation status for all cases and O[6]-methylguanine-DNA methyltransferase (MGMT) promoter methylation status for World Health Organization (WHO) grade III and IV gliomas. The UCSF-PDGM has been made publicly available in the hopes that researchers around the world will use these data to continue to push the boundaries of AI applications for diffuse gliomas.
Data collection was performed in accordance with relevant guidelines and regulations and was approved by the University of California San Francisco institutional review board with a waiver for consent. The dataset population consisted of 501* adult patients with histopathologically confirmed grade II-IV diffuse gliomas who underwent preoperative MRI, initial tumor resection, and tumor genetic testing at a single medical center between 2015 and 2021. Patients with any prior history of brain tumor treatment were excluded; however, history of tumor biopsy was not considered an exclusion criterion.
All subjects’ tumors were tested for IDH mutations by genetic sequencing of tissue acquired during biopsy or resection. All grade III and IV tumors were tested for MGMT methylation status using a methylation sensitive quantitative PCR assay.
The 501* cases included in the UCSF-PDGM include 55 (11%) grade II, 42 (9%) grade III, and 403 (80%) grade IV tumors. There was a male predominance for all tumor grades (56%, 60%, and 60%, respectively for grades II-IV). IDH mutations were identified in a majority of grade II (83%) and grade III (67%) tumors and a small minority of grade IV tumors (8%). MGMT promoter hypermethylation was detected in 63% of grade IV gliomas and was not tested for in a majority of lower grade gliomas. 1p/19q codeletion was detected in 20% of grade II tumors and a small minority of grade III (5%) and IV (<1%) tumors. Tabulated details and glossary are available in the Data Access and Detailed Description tabs below.
All preoperative MRI was performed on a 3.0 tesla scanner (Discovery 750, GE Healthcare, Waukesha, Wisconsin, USA) and a dedicated 8-channel head coil (Invivo, Gainesville, Florida, USA). The imaging protocol included 3D T2-weighted, T2/FLAIR-weighted, susceptibility-weighted (SWI), diffusion-weighted (DWI), pre- and post-contrast T1-weighted images, 3D arterial spin labeling (ASL) perfusion images, and 2D 55-direction high angular resolution diffusion imaging (HARDI). Over the study period, two gadolinium-based contrast agents were used: gadobutrol (Gadovist, Bayer, LOC) at a dose of 0.1 mL/kg and gadoterate (Dotarem, Guerbet, Aulnay-sous-Bois, France) at a dose of 0.2 mL/kg.
HARDI data were eddy current corrected and processed using the Eddy and DTIFIT modules from FSL 6.0.2 yielding isotropic diffusion weighted images (DWI) and several quantitative diffusivity maps: mean diffusivity (MD), axial diffusivity (AD), radial diffusivity (RD), and fractional anisotropy (FA). Eddy correction was performed with outlier replacement on and topup correction off. DTIFIT was performed with simple least squares regression. Each image contrast was registered and resampled to the 3D space defined by the T2/FLAIR image (1 mm isotropic resolution) using automated non-linear registration (Advanced Normalization Tools). Resampled co-registered data were then skull stripped using a previously described and publicly available deep-learning algorithm: https://www.github.com/ecalabr/brain_mask/.
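As a rough illustration of what the quantitative diffusivity maps represent, the following Python sketch computes MD, AD, RD, and FA for a single voxel from the three eigenvalues of its diffusion tensor, using the standard textbook definitions. This is not the FSL DTIFIT code, which fits the tensor and derives these maps internally; the function name `diffusivity_metrics` and the example eigenvalues are assumptions made for illustration.

```python
import math


def diffusivity_metrics(l1, l2, l3):
    """Scalar diffusion metrics from tensor eigenvalues sorted l1 >= l2 >= l3.

    Standard definitions (illustrative, not taken from FSL):
      MD = mean of the eigenvalues
      AD = largest eigenvalue
      RD = mean of the two smaller eigenvalues
      FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||
    """
    md = (l1 + l2 + l3) / 3.0
    ad = l1
    rd = (l2 + l3) / 2.0
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"MD": md, "AD": ad, "RD": rd, "FA": fa}


# Two limiting cases with made-up eigenvalues (arbitrary units).
isotropic = diffusivity_metrics(1.0, 1.0, 1.0)  # equal diffusion in all directions
stick = diffusivity_metrics(1.0, 0.0, 0.0)      # diffusion along one axis only
print(isotropic["FA"], stick["FA"])
```

FA ranges from 0 for fully isotropic diffusion to 1 when diffusion is confined to a single direction, which is why the two limiting cases make a convenient sanity check.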
Multicompartment tumor segmentation of study data was undertaken as part of the 2021 BraTS challenge. Briefly, image data first underwent automated segmentation using an ensemble model consisting of prior BraTS challenge winning segmentation algorithms. Images were then manually corrected by trained radiologists and approved by 2 expert reviewers. Segmentation included three major tumor compartments: enhancing tumor, non-enhancing/necrotic tumor, and surrounding FLAIR abnormality (sometimes referred to as edema).
The UCSF-PDGM adds to an existing body of publicly available diffuse glioma MRI datasets that are commonly used in AI research applications. As MRI-based AI research applications continue to grow, new data are needed to foster development of new techniques and increase the generalizability of existing algorithms. The UCSF-PDGM not only significantly increases the total number of publicly available diffuse glioma MRI cases, but also provides a unique contribution in terms of MRI technique. The inclusion of 3D sequences and advanced MRI techniques like ASL and HARDI provides a new opportunity for researchers to explore the potential utility of cutting-edge clinical diagnostics for AI applications. In addition, these advanced imaging techniques may prove useful for radiogenomic studies focused on identification of IDH mutations or MGMT promoter methylation.
The UCSF-PDGM dataset, particularly when combined with existing publicly available datasets, has the potential to fuel the next phase of radiologic AI research on diffuse gliomas. However, the UCSF-PDGM dataset’s potential will only be realized if the radiology AI research community takes advantage of this new data resource. We hope that this dataset sparks inspiration in the next generation of AI researchers, and we look forward to the new techniques and discoveries that the UCSF-PDGM will generate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The BuzzFeed dataset, officially known as the BuzzFeed-Webis Fake News Corpus 2016, comprises content from 9 news publishers over a 7-day period close to the 2016 US election. It was created to analyze the spread of misinformation and hyperpartisan content on social media platforms, particularly Facebook.
Dataset Composition
- News Articles: The dataset includes 1,627 articles from various sources: 826 from mainstream publishers, 256 from left-wing publishers, and 545 from right-wing publishers.
- Facebook Posts: Each article is associated with Facebook post data, including metrics like share counts, reaction counts, and comment counts.
- Comments: The dataset includes nearly 1.7 million Facebook comments discussing the news content.
- Fact-Check Ratings: Each article was fact-checked by professional journalists at BuzzFeed, providing veracity assessments.
Key Features
- Publisher Information: The dataset covers 9 publishers, including 6 hyperpartisan (3 left-wing and 3 right-wing) and 3 mainstream outlets.
- Temporal Aspect: The data was collected over seven weekdays (September 19-23 and September 26-27, 2016).
- Verification Status: All publishers included in the dataset had earned Facebook's blue checkmark, indicating authenticity and elevated status.
- Metadata: Includes various metrics such as publication dates, post types, and engagement statistics.
Potential Applications
The BuzzFeed dataset is valuable for various research and analytical purposes:
- News Veracity Assessment: Researchers can use machine learning techniques to classify articles based on their factual accuracy.
- Social Media Analysis: The dataset allows for studying how news spreads on platforms like Facebook, including engagement patterns.
- Hyperpartisan Content Study: It enables analysis of differences between mainstream and hyperpartisan news sources.
- Content Strategy Optimization: Media companies can use insights from the dataset to refine their content strategies.
- Audience Analysis: The data can be used for demographic analysis and audience segmentation.
This dataset provides a comprehensive snapshot of news dissemination and engagement on social media during a crucial period, making it a valuable resource for researchers, data scientists, and media analysts studying online information ecosystems.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset includes MR imaging from 203 glioma patients with 617 different post-treatment MR time points, and tumor segmentations. Clinical data includes patient demographics, genomics, and treatment details. Preprocessing of MR images followed a standardized pipeline with automatic tumor segmentation based on nnUNet deep learning approach. The automatic tumor segmentations were manually validated and refined by neuroradiologists.
The heterogeneity of glioma imaging characteristics and management strategies contributes to a lack of reliable findings when evaluating treatment outcomes with conventional MRI, and the overlapping imaging features of radiation necrosis and tumor progression post-treatment can be particularly challenging for radiologists. This robust dataset should contribute to the development of AI models to improve evaluation of treatment outcomes.
The dataset consists of an institutional review board-approved retrospective analysis of pathologically proven glioma patients at University Hospital of The University of Missouri. The Anatomic Pathology CoPathPlus database was used to collect glioma cases over the last 10 years.
Sharing segmented postoperative glioma data with clinical information significantly accelerates research and improves clinical practice by providing a comprehensive, readily available dataset. This eliminates the time-consuming burden of manual segmentation, enhances the accuracy and consistency of tumor delineation, and allows researchers to focus on analysis and interpretation, ultimately driving the development of more accurate segmentation algorithms, predictive models for personalized treatment strategies, and improved patient outcome predictions. Standardized longitudinal follow-up and benchmarking capabilities further facilitate multi-center studies and objective evaluation of treatment efficacy, leading to advancements in glioma biology and personalized patient care.
The following subsections provide information about how the data were selected, acquired, and prepared for publication.
The selection criteria for the CoPath Natural Language II Search included accession dates ranging from 01/01/2021 to 02/20/2024. To ensure all relevant diagnoses for this study were included, three separate keyword searches were performed using "glioma", "astrocytoma", and "glioblastoma". The search only included keyword results that were present in the Final Diagnoses. "Glioma" returned 85 cases; "Astrocytoma" returned 67 cases; and "Glioblastoma" returned 215 cases. Following the exclusion of duplicate cases, those missing any of the four requisite MR imaging sequences, and cases that failed processing through our pipeline, our final cohort comprised 203 patients.
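The exclusion steps described above (merge the three keyword searches, drop duplicates, drop cases missing any of the four required sequences, drop cases that failed processing) can be sketched in Python as follows. This is an illustrative sketch only, not the authors' pipeline: the function name `select_cohort`, the sequence labels, and the case identifiers are all hypothetical.

```python
# Assumed labels for the four requisite MR sequences; the real labels may differ.
REQUIRED_SEQUENCES = {"T1", "T1CE", "T2", "FLAIR"}


def select_cohort(search_hits, sequences_by_case, failed_processing):
    """Apply the three exclusion steps and return the sorted final case IDs.

    search_hits:        dict mapping keyword -> list of case IDs it returned
    sequences_by_case:  dict mapping case ID -> set of available sequences
    failed_processing:  set of case IDs that failed the processing pipeline
    """
    # Step 1: merging into a set removes cases returned by more than one keyword.
    unique_cases = set()
    for cases in search_hits.values():
        unique_cases.update(cases)

    # Steps 2 and 3: keep cases that have all required sequences and did not fail.
    cohort = {
        case
        for case in unique_cases
        if REQUIRED_SEQUENCES <= sequences_by_case.get(case, set())
        and case not in failed_processing
    }
    return sorted(cohort)


# Made-up example with hypothetical case IDs.
hits = {
    "glioma": ["A1", "A2"],
    "astrocytoma": ["A2", "A3"],
    "glioblastoma": ["A3", "A4"],
}
sequences = {
    "A1": {"T1", "T1CE", "T2", "FLAIR"},
    "A2": {"T1", "T2"},                   # missing two required sequences
    "A3": {"T1", "T1CE", "T2", "FLAIR"},
    "A4": {"T1", "T1CE", "T2", "FLAIR"},
}
cohort = select_cohort(hits, sequences, failed_processing={"A4"})
print(cohort)
```

In this toy example, four unique cases remain after de-duplication, one is excluded for missing sequences, and one for failed processing.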
Radiology: MRI studies on our McKesson Radiology 12.2 Picture archiving and communication system (PACS) (Change Healthcare Radiology Solutions, Nashville, Tennessee, U.S.) were exported. The image export process involved personnel of varying ranks, including medical graduates, radiology residents, neuroradiology fellows, and neuroradiologists. Our team exported the four basic conventional MR sequences, including T1, T1 with IV gadolinium-based contrast agent administration, T2, and Fluid Attenuated Inversion Recovery (FLAIR), into a HIPAA-compliant MU secured research server.
For each patient, the images were thoroughly checked to include up to six post-treatment images as available. The post-treatment images were captured on different dates, though not all patients had the maximum number of follow-up images; some had as few as one post-treatment follow-up MRI. For patients with more frequent follow-up MRIs, the immediate post-operative scan, at least one time point of progression, and another follow-up study were included. The MR images were comprehensively reviewed to exclude significantly motion-degraded or suboptimal studies.
The majority of the studies were conducted using Siemens MRI machines (97.47%, n=579), with a smaller proportion performed on MRI machines from other vendors: GE (2.02%, n=12) and Philips (0.51%, n=3). Table 1 shows the distribution of studies across different Siemens MR machines. Regarding the magnetic field strength, 1.5T MRIs accounted for 48.14% (n=1,126), 3T MRIs accounted for 45.08% (n=318), and 3T MRIs accounted for 45.08% (n=261). Table 2 summarizes the MRI parameters of each MR sequence.
Our team made efforts to obtain 3D sequences whenever available. Scans were performed using 3D acquisition methods in 40.28% of cases (n=975) and 2D acquisition methods in 59.82% of cases (n=1,419). In cases where 3D images were not available, 2D images were utilized instead. Table 3 summarizes the counts and percentage of studies performed with 2D vs 3D acquisition across different MR sequences.
Clinical: Basic demographic data, clinical data points, and tumor pathology were obtained through review of the electronic medical record (EMR). Clinical data points included the date of diagnosis, date of first surgery or treatment, date and characterization of first and/or subsequent disease progression and/or recurrence, and date of any follow-up resections. Survival information included the date of death and, if that was unknown, the date of last known contact while alive. Disease progression and/or recurrence was characterized as imaging only, clinical only, or both based on information obtained through review of each patient’s clinical notes, brain imaging, and clinical impression as documented by the primary care team. Brief summaries of the reasoning behind each characterization were also included. Patients with no further clinical contact beyond their primary treatment were documented as “lost to follow-up.” Pathological information was obtained through review of the initial pathology note and any subsequent addenda for each tumor sample and included final tumor diagnosis, grade, and any identified genetic mutations. This information was then compiled into a spreadsheet for analysis.
The image data underwent preprocessing using the Federated Tumor Segmentation (FeTS) tool. The pipeline began with converting DICOM files to the Neuroimaging Informatics Technology Initiative (NIfTI) format, ensuring the removal of any remaining PHI not eliminated by the anonymization/de-identification tool. The converted NIfTI images were then resampled to an isotropic 1 mm³ resolution and co-registered to the standard anatomical human brain atlas, SRI24. A deep learning brain extraction method was applied to strip the skull and extracranial tissues, thereby mitigating any potential facial reconstruction or recognition risks.
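To illustrate what resampling to an isotropic 1 mm³ grid means, the sketch below resamples a 3D array with nearest-neighbour lookup using NumPy. This is not the FeTS implementation: the interpolation method FeTS uses is not stated here, and co-registration to SRI24 and brain extraction are not shown. The function name `resample_isotropic_nn` and the example spacing are assumptions for illustration only.

```python
import numpy as np


def resample_isotropic_nn(volume, spacing_mm, target_mm=1.0):
    """Nearest-neighbour resample of a 3D array onto an isotropic grid."""
    spacing = np.asarray(spacing_mm, dtype=float)
    # Physical extent along each axis divided by the target voxel size.
    extent = np.array(volume.shape) * spacing / target_mm
    new_shape = np.maximum(1, np.round(extent)).astype(int)

    index_grids = []
    for axis, n_out in enumerate(new_shape):
        # Centres of the output voxels, in millimetres.
        centres_mm = (np.arange(n_out) + 0.5) * target_mm
        # Source voxel that contains each output centre.
        src = np.floor(centres_mm / spacing[axis]).astype(int)
        index_grids.append(np.clip(src, 0, volume.shape[axis] - 1))
    return volume[np.ix_(*index_grids)]


# Hypothetical 2D-style acquisition: 1 x 1 mm in-plane, 3 mm slice thickness.
volume = np.arange(4 * 4 * 2).reshape(4, 4, 2)
resampled = resample_isotropic_nn(volume, spacing_mm=(1.0, 1.0, 3.0))
print(resampled.shape)  # (4, 4, 6)
```

Each 3 mm slice is repeated three times along the slice axis, which is why thick-slice 2D acquisitions gain no real through-plane detail from resampling.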
The preprocessed images were segmented using a deep network based on nnU-Net, resulting in four distinct labels that correspond to different components of each tumor:
A spreadsheet is also provided that includes tumor volumes and signal intensity of different tumor components across various MR sequences.
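Because the images are resampled to 1 mm³ voxels, the volume of a tumor component in mm³ equals the number of voxels carrying its label. A minimal sketch of that calculation, assuming the segmentation is available as an integer NumPy array with 0 as background (the label values and the function name `label_volumes_mm3` are hypothetical, and this is not necessarily how the provided spreadsheet was produced):

```python
import numpy as np


def label_volumes_mm3(segmentation, voxel_volume_mm3=1.0):
    """Return {label: volume in mm^3} for every non-background label."""
    labels, counts = np.unique(segmentation, return_counts=True)
    return {
        int(label): float(count) * voxel_volume_mm3
        for label, count in zip(labels, counts)
        if label != 0  # assumption: 0 marks background
    }


# Toy segmentation with two hypothetical labels.
seg = np.zeros((3, 3, 3), dtype=int)
seg[0, 0, :] = 1    # 3 voxels
seg[1, :, :2] = 2   # 6 voxels

volumes = label_volumes_mm3(seg)
print(volumes)  # {1: 3.0, 2: 6.0}
```

Dividing the result by 1000 converts mm³ to cm³ (mL), the unit more commonly reported for tumor volumes.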
Each scan was manually exported using the built-in McKesson DICOM export tool into separate folders labeled post-treatment 1, post-treatment 2, etc. In a subsequent step, a subset of the data was selected to contribute to the development of the FeTS 2 toolbox. Consequently, the naming convention was updated to replace "post-treatment" with "timepoint" (e.g., post-treatment 1 became timepoint 1), following the instructions of the FeTS development team. Each sequence was saved in its own folder within these categories on a HIPAA-compliant, secured server within the University of Missouri network. Images were exported in DICOM format, maintaining the original image compression settings to preserve quality. To ensure patient privacy and HIPAA compliance, all images were anonymized, and all protected health information (PHI), e.g., patient name, MRN, and accession number, was deleted from the DICOM metadata headers.
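The renaming from "post-treatment N" to "timepoint N" can be sketched as a simple label mapping. The exact folder names on disk are an assumption here (taken from the wording above), and the helper `to_timepoint` is hypothetical:

```python
import re

# Assumed label form: "post-treatment <number>", case-insensitive.
_POST_TREATMENT = re.compile(r"^post-treatment\s+(\d+)$", re.IGNORECASE)


def to_timepoint(label):
    """Map 'post-treatment N' to 'timepoint N'; leave other labels unchanged."""
    match = _POST_TREATMENT.match(label.strip())
    if match is None:
        return label
    return f"timepoint {int(match.group(1))}"


for name in ["post-treatment 1", "post-treatment 2", "FLAIR"]:
    print(name, "->", to_timepoint(name))
```

Labels that do not match the assumed pattern, such as sequence folder names, pass through untouched.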
The folders are labeled in the following structure:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Food festivals have been a growing tourism sector in recent years due to their contributions to a region’s economic, marketing, brand, and social growth. This study analyses the demand for the Bahrain Food Festival. The stated objectives were: (i) to identify the motivational dimensions of the demand for the food festival, (ii) to determine the segments of the demand for the food festival, and (iii) to establish the relationship between the demand segments and socio-demographic aspects. The festival investigated was the Bahrain Food Festival, held in Bahrain, located on the east coast of the Persian Gulf. The sample consisted of 380 valid questionnaires collected through social networks from people attending the event. The statistical techniques used were factor analysis and the K-means clustering method. The results show five motivational dimensions: Local food, Art, Entertainment, Socialization, and Escape and novelty. In addition, two segments were found. The first, Entertainment and novelties, consists of attendees who seek to enjoy the festive atmosphere and discover new restaurants. The second, Multiple motives, is formed by attendees with several motivations simultaneously. This segment has the highest income and expenses, making it the most important group for developing plans and strategies. The results will contribute to the academic literature and be of use to the organizers of food festivals.
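As an illustration of the K-means step only, the sketch below clusters a handful of respondents into two segments using plain Python. The two-dimensional scores are invented for the example; the study's actual inputs (factor scores derived from the questionnaire) and software are not described here, so this should not be read as the authors' procedure.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
    return assignments, centroids


# Hypothetical (entertainment, local food) motivation scores for six attendees.
scores = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (4.0, 4.2), (4.1, 3.9), (3.9, 4.0)]
labels, centres = kmeans(scores, k=2)
print(labels)
```

The cluster numbering depends on the random initialisation, so only the grouping, not the label values, is meaningful.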
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.
Methods: We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013–2016.
Results: Among 146,999 subjects, five distinct patient segments were identified: “Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”. Healthcare utilization patterns in 2012, morbidity patterns, and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. In this segment, 11.5% of patients had end-stage renal failure and 31.2% had malignancy. The validity of the cluster-analysis-derived segments is supported by their discriminative ability for longitudinal healthcare utilization and mortality from 2013–2016. Incident rate ratios for healthcare utilization and Cox hazard ratios for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for disproportionate healthcare utilization and had an 8.16 times higher mortality rate.
Conclusion: Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk, providing an evidence-based segmentation of a regional health system’s healthcare needs.