Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was collected from Kaggle. It includes various features related to customer demographics, purchasing behavior, and other relevant metrics.
DRAKO is a Mobile Location Audience Targeting provider with a programmatic trading desk specialising in geolocation analytics and programmatic advertising. Through our customised approach, we offer business and consumer insights as well as addressable audiences for advertising.
Mobile Location Data can be meaningfully transformed into Audience Targeting when used in conjunction with other datasets. Our expansive POI Data allows us to segment users by visitation to major brands and retailers and to categorize them into syndicated segments. Beyond POI visits, our proprietary Home Location Model determines residents of geographic areas such as Designated Market Areas, Counties, or States. Our Home Location Model also fuels our Geodemographic Census Data segments, as we are able to determine residents of the smallest census units. Additionally, we have audiences of ticketed event and venue visitors, survey data, and retail data.
All of our Audience Targeting is 100% deterministic in that it only includes high-quality, real visits to locations as defined by a POI's building contour from satellite imagery. We never use a radius when building an audience unless requested. We have a horizontal accuracy of 5m.
Additionally, we can always cross reference your audience targeting with our syndicated segments:
Overview of our Syndicated Audience Data Segments:
- Brand/POI segments (specific named stores and locations)
- Categories (behavioural segments - revealed habits)
- Census demographic segments (HH income, race, religion, age, family structure, language, etc.)
- Events segments (ticketed live events, conferences, and seminars)
- Resident segments (State/province, CMAs, DMAs, city, county, sub-county)
- Political segments (Canadian Federal and Provincial, US Congressional Upper and Lower House, US States, City elections, etc.)
- Survey Data (Psychosocial/Demographic survey data)
- Retail Data (Receipt/transaction data)
All of our syndicated segments are customizable. That means you can limit them to people within a certain geography, remove employees, include only the most frequent visitors, define your own custom lookback, or extend our audiences using our Home, Work, and Social Extensions.
In addition to our syndicated segments, we’re also able to run custom queries that return all the Mobile Ad IDs (MAIDs) seen at a specific location (address; latitude and longitude; or WKT84 Polygon) or in your defined geographic area of interest (political districts, DMAs, Zip Codes, etc.).
Beyond just returning all the MAIDs seen within a geofence, we are also able to offer additional customizable advantages:
- Average precision between 5 and 15 meters
- CRM list activation + extension
- Extend beyond Mobile Location Data (MAIDs) with our device graph
- Filter by frequency of visitations
- Home and Work targeting (retrieve only employees or residents of an address)
- Home extensions (devices that reside in the same dwelling from your seed geofence)
- Rooftop level address geofencing precision (no radius used EVER unless user specified)
- Social extensions (devices in the same social circle as users in your seed geofence)
- Turn analytics into addressable audiences
- Work extensions (coworkers of users in your seed geofence)
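As a rough illustration of contour-based geofencing combined with a visit-frequency filter, the sketch below tests location pings against a polygon rather than a radius. It is not DRAKO's implementation: the function names, the `(maid, lon, lat)` ping layout, and the sample coordinates are all hypothetical, and a simple ray-casting test stands in for whatever method is actually used.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def maids_in_geofence(pings, polygon, min_visits=1):
    """Return sorted MAIDs seen inside the polygon at least `min_visits` times.

    `pings` is an iterable of (maid, lon, lat) tuples (hypothetical layout).
    """
    visits = Counter(
        maid for maid, lon, lat in pings if point_in_polygon(lon, lat, polygon)
    )
    return sorted(maid for maid, count in visits.items() if count >= min_visits)


# Hypothetical building contour and location pings, for illustration only.
store = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
pings = [
    ("maid-a", 1.0, 1.0),
    ("maid-a", 2.0, 2.0),
    ("maid-b", 3.5, 0.5),
    ("maid-c", 5.0, 1.0),  # outside the contour
]
everyone = maids_in_geofence(pings, store)
frequent = maids_in_geofence(pings, store, min_visits=2)
```

Here `everyone` holds every device seen inside the contour, while `frequent` keeps only devices with at least two visits.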
Data Compliance: All of our Audience Targeting Data is fully CCPA compliant and 100% sourced from SDKs (Software Development Kits), the most reliable and consistent mobile data stream with end-user consent, available with only a 4-5 day delay. Our location and device ID data comes from partnerships with more than 1,500 mobile apps. Each record comes with an associated location, which is how we are able to segment using geofences.
Data Quality: In addition to partnering with trusted SDKs, DRAKO has additional screening methods to ensure that our mobile location data is consistent and reliable. This includes data harmonization and quality scoring from all of our partners in order to disregard MAIDs with a low quality score.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Overview: Customer Segmentation Using K-Means Clustering
Introduction
In this project, I analysed customer data from a retail store to identify distinct customer segments. The dataset includes key attributes such as age, city, and total sales of the customers. By leveraging K-Means clustering, an unsupervised machine learning technique, I aim to group customers based on their age and sales metrics. These insights will enable the creation of targeted marketing campaigns tailored to the specific needs and behaviours of each customer segment.
Objectives
- Cluster Customers: Use K-Means clustering to group customers based on age and total sales.
- Analyse Segments: Examine the characteristics of each customer segment.
- Targeted Marketing: Develop strategies for personalized marketing campaigns targeting each identified customer group.
Data Description
The dataset comprises:
Methodology
- Data Preprocessing: Clean and preprocess the data to handle any missing or inconsistent entries.
- Feature Selection: Focus on age and total sales as primary features for clustering.
- K-Means Clustering: Apply the K-Means algorithm to identify distinct customer segments.
- Cluster Analysis: Analyse the resulting clusters to understand the demographic and sales characteristics of each group.
- Marketing Strategy Development: Create targeted marketing strategies for each customer segment to enhance engagement and sales.
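The clustering and cluster-analysis steps could be sketched along the following lines. This is a minimal, dependency-free illustration and not the project's actual code: the customer rows are invented, the feature standardization and the deterministic farthest-first initialization are assumptions added here so that the example is reproducible, and a library implementation would normally be used in practice.

```python
import math


def standardize(rows):
    """Rescale each column to zero mean and unit variance so that age and
    total sales contribute comparably to Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch): start from
    the first point, then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def profile(customers, labels):
    """Cluster analysis: mean age and mean total sales per cluster."""
    groups = {}
    for row, label in zip(customers, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: (
            sum(age for age, _ in rows) / len(rows),
            sum(sales for _, sales in rows) / len(rows),
        )
        for label, rows in groups.items()
    }


# Hypothetical (age, total_sales) rows, for illustration only.
customers = [
    (22, 150.0), (25, 180.0), (24, 160.0),
    (45, 900.0), (48, 950.0), (47, 880.0),
    (68, 300.0), (70, 320.0), (66, 280.0),
]
labels = kmeans(standardize(customers), k=3)
profiles = profile(customers, labels)
```

Profiling on the raw, unscaled values keeps the per-segment averages interpretable (years and currency units), while the clustering itself runs on the standardized features.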
Expected Outcomes
- Customer Segments: Clear identification of customer groups based on age and purchasing behaviour.
- Insights for Marketing: Detailed understanding of each segment to inform targeted marketing efforts.
- Business Impact: Enhanced ability to tailor marketing campaigns, potentially leading to increased customer satisfaction and sales.
By clustering customers based on age and total sales, this project aims to provide actionable insights for personalized marketing, ultimately driving better customer engagement and higher sales for the retail store.
AI in Consumer Decision-Making: Global Zero-Party Dataset
This dataset captures how consumers around the world are using AI tools like ChatGPT, Perplexity, Gemini, Claude, and Copilot to guide their purchase decisions. It spans multiple product categories, demographics, and geographies, mapping the emerging role of AI as a decision-making companion across the consumer journey.
What Makes This Dataset Unique
Unlike datasets inferred from digital traces or modeled from third-party assumptions, this collection is built entirely on zero-party data: direct responses from consumers who voluntarily share their habits and preferences. That means the insights come straight from the people making the purchases, ensuring unmatched accuracy and relevance.
For FMCG leaders, retailers, and financial services strategists, this dataset provides the missing piece: visibility into how often consumers are letting AI shape their decisions, and where that influence is strongest.
Dataset Structure
Each record is enriched with:
- Product Category: from high-consideration items like electronics to daily staples such as groceries and snacks.
- AI Tool Used: identifying whether consumers turn to ChatGPT, Gemini, Perplexity, Claude, or Copilot.
- Influence Level: the percentage of consumers in a given context who rely on AI to guide their choices.
- Demographics: generational breakdowns from Gen Z through Boomers.
- Geographic Detail: city- and country-level coverage across Africa, LATAM, Asia, Europe, and North America.
This structure allows filtering and comparison across categories, age groups, and markets, giving users a multidimensional view of AI’s impact on purchasing.
Why It Matters
AI has become a trusted voice in consumers’ daily lives. From meal planning to product comparisons, many people now consult AI before making a purchase—often without realizing how much it shapes the options they consider. For brands, this means that the path to purchase increasingly runs through an AI filter.
This dataset provides a comprehensive view of that hidden step in the consumer journey, enabling decision-makers to quantify:
- How much AI shapes consumer thinking before they even reach the shelf or checkout.
- Which product categories are most influenced by AI consultation.
- How adoption varies by geography and generation.
- Which AI platforms are most commonly trusted by consumers.
Opportunities for Business Leaders
- FMCG & Retail Brands: Understand where AI-driven decision-making is already reshaping category competition.
- Marketers: Identify demographic segments most likely to consult AI, enabling targeted strategies.
- Retailers: Align assortments and promotions with the purchase patterns influenced by AI queries.
- Investors & Innovators: Gauge market readiness for AI-integrated commerce solutions.
The dataset doesn’t just describe what’s happening—it opens doors to the “so what” questions that define strategy. Which categories are becoming algorithm-driven? Which markets are shifting fastest? Where is the opportunity to get ahead of competitors in an AI-shaped funnel?
Why Now
Consumer AI adoption is no longer a forecast; it is a daily behavior. Just as search engines once rewrote the rules of marketing, conversational AI is quietly rewriting how consumers decide what to buy. This dataset offers an early, detailed view into that change, giving brands the ability to act while competitors are still guessing.
What You Get
Users gain:
- A global, city-level view of AI adoption in consumer decision-making.
- Cross-category comparability to see where AI influence is strongest and weakest.
- Generational breakdowns that show how adoption differs between younger and older cohorts.
- AI platform analysis, highlighting how tool preferences vary by region and category.
Every row is powered by zero-party input, so the insights reflect what consumers report about their own behavior, not modeled assumptions.
How It’s Used
Leverage this data to:
- Validate strategies before entering new markets or categories.
- Benchmark competitors on AI readiness and influence.
- Identify growth opportunities in categories where AI-driven recommendations are rapidly shaping decisions.
- Anticipate risks where brand visibility could be disrupted by algorithmic mediation.
Core Insights
The full dataset reveals:
- Surprising adoption curves across categories where AI wasn’t expected to play a role.
- Geographic pockets where AI has already become a standard step in purchase decisions.
- Demographic contrasts showing who trusts AI most and where skepticism still holds.
- Clear differences between AI platforms and the consumer profiles most drawn to each.
These patterns are not visible in traditional retail data, sales reports, or survey summaries. They are only captured here, directly from the consumers themselves.
Summary
Winning in FMCG and retail today means more than getting on shelves, capturing price points, or running promotions. It means understanding the invisible algorithms consumers are ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The TSABC analysis assumes the 1KGP demographic model in each population.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and the development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.
Methods: We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013–2016.
Results: Among 146,999 subjects, five distinct patient segments were identified: “Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”. Healthcare utilization patterns in 2012, morbidity patterns, and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. 11.5% and 31.2% of this segment had end-stage renal failure and malignancy, respectively. The validity of cluster-analysis derived segments is supported by discriminative ability for longitudinal healthcare utilization and mortality from 2013–2016. Incidence rate ratios for healthcare utilization and Cox hazard ratios for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for a disproportionate share of healthcare utilization and had an 8.16 times higher mortality rate.
Conclusion: Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk to provide an evidence-based segmentation of a regional health system’s healthcare needs.
Here's a step-by-step guide on how to approach user segmentation for FitTrackr:
1. Define your segmentation goals: Start by determining what you want to achieve with user segmentation. For example, you might want to identify the most engaged users, understand the demographics of your user base, or target specific user groups with personalized promotions.
2. Gather data: Collect relevant data about your app users. This can include demographic information (age, gender, location), app usage data (frequency of app usage, time spent on different features), user behavior (types of workouts, goals set, achievements unlocked), and any other relevant data points available to you.
3. Identify relevant segmentation variables: Based on the goals you defined, identify the key variables that will help you segment your user base effectively. For FitTrackr, potential variables could include age, gender, fitness goals (e.g., weight loss, muscle gain), workout preferences (e.g., cardio, strength training), and user engagement level.
4. Segment the user base: Use clustering techniques or segmentation algorithms to divide your user base into distinct segments based on the identified variables. Methods such as k-means or hierarchical clustering work without labels. Supervised models such as decision trees or random forests need labeled examples, so they are better suited to assigning new users to segments you have already defined.
5. Analyze and profile each segment: Once the segmentation is done, analyze each segment to understand its characteristics, preferences, and needs. Create detailed user profiles for each segment, including demographic information, app usage patterns, fitness goals, and any other relevant attributes. This will help you tailor your marketing messages and app features to each segment's specific requirements.
6. Develop targeted strategies: Based on the insights gained from user profiles, develop targeted marketing strategies and app features for each segment. For example, if you have a segment of users who primarily focus on weight loss, you might create personalized workout plans or send them motivational content related to weight management.
7. Implement and evaluate: Implement the targeted strategies and monitor their effectiveness. Continuously evaluate and refine your segmentation approach based on user feedback, engagement metrics, and the achievement of your goals.
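To make the segmenting and profiling steps concrete, here is a small sketch that uses plain rules instead of a clustering algorithm. Everything in it is an assumption for illustration: the user records, the field names, and the engagement thresholds (5 or more sessions per week counts as high, 2 to 4 as medium) are invented and would need tuning against real FitTrackr data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user records, for illustration only.
users = [
    {"user_id": 1, "age": 24, "goal": "weight loss", "sessions_per_week": 6},
    {"user_id": 2, "age": 31, "goal": "muscle gain", "sessions_per_week": 3},
    {"user_id": 3, "age": 45, "goal": "weight loss", "sessions_per_week": 1},
    {"user_id": 4, "age": 28, "goal": "weight loss", "sessions_per_week": 5},
    {"user_id": 5, "age": 38, "goal": "muscle gain", "sessions_per_week": 2},
]


def engagement_level(sessions_per_week):
    """Bucket weekly sessions into tiers (thresholds are assumptions)."""
    if sessions_per_week >= 5:
        return "high"
    if sessions_per_week >= 2:
        return "medium"
    return "low"


def segment_users(users):
    """Group users by (fitness goal, engagement tier)."""
    segments = defaultdict(list)
    for user in users:
        key = (user["goal"], engagement_level(user["sessions_per_week"]))
        segments[key].append(user)
    return dict(segments)


def profile_segments(segments):
    """Summarize each segment by size and average age."""
    return {
        key: {"size": len(members), "avg_age": mean(u["age"] for u in members)}
        for key, members in segments.items()
    }


profiles = profile_segments(segment_users(users))
```

A rule-based split like this is easy to explain to stakeholders. A clustering method becomes more useful once there are too many variables to bucket by hand.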
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3-way IBD results are from [38] for Model C and [31] for Model EA. Relate and TSABC results are obtained from 25 simulated datasets under each model, with genomes consisting of 30 chromosomes each of length ℓ. The TSABC simulations included sequencing error and gene conversion with the same settings as [38] for Model C and [31] for Model EA. Relate performed poorly on those datasets and the reported results are for datasets simulated without sequencing error or gene conversion. For TSABC, (SE 0.03) units of 10^-4 for Model C, and 1.03 (SE 0.03) for Model EA (true value 1).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Overview
The Mall Customers Dataset provides data on 200 individuals who visit a mall, including demographic information, annual income, and spending habits. This dataset is useful for exploratory data analysis, customer segmentation, and clustering tasks (e.g., K-means clustering).
Dataset Summary
- Rows: 200
- Columns: 5
- No missing values
Columns Description
- CustomerID: A unique identifier for each customer (integer).
- Genre: The gender of the customer (Male/Female).
- Age: The age of the customer (integer).
- Annual Income (k$): Annual income of the customer in thousands of dollars (integer).
- Spending Score (1-100): A score assigned by the mall based on customer behavior and spending patterns (integer).
Potential Use Cases
- Customer Segmentation: Group customers based on their income and spending habits.
- Behavioral Analysis: Explore how factors like gender, age, and income influence spending scores.
- Clustering: Apply algorithms such as K-means to identify clusters of customers with similar characteristics.
- Targeted Marketing Campaigns: Use the insights to create personalized promotions for different customer segments.
Exploratory Questions
- What is the relationship between annual income and spending score?
- Does gender or age influence spending behavior?
- Which customers have high spending scores but low incomes, or vice versa?
Suggested Analysis Techniques
- EDA: Visualize income distribution, age groups, and spending patterns.
- Clustering Algorithms: Use K-means or hierarchical clustering for segmentation.
- Correlation Analysis: Investigate correlations between age, income, and spending score.
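The correlation analysis suggested above might look something like this. The column names come from the dataset description, but the handful of rows is invented for illustration and is not taken from the actual file. The Pearson coefficient is computed by hand to keep the sketch dependency-free.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up rows using the dataset's column names, for illustration only.
customers = [
    {"Age": 19, "Annual Income (k$)": 15, "Spending Score (1-100)": 39},
    {"Age": 21, "Annual Income (k$)": 16, "Spending Score (1-100)": 81},
    {"Age": 35, "Annual Income (k$)": 54, "Spending Score (1-100)": 50},
    {"Age": 47, "Annual Income (k$)": 78, "Spending Score (1-100)": 16},
    {"Age": 30, "Annual Income (k$)": 88, "Spending Score (1-100)": 86},
    {"Age": 58, "Annual Income (k$)": 30, "Spending Score (1-100)": 15},
]
columns = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]

# Pairwise correlation matrix keyed by (column_a, column_b).
correlations = {
    (a, b): pearson([c[a] for c in customers], [c[b] for c in customers])
    for a in columns
    for b in columns
}
```

Pearson correlation only captures linear association, so a value near zero does not rule out cluster structure. Plotting income against spending score is a useful companion check.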
Licensing & Citation
- License: Open for public use, suitable for educational and research purposes.
- Citation: If you use this dataset in your project or research, please reference this dataset appropriately.
This dataset provides a great starting point for hands-on learning in customer analytics, marketing strategy, and machine learning. Perfect for beginners and data enthusiasts looking to explore clustering or segmentation techniques!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The expected number of mutations is the same in each scenario (μ × ℓ is constant). Values are averages over 25 simulations with no sequencing error (ϵ = 0).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive information on global salmon populations, focusing on their decline in oceanic environments. It includes various data points collected over time to track and analyze trends in salmon populations. The key columns in this dataset are:
SERIES - Internal code for dataset indicating Domain, species, and Status Review data set year and when applicable, method.
NMFS_POPID - The unique numeric value for a population as determined by NMFS. This value will not change over time, even if the population name (NWR Population Name) does.
RECOVERY_DOMAIN - Discrete geographic areas for which comprehensive recovery plans are being developed: Puget Sound, Willamette/Lower Columbia, Interior Columbia (including the Mid-Columbia, Upper Columbia, and Snake River sub-domains), Oregon Coast, and Southern Oregon/Northern California Coast.
ESU - For populations listed under the federal ESA, this is the name of a defined Evolutionarily Significant Unit (ESU) or Distinct Population Segment (DPS) as defined by NMFS Northwest Region or by USFWS.
MAJOR_POPULATION_GROUP - Major Population Group, as defined by the NWR. Groups of populations within an ESU/DPS that are more similar to each other than they are to other populations. They are based on similarities in genetic characteristics, demographic patterns and habitat types and on geographic structure.
POPULATION_NAME - Legal given name for a listed population within the ESU.
COMMON_POPULATION_NAME - Shortened population name
DISPLAY_ORDER - Geographically based display order within ESUs.
SPECIES - Salmon species name
RUN_TIMING - Run of fish, generally determined on the basis of the time of year at which adults enter fresh water to spawn. (Spring, Summer, Spring/Summer, Fall, Winter, early, or late)
STREAM_NAME - Name of the primary stream for the Population
YEAR - Calendar year of return
NUMBER_OF_SPAWNERS - Estimated number of natural origin (parents spawned in the wild) spawners contributing to spawning in a particular year. Includes both adults and jacks of natural origin (except for SR fall chinook which typically does have jack returns)
FRACWILD - The fraction of the total spawners that are the progeny of naturally-spawning fish.
CATCH - Terminal fishery harvest
AGE_1_RETURNS - The fraction of fish who are defined as having an age of 1 that returned to spawn in a given year.
AGE_2_RETURNS - The fraction of fish who are defined as having an age of 2 that returned to spawn in a given year.
AGE_3_RETURNS - The fraction of fish who are defined as having an age of 3 that returned to spawn in a given year.
AGE_4_RETURNS - The fraction of fish who are defined as having an age of 4 that returned to spawn in a given year.
AGE_5_RETURNS - The fraction of fish who are defined as having an age of 5 that returned to spawn in a given year.
AGE_6_RETURNS - The fraction of fish who are defined as having an age of 6 that returned to spawn in a given year.
AGE_7_RETURNS - The fraction of fish who are defined as having an age of 7 that returned to spawn in a given year.
METHOD - Survey (spawning ground), Model (PIT tag data), GSI (genetic stock inventory data), or Ladder count (at dam)
CITATION - Data source citation
CONTRIBUTOR - Agency, Tribe or other entity responsible for these data that is the best contact for questions that may arise about this data record.
DOCUMENT_CITATION - Citation of the document this dataset archive informed.
CODE_LINK - Location of the code used to generate analysis for the document
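As a small worked example of how these columns relate, the sketch below derives two quantities from a single record. It reads the column descriptions literally, which is an assumption: NUMBER_OF_SPAWNERS is taken to count natural-origin fish only and FRACWILD to be their share of all spawners, so the total is estimated as NUMBER_OF_SPAWNERS / FRACWILD. The helper names and the sample record are hypothetical, and the actual file should be checked before relying on this relationship.

```python
def total_spawners(row):
    """Estimate total spawners as natural-origin spawners / FRACWILD.

    Returns None when either value is missing or FRACWILD is zero.
    """
    natural = row.get("NUMBER_OF_SPAWNERS")
    fracwild = row.get("FRACWILD")
    if natural is None or not fracwild:
        return None
    return natural / fracwild


def mean_return_age(row):
    """Average age at return, weighted by the AGE_1..AGE_7_RETURNS fractions."""
    weighted = 0.0
    total = 0.0
    for age in range(1, 8):
        fraction = row.get(f"AGE_{age}_RETURNS") or 0.0
        weighted += age * fraction
        total += fraction
    return weighted / total if total > 0 else None


# Hypothetical record, for illustration only.
record = {
    "YEAR": 2010,
    "NUMBER_OF_SPAWNERS": 400,
    "FRACWILD": 0.5,
    "AGE_3_RETURNS": 0.25,
    "AGE_4_RETURNS": 0.5,
    "AGE_5_RETURNS": 0.25,
}
estimated_total = total_spawners(record)
average_age = mean_return_age(record)
```

Dividing by the sum of the age fractions keeps the weighted mean meaningful even when the fractions in a row do not add up to exactly 1.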