Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pandemics such as Covid-19 pose tremendous public health communication challenges in promoting protective behaviours and vaccination and in educating the public about risks. Segmenting audiences based on attitudes and behaviours is a means to increase the precision and potential effectiveness of such communication. The present study reports on such an audience segmentation effort for the population of England, sponsored by the United Kingdom Health Security Agency (UKHSA) and involving a collaboration of market research and academic experts. A cross-sectional online survey was conducted between 4 and 24 January 2022 with 5525 respondents (5178 used in our analyses) in England using a market research opt-in panel. An additional 105 telephone interviews were conducted to sample persons without online or smartphone access. Respondents were quota sampled to be demographically representative. The primary analytic technique was k-means cluster analysis, supplemented with other techniques including multi-dimensional scaling and the use of respondent-standardized as well as sample-standardized data when necessary to address differences in response set for some groups of respondents. Identified segments were profiled against demographic, behavioural self-report, attitudinal, and communication channel variables, with differences by segment tested for statistical significance. Seven segments were identified, including distinctly different groups of persons who tended toward a high level of compliance and several that were relatively low in compliance. The segments were characterized by distinctive patterns of demographics, attitudes, behaviours, trust in information sources, and preferred communication channels.
Segments were further validated by comparing the segmentation variable with a set of demographic variables as predictors of reported protective behaviours in the past two weeks and of vaccine refusal; the demographics together had about one-quarter the effect size of the single seven-level segment variable. With respect to managerial implications, different communication strategies are suggested for each segment, illustrating the advantages of rich segmentation descriptions for understanding public health communication audiences. Strengths and weaknesses of the methods used are discussed to help guide future efforts.
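The abstract mentions respondent-standardized data as a way to handle differences in response set. The sketch below shows one common form of that idea, z-scoring each respondent's ratings against that respondent's own mean and spread. The function name and the ratings are hypothetical, and the study's exact procedure is not described here, so treat this as an illustration rather than the authors' method.

```python
from statistics import mean, pstdev

def respondent_standardize(ratings):
    """Z-score one respondent's ratings against their own mean and spread."""
    m = mean(ratings)
    s = pstdev(ratings)
    if s == 0:
        # A flat response pattern carries no within-person contrast.
        return [0.0 for _ in ratings]
    return [(r - m) / s for r in ratings]

# Hypothetical 1-5 agreement ratings on the same four items.
high_rater = [5, 5, 4, 5]  # tends to agree with everything
low_rater = [2, 2, 1, 2]   # same pattern, shifted down the scale

z_high = respondent_standardize(high_rater)
z_low = respondent_standardize(low_rater)
```

After standardization the two respondents have the same profile, so a clustering step would group them by the shape of their answers rather than by their overall tendency to rate high or low.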
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains customer demographic and behavioral information designed for exploring segmentation, clustering, and predictive analytics in retail and marketing contexts. It provides a simple yet powerful foundation for practicing data science techniques such as K-Means clustering, customer profiling, and recommendation systems.
### Dataset Features
- CustomerID: Unique identifier for each customer
- Genre: Gender of the customer (Male/Female)
- Age: Age of the customer (years)
- Annual Income (k$): Annual income in thousands of dollars
- Spending Score: A score assigned by the business based on customer behavior and spending patterns
### Notes
- Some records contain missing values (nan) in Age, Annual Income, or Spending Score. These can be handled using imputation, removal, or advanced techniques depending on the analysis.
- Spending Score is an arbitrary metric often used in clustering exercises to simulate customer engagement.
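As a minimal sketch of the imputation option mentioned in the notes, the snippet below fills missing values in one numeric column with the mean of the observed values. The sample ages are made up, and mean imputation is only one of several reasonable choices.

```python
import math

def impute_mean(values):
    """Replace NaN entries with the mean of the observed entries.

    Assumes at least one value in the column is observed.
    """
    observed = [v for v in values if not math.isnan(v)]
    fill = sum(observed) / len(observed)
    return [fill if math.isnan(v) else v for v in values]

# Hypothetical Age column with one missing record.
ages = [19.0, 21.0, math.nan, 23.0]
ages_filled = impute_mean(ages)
```

Mean imputation keeps every row but shrinks the column's variance, so dropping rows or using a model-based method may be preferable when many values are missing.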
### Potential Use Cases
- Customer Segmentation: Apply clustering algorithms (e.g., K-Means, DBSCAN) to group customers by income and spending habits.
- Marketing Strategy: Identify high-value customers and tailor promotions.
- Predictive Modeling: Build models to predict spending behavior based on demographics.
- Data Cleaning Practice: Handle missing values and prepare the dataset for machine learning tasks.
This dataset is widely used in machine learning tutorials and business analytics projects because it is small, interpretable, and directly applicable to real-world scenarios like retail customer analysis. It’s ideal for beginners learning clustering and for professionals prototyping segmentation strategies.
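The clustering use case can be sketched with a small, dependency-free K-Means implementation. The points below are invented (annual income in k$, spending score) pairs rather than rows from the dataset, and a library implementation would normally be used in practice.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain K-Means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical (income, spending score) pairs forming two obvious groups.
customers = [(15, 80), (16, 77), (17, 85), (80, 15), (85, 20), (78, 12)]
labels, centroids = kmeans(customers, k=2)
```

Here the first three customers (low income, high spending) end up in one cluster and the last three in the other. On real data the features would usually be scaled first and several values of `k` compared.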
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Segments and demographic variables predicting Covid-19 protective behaviors.
Here's a step-by-step guide on how to approach user segmentation for FitTrackr:
Define your segmentation goals: Start by determining what you want to achieve with user segmentation. For example, you might want to identify the most engaged users, understand the demographics of your user base, or target specific user groups with personalized promotions.
Gather data: Collect relevant data about your app users. This can include demographic information (age, gender, location), app usage data (frequency of app usage, time spent on different features), user behavior (types of workouts, goals set, achievements unlocked), and any other relevant data points available to you.
Identify relevant segmentation variables: Based on the goals you defined, identify the key variables that will help you segment your user base effectively. For FitTrackr, potential variables could include age, gender, fitness goals (e.g., weight loss, muscle gain), workout preferences (e.g., cardio, strength training), and user engagement level.
Segment the user base: Use clustering techniques or segmentation algorithms to divide your user base into distinct segments based on the identified variables. You can employ unsupervised methods such as k-means clustering or hierarchical clustering, or, where a labeled outcome such as engagement level is available, supervised models like decision trees or random forests.
Analyze and profile each segment: Once the segmentation is done, analyze each segment to understand their characteristics, preferences, and needs. Create detailed user profiles for each segment, including demographic information, app usage patterns, fitness goals, and any other relevant attributes. This will help you tailor your marketing messages and app features to each segment's specific requirements.
Develop targeted strategies: Based on the insights gained from user profiles, develop targeted marketing strategies and app features for each segment. For example, if you have a segment of users who primarily focus on weight loss, you might create personalized workout plans or send them motivational content related to weight management.
Implement and evaluate: Implement the targeted strategies and monitor their effectiveness. Continuously evaluate and refine your segmentation approach based on user feedback, engagement metrics, and the achievement of your goals.
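The profiling step can be sketched as a simple group-by over already-segmented users. The segment names, field names, and values below are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict
from statistics import mean

def profile_segments(rows, fields):
    """Return the size and per-field averages for each segment."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["segment"]].append(row)
    return {
        segment: {
            "size": len(members),
            **{f: mean(m[f] for m in members) for f in fields},
        }
        for segment, members in grouped.items()
    }

# Hypothetical users that have already been assigned to a segment.
users = [
    {"segment": "weight_loss", "age": 34, "sessions_per_week": 4},
    {"segment": "weight_loss", "age": 40, "sessions_per_week": 2},
    {"segment": "muscle_gain", "age": 25, "sessions_per_week": 5},
    {"segment": "muscle_gain", "age": 27, "sessions_per_week": 5},
]

profiles = profile_segments(users, ["age", "sessions_per_week"])
```

Each entry in `profiles` is a compact summary that can be compared across segments when deciding which messages or features to target at which group.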
DRAKO is a Mobile Location Audience Targeting provider with a programmatic trading desk specialising in geolocation analytics and programmatic advertising. Through our customised approach, we offer business and consumer insights as well as addressable audiences for advertising.
Mobile Location Data can be meaningfully transformed into Audience Targeting when used in conjunction with other datasets. Our expansive POI Data allows us to segment users by visitation to major brands and retailers and to categorize them into syndicated segments. Beyond POI visits, our proprietary Home Location Model determines residents of geographic areas such as Designated Market Areas, Counties, or States. Relatedly, our Home Location Model also fuels our Geodemographic Census Data segments, as we are able to determine residents of the smallest census units. Additionally, we also have audiences of ticketed event and venue visitors, survey data, and retail data.
All of our Audience Targeting is 100% deterministic in that it only includes high-quality, real visits to locations as defined by the building contour of a POI in satellite imagery. We never use a radius when building an audience unless requested. We have a horizontal accuracy of 5m.
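To illustrate the difference between a building-contour geofence and a radius, here is a generic ray-casting point-in-polygon test. It is a textbook technique shown under simplifying assumptions (planar coordinates in metres, a hypothetical square footprint) and is not a description of DRAKO's actual pipeline.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical 10 m x 10 m building footprint in a local planar projection.
footprint = [(0, 0), (10, 0), (10, 10), (0, 10)]
centre = (5, 5)

ping_inside = (5, 5)    # device ping inside the building
ping_outside = (12, 5)  # device ping on the pavement next to it

in_contour = point_in_polygon(*ping_outside, footprint)
in_radius = math.dist(ping_outside, centre) <= 8  # 8 m radius around the centre
```

The second ping falls within the 8 m radius but outside the footprint, which is the kind of false visit that a contour-based geofence avoids.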
Additionally, we can always cross reference your audience targeting with our syndicated segments:
Overview of our Syndicated Audience Data Segments:
- Brand/POI segments (specific named stores and locations)
- Categories (behavioural segments - revealed habits)
- Census demographic segments (HH income, race, religion, age, family structure, language, etc.)
- Events segments (ticketed live events, conferences, and seminars)
- Resident segments (State/province, CMAs, DMAs, city, county, sub-county)
- Political segments (Canadian Federal and Provincial, US Congressional Upper and Lower House, US States, City elections, etc.)
- Survey Data (Psychosocial/Demographic survey data)
- Retail Data (Receipt/transaction data)
All of our syndicated segments are customizable. That means you can limit them to people within a certain geography, remove employees, include only the most frequent visitors, define your own custom lookback, or extend our audiences using our Home, Work, and Social Extensions.
In addition to our syndicated segments, we’re also able to run custom queries that return all the Mobile Ad IDs (MAIDs) seen at a specific location (address; latitude and longitude; or WKT84 Polygon) or in your defined geographic area of interest (political districts, DMAs, Zip Codes, etc.).
Beyond just returning all the MAIDs seen within a geofence, we are also able to offer additional customizable advantages:
- Average precision between 5 and 15 meters
- CRM list activation + extension
- Extend beyond Mobile Location Data (MAIDs) with our device graph
- Filter by frequency of visitations
- Home and Work targeting (retrieve only employees or residents of an address)
- Home extensions (devices that reside in the same dwelling from your seed geofence)
- Rooftop level address geofencing precision (no radius used EVER unless user specified)
- Social extensions (devices in the same social circle as users in your seed geofence)
- Turn analytics into addressable audiences
- Work extensions (coworkers of users in your seed geofence)
Data Compliance: All of our Audience Targeting Data is fully CCPA compliant and 100% sourced from SDKs (Software Development Kits), the most reliable and consistent mobile data stream with end user consent, available with only a 4-5 day delay. This means that our location and device ID data comes from partnerships with more than 1,500 mobile apps. This data comes with an associated location, which is how we are able to segment using geofences.
Data Quality: In addition to partnering with trusted SDKs, DRAKO has additional screening methods to ensure that our mobile location data is consistent and reliable. This includes data harmonization and quality scoring from all of our partners in order to disregard MAIDs with a low quality score.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Project Overview: Customer Segmentation Using K-Means Clustering
Introduction
In this project, I analysed customer data from a retail store to identify distinct customer segments. The dataset includes key attributes such as age, city, and total sales of the customers. By leveraging K-Means clustering, an unsupervised machine learning technique, I aim to group customers based on their age and sales metrics. These insights will enable the creation of targeted marketing campaigns tailored to the specific needs and behaviours of each customer segment.
Objectives
- Cluster Customers: Use K-Means clustering to group customers based on age and total sales.
- Analyse Segments: Examine the characteristics of each customer segment.
- Targeted Marketing: Develop strategies for personalized marketing campaigns targeting each identified customer group.
Data Description
The dataset comprises:
Methodology
- Data Preprocessing: Clean and preprocess the data to handle any missing or inconsistent entries.
- Feature Selection: Focus on age and total sales as primary features for clustering.
- K-Means Clustering: Apply the K-Means algorithm to identify distinct customer segments.
- Cluster Analysis: Analyse the resulting clusters to understand the demographic and sales characteristics of each group.
- Marketing Strategy Development: Create targeted marketing strategies for each customer segment to enhance engagement and sales.
Expected Outcomes
- Customer Segments: Clear identification of customer groups based on age and purchasing behaviour.
- Insights for Marketing: Detailed understanding of each segment to inform targeted marketing efforts.
- Business Impact: Enhanced ability to tailor marketing campaigns, potentially leading to increased customer satisfaction and sales.
By clustering customers based on age and total sales, this project aims to provide actionable insights for personalized marketing, ultimately driving better customer engagement and higher sales for the retail store.
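Because age and total sales sit on very different numeric scales, K-Means distances would otherwise be dominated by sales. The sketch below shows z-score scaling as one possible preprocessing step; the project description does not say whether scaling was applied, and the values are invented.

```python
from statistics import mean, pstdev

def zscore(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

# Hypothetical customers: age in years, total sales in currency units.
ages = [20, 30, 40, 50]
total_sales = [1000, 3000, 5000, 7000]

# Scaled feature pairs that could be passed to a K-Means implementation.
features = list(zip(zscore(ages), zscore(total_sales)))
```

After scaling, a ten-year age gap and a 2,000-unit sales gap contribute equally to the distance between two customers in this toy data, so neither feature dominates the clustering.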
AI in Consumer Decision-Making: Global Zero-Party Dataset
This dataset captures how consumers around the world are using AI tools like ChatGPT, Perplexity, Gemini, Claude, and Copilot to guide their purchase decisions. It spans multiple product categories, demographics, and geographies, mapping the emerging role of AI as a decision-making companion across the consumer journey.
What Makes This Dataset Unique
Unlike datasets inferred from digital traces or modeled from third-party assumptions, this collection is built entirely on zero-party data: direct responses from consumers who voluntarily share their habits and preferences. That means the insights come straight from the people making the purchases, ensuring unmatched accuracy and relevance.
For FMCG leaders, retailers, and financial services strategists, this dataset provides the missing piece: visibility into how often consumers are letting AI shape their decisions, and where that influence is strongest.
Dataset Structure
Each record is enriched with:
- Product Category – from high-consideration items like electronics to daily staples such as groceries and snacks.
- AI Tool Used – identifying whether consumers turn to ChatGPT, Gemini, Perplexity, Claude, or Copilot.
- Influence Level – the percentage of consumers in a given context who rely on AI to guide their choices.
- Demographics – generational breakdowns from Gen Z through Boomers.
- Geographic Detail – city- and country-level coverage across Africa, LATAM, Asia, Europe, and North America.
This structure allows filtering and comparison across categories, age groups, and markets, giving users a multidimensional view of AI’s impact on purchasing.
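As a rough sketch of that kind of filtering and comparison, the snippet below averages the influence level by generation within one product category. The field names and all values are invented for illustration; the actual column names and figures in the dataset are not given in this description.

```python
from collections import defaultdict

def mean_influence(rows, group_by, **filters):
    """Average influence_pct per group, restricted to rows matching the filters."""
    buckets = defaultdict(list)
    for row in rows:
        if all(row[key] == value for key, value in filters.items()):
            buckets[row[group_by]].append(row["influence_pct"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Hypothetical records with made-up values.
records = [
    {"category": "Electronics", "ai_tool": "ChatGPT", "influence_pct": 40.0,
     "generation": "Gen Z", "country": "Kenya"},
    {"category": "Electronics", "ai_tool": "Gemini", "influence_pct": 50.0,
     "generation": "Gen Z", "country": "Brazil"},
    {"category": "Electronics", "ai_tool": "ChatGPT", "influence_pct": 10.0,
     "generation": "Boomers", "country": "Germany"},
    {"category": "Groceries", "ai_tool": "Copilot", "influence_pct": 20.0,
     "generation": "Gen Z", "country": "India"},
]

by_generation = mean_influence(records, "generation", category="Electronics")
```

The same helper can group by `ai_tool` or `country` instead, which mirrors the category, age group, and market comparisons described above.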
Why It Matters
AI has become a trusted voice in consumers’ daily lives. From meal planning to product comparisons, many people now consult AI before making a purchase—often without realizing how much it shapes the options they consider. For brands, this means that the path to purchase increasingly runs through an AI filter.
This dataset provides a comprehensive view of that hidden step in the consumer journey, enabling decision-makers to quantify:
- How much AI shapes consumer thinking before they even reach the shelf or checkout.
- Which product categories are most influenced by AI consultation.
- How adoption varies by geography and generation.
- Which AI platforms are most commonly trusted by consumers.
Opportunities for Business Leaders
- FMCG & Retail Brands: Understand where AI-driven decision-making is already reshaping category competition.
- Marketers: Identify demographic segments most likely to consult AI, enabling targeted strategies.
- Retailers: Align assortments and promotions with the purchase patterns influenced by AI queries.
- Investors & Innovators: Gauge market readiness for AI-integrated commerce solutions.
The dataset doesn’t just describe what’s happening—it opens doors to the “so what” questions that define strategy. Which categories are becoming algorithm-driven? Which markets are shifting fastest? Where is the opportunity to get ahead of competitors in an AI-shaped funnel?
Why Now
Consumer AI adoption is no longer a forecast; it is a daily behavior. Just as search engines once rewrote the rules of marketing, conversational AI is quietly rewriting how consumers decide what to buy. This dataset offers an early, detailed view into that change, giving brands the ability to act while competitors are still guessing.
What You Get
Users gain:
- A global, city-level view of AI adoption in consumer decision-making.
- Cross-category comparability to see where AI influence is strongest and weakest.
- Generational breakdowns that show how adoption differs between younger and older cohorts.
- AI platform analysis, highlighting how tool preferences vary by region and category.
Every row is powered by zero-party input, ensuring the insights reflect actual consumer behavior, not modeled assumptions.
How It’s Used
Leverage this data to:
- Validate strategies before entering new markets or categories.
- Benchmark competitors on AI readiness and influence.
- Identify growth opportunities in categories where AI-driven recommendations are rapidly shaping decisions.
- Anticipate risks where brand visibility could be disrupted by algorithmic mediation.
Core Insights
The full dataset reveals:
- Surprising adoption curves across categories where AI wasn’t expected to play a role.
- Geographic pockets where AI has already become a standard step in purchase decisions.
- Demographic contrasts showing who trusts AI most, and where skepticism still holds.
- Clear differences between AI platforms and the consumer profiles most drawn to each.
These patterns are not visible in traditional retail data, sales reports, or survey summaries. They are only captured here, directly from the consumers themselves.
Summary
Winning in FMCG and retail today means more than getting on shelves, capturing price points, or running promotions. It means understanding the invisible algorithms consumers are ...
Millennials were the largest generation group in the United States in 2024, with an estimated population of ***** million. Born between 1981 and 1996, Millennials recently surpassed Baby Boomers as the biggest group, and they will continue to be a major part of the population for many years.
The rise of Generation Alpha
Generation Alpha is the most recent generation to have been named, and many of its members will not be able to remember a time before smartphones and social media. As of 2024, the oldest Generation Alpha members were only just reaching adolescence. However, the group already makes up around ***** percent of the U.S. population, and they are said to be the most racially and ethnically diverse of all the generation groups.
Boomers vs. Millennials
The number of Baby Boomers, whose generation was defined by the boom in births following the Second World War, has fallen by around ***** million since 2010. However, they remain the second-largest generation group, and aging Boomers are contributing to steady increases in the median age of the population. Meanwhile, the Millennial generation continues to grow, and one reason for this is the increasing number of young immigrants arriving in the United States.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Factors used to create segmentation and items comprising them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vaccination status and past two-week protective behavior by segment.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides comprehensive information on global salmon populations, focusing on their decline in oceanic environments. It includes various data points collected over time to track and analyze trends in salmon populations. The key columns in this dataset are:
SERIES - Internal code for dataset indicating Domain, species, and Status Review data set year and when applicable, method.
NMFS_POPID - The unique numeric value for a population as determined by NMFS. This value will not change over time, even if the population name (NWR Population Name) does.
RECOVERY_DOMAIN - Discrete geographic areas for which comprehensive recovery plans are being developed: Puget Sound, Willamette/Lower Columbia, Interior Columbia (including the Mid-Columbia, Upper Columbia, and Snake River sub-domains), Oregon Coast, and Southern Oregon/Northern California Coast.
ESU - For populations listed under the federal ESA, this is the name of a defined Evolutionary Significant Unit (ESU) or Distinct Population Segment (DPS) as defined by NMFS Northwest Region or by USFWS.
MAJOR_POPULATION_GROUP - Major Population Group, as defined by the NWR. Groups of populations within an ESU/DPS that are more similar to each other than they are to other populations. They are based on similarities in genetic characteristics, demographic patterns and habitat types and on geographic structure.
POPULATION_NAME - Legal given name for a listed population within the ESU.
COMMON_POPULATION_NAME - Shortened population name
DISPLAY_ORDER - Geographically based display order within ESUs.
SPECIES - Salmon species name
RUN_TIMING - Run of fish, generally determined on the basis of the time of year at which adults enter fresh water to spawn. (Spring, Summer, Spring/Summer, Fall, Winter, early, or late)
STREAM_NAME - Name of the primary stream for the Population
YEAR - Calendar year of return
NUMBER_OF_SPAWNERS - Estimated number of natural origin (parents spawned in the wild) spawners contributing to spawning in a particular year. Includes both adults and jacks of natural origin (except for SR fall chinook which typically does have jack returns)
FRACWILD - The fraction of the total spawners that are the progeny of naturally-spawning fish.
CATCH - Terminal fishery harvest
AGE_1_RETURNS - The fraction of fish who are defined as having an age of 1 that returned to spawn in a given year.
AGE_2_RETURNS - The fraction of fish who are defined as having an age of 2 that returned to spawn in a given year.
AGE_3_RETURNS - The fraction of fish who are defined as having an age of 3 that returned to spawn in a given year.
AGE_4_RETURNS - The fraction of fish who are defined as having an age of 4 that returned to spawn in a given year.
AGE_5_RETURNS - The fraction of fish who are defined as having an age of 5 that returned to spawn in a given year.
AGE_6_RETURNS - The fraction of fish who are defined as having an age of 6 that returned to spawn in a given year.
AGE_7_RETURNS - The fraction of fish who are defined as having an age of 7 that returned to spawn in a given year.
METHOD - Survey (spawning ground), Model (PIT tag data), GSI (genetic stock inventory data), or Ladder count (at dam)
CITATION - Data source citation
CONTRIBUTOR - Agency, Tribe or other entity responsible for these data that is the best contact for questions that may arise about this data record.
DOCUMENT_CITATION - Citation of the document this dataset archive informed.
CODE_LINK - Location of the code used to generate analysis for the document
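One sanity check suggested by the column definitions is that, for a given population and year, the AGE_1_RETURNS through AGE_7_RETURNS fractions should add up to roughly 1. The sketch below performs that check and derives a mean age at return. The row values are hypothetical, the sum-to-one property is an assumption about how the fractions are defined, and missing values are not handled.

```python
AGE_COLUMNS = [f"AGE_{n}_RETURNS" for n in range(1, 8)]

def age_fraction_total(row):
    """Sum of the age-composition fractions for one population-year record."""
    return sum(row[col] for col in AGE_COLUMNS)

def mean_return_age(row):
    """Fraction-weighted average age of returning fish."""
    total = age_fraction_total(row)
    weighted = sum(n * row[f"AGE_{n}_RETURNS"] for n in range(1, 8))
    return weighted / total

# Hypothetical record: returns concentrated at ages 3 to 5.
row = {col: 0.0 for col in AGE_COLUMNS}
row.update({"AGE_3_RETURNS": 0.25, "AGE_4_RETURNS": 0.5, "AGE_5_RETURNS": 0.25})

total_fraction = age_fraction_total(row)
average_age = mean_return_age(row)
```

Records whose fractions sum to something far from 1 are worth inspecting before the age columns are used in any analysis.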
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Food festivals have been a growing tourism sector in recent years due to their contributions to a region’s economic, marketing, brand, and social growth. This study analyses the demand for the Bahrain food festival. The stated objectives were: (i) to identify the motivational dimensions of the demand for the food festival, (ii) to determine the segments of the demand for the food festival, and (iii) to establish the relationship between the demand segments and socio-demographic aspects. The food festival investigated was the Bahrain Food Festival held in Bahrain, located on the east coast of the Persian Gulf. The sample consisted of 380 valid questionnaires collected via social networks from those attending the event. The statistical techniques used were factor analysis and K-means clustering. The results show five motivational dimensions: Local food, Art, Entertainment, Socialization, and Escape and novelty. In addition, two segments were found. The first, Entertainment and novelties, is related to attendees who seek to enjoy the festive atmosphere and discover new restaurants. The second, Multiple motives, is formed by attendees with several motivations simultaneously. This segment has the highest income and expenses, making it the most important group for developing plans and strategies. The results will contribute to the academic literature and inform the organizers of food festivals.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Trust in information sources re Covid-19 guidance by segment.
The 2019 Ethiopia Mini Demographic and Health Survey (EMDHS) is a nationwide survey with a nationally representative sample of 9,150 selected households. All women age 15-49 who were usual members of the selected households and those who spent the night before the survey in the selected households were eligible to be interviewed in the survey. In the selected households, all children under age 5 were eligible for height and weight measurements. The survey was designed to produce reliable estimates of key indicators at the national level as well as for urban and rural areas and each of the 11 regions in Ethiopia.
The primary objective of the 2019 EMDHS is to provide up-to-date estimates of key demographic and health indicators. Specifically, the main objectives of the survey are:
▪ To collect high-quality data on contraceptive use; maternal and child health; infant, child, and neonatal mortality levels; child nutrition; and other health issues relevant to achievement of the Sustainable Development Goals (SDGs)
▪ To collect information on health-related matters such as breastfeeding, maternal and child care (antenatal, delivery, and postnatal), children’s immunizations, and childhood diseases
▪ To assess the nutritional status of children under age 5 by measuring weight and height
National coverage
The survey covered all de jure household members (usual residents), all women aged 15-49 and all children aged 0-5 resident in the household.
Sample survey data [ssd]
The sampling frame used for the 2019 EMDHS is a frame of all census enumeration areas (EAs) created for the 2019 Ethiopia Population and Housing Census (EPHC) and conducted by the Central Statistical Agency (CSA). The census frame is a complete list of the 149,093 EAs created for the 2019 EPHC. An EA is a geographic area covering an average of 131 households. The sampling frame contains information about EA location, type of residence (urban or rural), and estimated number of residential households.
Administratively, Ethiopia is divided into nine geographical regions and two administrative cities. The sample for the 2019 EMDHS was designed to provide estimates of key indicators for the country as a whole, for urban and rural areas separately, and for each of the nine regions and the two administrative cities.
The 2019 EMDHS sample was stratified and selected in two stages. Each region was stratified into urban and rural areas, yielding 21 sampling strata. Samples of EAs were selected independently in each stratum in two stages. Implicit stratification and proportional allocation were achieved at each of the lower administrative levels by sorting the sampling frame within each sampling stratum before sample selection, according to administrative units in different levels, and by using a probability proportional to size selection at the first stage of sampling.
To ensure that survey precision was comparable across regions, sample allocation was done through an equal allocation wherein 25 EAs were selected from each of eight regions. However, 35 EAs were selected from each of the three larger regions: Amhara, Oromia, and the Southern Nations, Nationalities, and Peoples’ Region (SNNPR).
In the first stage, a total of 305 EAs (93 in urban areas and 212 in rural areas) were selected with probability proportional to EA size (based on the 2019 EPHC frame) and with independent selection in each sampling stratum. A household listing operation was carried out in all selected EAs from January through April 2019. The resulting lists of households served as a sampling frame for the selection of households in the second stage. Some of the selected EAs for the 2019 EMDHS were large, with more than 300 households. To minimise the task of household listing, each large EA selected for the 2019 EMDHS was segmented. Only one segment was selected for the survey, with probability proportional to segment size. Household listing was conducted only in the selected segment; that is, a 2019 EMDHS cluster is either an EA or a segment of an EA.
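The first-stage selection with probability proportional to size can be sketched with the standard systematic PPS procedure: lay the units end to end by size, pick a random start, and step through at a fixed interval. The EA sizes below are hypothetical, and the sketch assumes no single unit is larger than the sampling interval; it is not the survey's actual selection program.

```python
import random

def pps_systematic(sizes, n, seed=None):
    """Select n unit indices with probability proportional to size (systematic PPS)."""
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0  # total size of all units before the current index
    idx = 0
    for t in targets:
        # Advance until the target falls inside the current unit's size range.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= t:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

# Hypothetical household counts for six enumeration areas in one stratum.
ea_households = [120, 90, 150, 200, 110, 130]
chosen = pps_systematic(ea_households, n=2, seed=42)
```

Larger EAs occupy a longer stretch of the cumulative size scale, so a selection point is more likely to land inside them, which is what gives the proportional-to-size property.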
In the second stage of selection, a fixed number of 30 households per cluster were selected with an equal probability systematic selection from the newly created household listing. All women age 15-49 who were either permanent residents of the selected households or visitors who slept in the household the night before the survey were eligible to be interviewed. In all selected households, height and weight measurements were collected from children age 0-59 months, and women age 15-49 were interviewed using the Woman’s Questionnaire.
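The allocation figures quoted above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the numbers come from the text.

```python
# First stage: 25 EAs in each of eight regions, 35 in each of three larger regions.
eas_equal_allocation = 8 * 25
eas_larger_regions = 3 * 35
total_eas = eas_equal_allocation + eas_larger_regions

# The text also reports the urban/rural split of the selected EAs.
urban_eas, rural_eas = 93, 212

# Second stage: a fixed 30 households per cluster.
households_per_cluster = 30
total_households = total_eas * households_per_cluster

print(total_eas, urban_eas + rural_eas, total_households)
```

The totals agree with the 305 EAs and 9,150 selected households stated in the survey description.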
For further details on sample selection, see Appendix A of the final report.
Computer Assisted Personal Interview [capi]
Five questionnaires were used for the 2019 EMDHS: (1) the Household Questionnaire, (2) the Woman’s Questionnaire, (3) the Anthropometry Questionnaire, (4) the Health Facility Questionnaire, and (5) the Fieldworker’s Questionnaire. These questionnaires, based on The DHS Program’s standard questionnaires, were adapted to reflect the population and health issues relevant to Ethiopia. They were shortened substantially to collect data on indicators of particular relevance to Ethiopia and donors to child health programmes.
All electronic data files were transferred via the secure internet file streaming system (IFSS) to the EPHI central office in Addis Ababa, where they were stored on a password-protected computer. The data processing operation included secondary editing, which required resolution of computer-identified inconsistencies and coding of open-ended questions. The data were processed by EPHI staff members and an ICF consultant who took part in the main fieldwork training. They were supervised remotely by staff from The DHS Program. Data editing was accomplished using CSPro System software. During the fieldwork, field-check tables were generated to check various data quality parameters, and specific feedback was given to the teams to improve performance. Secondary editing, double data entry from both the anthropometry and health facility questionnaires, and data processing were initiated in April 2019 and completed in July 2019.
A total of 9,150 households were selected for the sample, of which 8,794 were occupied. Of the occupied households, 8,663 were successfully interviewed, yielding a response rate of 99%.
In the interviewed households, 9,012 eligible women were identified for individual interviews; interviews were completed with 8,885 women, yielding a response rate of 99%. Overall, there was little variation in response rates according to residence; however, rates were slightly higher in rural than in urban areas.
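The two response rates follow directly from the counts given above. This small check uses only figures from the text; the helper name is illustrative.

```python
def response_rate(completed, eligible):
    """Share of eligible units that were successfully interviewed."""
    return completed / eligible

household_rate = response_rate(8663, 8794)  # interviewed / occupied households
women_rate = response_rate(8885, 9012)      # completed / eligible women

print(f"{household_rate:.1%}", f"{women_rate:.1%}")
```

Both rates are a little above 98.5%, which rounds to the 99% reported.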
The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2019 Ethiopia Mini Demographic and Health Survey (EMDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2019 EMDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
Sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
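As a minimal illustration of the "plus or minus two standard errors" rule described above, the following sketch computes an approximate 95% interval. The estimate and standard error used here are hypothetical values, not figures from the survey.

```python
def confidence_interval(estimate: float, standard_error: float,
                        multiplier: float = 2.0) -> tuple[float, float]:
    """Return (lower, upper) as estimate +/- multiplier * standard error."""
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical example: a proportion of 0.412 with a standard error of 0.012.
low, high = confidence_interval(0.412, 0.012)
print(f"Approximate 95% interval: {low:.3f} to {high:.3f}")
```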
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2019 EMDHS sample is the result of a multi-stage stratified design, and, consequently, it was necessary to use more complex formulas. Sampling errors are computed in SAS, using programs developed by ICF. These programs use the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
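The Jackknife repeated replication idea can be sketched as follows: drop one cluster at a time, recompute the estimate, and measure how much the resulting pseudo-values vary. This is a simplified illustration for a ratio estimate r = sum(y) / sum(x). It ignores stratification and sampling weights and is not the SAS program developed by ICF; the function name and input layout are assumptions.

```python
def jackknife_variance(clusters: list[tuple[float, float]]) -> float:
    """Delete-one-cluster jackknife variance of the ratio r = sum(y) / sum(x).

    `clusters` holds one (y_total, x_total) pair per sampled cluster.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("at least two clusters are required")
    y_all = sum(y for y, _ in clusters)
    x_all = sum(x for _, x in clusters)
    r = y_all / x_all  # estimate from the full sample

    squared_devs = 0.0
    for y_i, x_i in clusters:
        # Estimate recomputed with cluster i removed.
        r_without_i = (y_all - y_i) / (x_all - x_i)
        # Pseudo-value for cluster i.
        pseudo = k * r - (k - 1) * r_without_i
        squared_devs += (pseudo - r) ** 2
    return squared_devs / (k * (k - 1))


# Hypothetical cluster totals: (number of events, number of respondents).
example = [(12, 40), (9, 35), (15, 42), (7, 30)]
standard_error = jackknife_variance(example) ** 0.5
```

When every cluster contributes a single unit (x = 1), the pseudo-values reduce to the observations themselves and the result equals the familiar variance of a sample mean.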
Note: A more detailed description of the estimates of sampling errors is presented in Appendix B of the survey report.
Data Quality Tables
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets contain RGB photos of Scots pine seedlings of three populations from two different ecotypes originating in the Czech Republic:
Plasy - lowland ecotype
Trebon - lowland ecotype
Decin - upland ecotype
These photos were taken in three different periods (September 10th 2021, October 23rd 2021, January 22nd 2022).
File dataset_for_YOLOv7_training.zip contains image data with annotations for training a YOLOv7 segmentation model (training and validation sets).
The dataset also contains a table with information on individual Scots pine seedlings:
affiliation to parent tree (mum)
affiliation to population (site)
row and column in which the seedling was grown (row, col)
affiliation to the planter in which the seedling was grown (box)
mean RGB values of pine seedling in three different periods (B_september, G_september, R_september, B_october, G_october, R_october, B_january, G_january, R_january)
mean HSV values of pine seedling in three different periods (H_september, S_september, V_september, H_october, S_october, V_october, H_january, S_january, V_january)
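For readers who want to relate the RGB and HSV columns, the sketch below converts one mean RGB triple to HSV with Python's standard `colorsys` module. It assumes 8-bit channel values (0 to 255) and reports hue in degrees; the scale actually used for the H, S and V columns in the table is not stated here, and the HSV of a mean RGB colour is generally not identical to the mean of per-pixel HSV values. The sample values are hypothetical.

```python
import colorsys


def mean_rgb_to_hsv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert 0-255 RGB values to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Hypothetical seedling colour, given in the table's column order (B, G, R).
b_september, g_september, r_september = 48.0, 112.0, 76.0
hue, sat, val = mean_rgb_to_hsv(r_september, g_september, b_september)
print(f"H={hue:.1f} deg, S={sat:.2f}, V={val:.2f}")
```

Note that the table lists the channels in B, G, R order, so the values have to be reordered before calling a function that expects R, G, B.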
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multidimensional scaling for preliminary assessment of segment interpretability.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Segmentation of heterogeneous patient populations into parsimonious and relatively homogeneous groups with similar healthcare needs can facilitate healthcare resource planning and development of effective integrated healthcare interventions for each segment. We aimed to apply a data-driven, healthcare utilization-based clustering analysis to segment a regional health system patient population and validate its discriminative ability on 4-year longitudinal healthcare utilization and mortality data.
Methods: We extracted data from the Singapore Health Services Electronic Health Intelligence System, an electronic medical record database that included healthcare utilization (inpatient admissions, specialist outpatient clinic visits, emergency department visits, and primary care clinic visits), mortality, diseases, and demographics for all adult Singapore residents who resided in and had a healthcare encounter with our regional health system in 2012. Hierarchical clustering analysis (Ward’s linkage) and K-means cluster analysis using age and healthcare utilization data in 2012 were applied to segment the selected population. These segments were compared using their demographics (other than age) and morbidities in 2012, and longitudinal healthcare utilization and mortality from 2013–2016.
Results: Among 146,999 subjects, five distinct patient segments (“Young, healthy”; “Middle age, healthy”; “Stable, chronic disease”; “Complicated chronic disease”; and “Frequent admitters”) were identified. Healthcare utilization patterns in 2012, morbidity patterns and demographics differed significantly across all segments. The “Frequent admitters” segment had the smallest number of patients (1.79% of the population) but consumed 69% of inpatient admissions, 77% of specialist outpatient visits, 54% of emergency department visits, and 23% of primary care clinic visits in 2012. In this segment, 11.5% had end stage renal failure and 31.2% had malignancy. The validity of cluster-analysis derived segments is supported by discriminative ability for longitudinal healthcare utilization and mortality from 2013–2016. Incident rate ratios for healthcare utilization and Cox hazard ratios for mortality increased as patient segments increased in complexity. Patients in the “Frequent admitters” segment accounted for disproportionate healthcare utilization and had an 8.16 times higher mortality rate.
Conclusion: Our data-driven clustering analysis on a general patient population in Singapore identified five patient segments with distinct longitudinal healthcare utilization patterns and mortality risk to provide an evidence-based segmentation of a regional health system’s healthcare needs.
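To make the K-means step concrete, here is a minimal pure-Python sketch of Lloyd's algorithm applied to two features (age and a utilization count). It is not the authors' analysis: the data points are hypothetical, the hierarchical (Ward's linkage) step is omitted, and in practice features on different scales would normally be standardized first.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical patients: (age in years, inpatient admissions in one year).
patients = [(20, 0), (22, 1), (21, 0), (70, 9), (72, 10), (71, 8)]
centroids, labels = kmeans(patients, k=2)
```

With these well-separated points, the three younger, low-utilization patients end up in one cluster and the three older, high-utilization patients in the other.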
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic data: Gender and race distribution, and mean values with standard deviations and ranges for age, IOP, MD and GHT.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: White matter hyperintensities of presumed vascular origin (WMH) are an important magnetic resonance imaging marker of cerebral small vessel disease and are associated with cognitive decline, stroke, and mortality. Their relevance in healthy individuals, however, is less clear. This is partly due to the methodological challenge of accurately measuring rare and small WMH with automated segmentation programs. In this study, we tested whether WMH volumetry with FMRIB software library v6.0 (FSL; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) Brain Intensity AbNormality Classification Algorithm (BIANCA), a customizable and trainable algorithm that quantifies WMH volume based on individual data training sets, can be optimized for a normal aging population.
Methods: We evaluated the effect of varying training sample sizes on the accuracy and the robustness of the predicted white matter hyperintensity volume in a population (n = 201) with a low prevalence of confluent WMH and a substantial proportion of participants without WMH. BIANCA was trained with seven different sample sizes between 10 and 40 with increments of 5. For each sample size, 100 random samples of T1w and FLAIR images were drawn and trained with manually delineated masks. For validation, we defined an internal and external validation set and compared the mean absolute error, resulting from the difference between manually delineated and predicted WMH volumes for each set. For spatial overlap, we calculated the Dice similarity index (SI) for the external validation cohort.
Results: The study population had a median WMH volume of 0.34 ml (IQR of 1.6 ml) and included n = 28 (18%) participants without any WMH. The mean absolute error of the difference between BIANCA prediction and manually delineated masks was minimized and became more robust with an increasing number of training participants. The lowest mean absolute error of 0.05 ml (SD of 0.24 ml) was identified in the external validation set with a training sample size of 35. Compared to the volumetric overlap, the spatial overlap was poor with an average Dice similarity index of 0.14 (SD 0.16) in the external cohort, driven by subjects with very low lesion volumes.
Discussion: We found that the performance of BIANCA, particularly the robustness of predictions, could be optimized for use in populations with a low WMH load by enlargement of the training sample size. Further work is needed to evaluate and potentially improve the prediction accuracy for low lesion volumes. These findings are important for current and future population-based studies with the majority of participants being normal aging people.
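The two evaluation measures named in this abstract, the mean absolute error of predicted volumes and the Dice similarity index for spatial overlap, can be sketched as below. This is a generic illustration on flattened binary masks, not the FSL or BIANCA tooling. Returning 1.0 when both masks are empty is one possible convention and is an assumption here.

```python
def dice_index(mask_a, mask_b) -> float:
    """Dice similarity index 2|A and B| / (|A| + |B|) for two binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


def mean_absolute_error(predicted, manual) -> float:
    """Average absolute difference between predicted and manual volumes."""
    if len(predicted) != len(manual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, manual)) / len(predicted)


# Hypothetical flattened masks and volumes (ml) for illustration only.
overlap_score = dice_index([1, 1, 0, 0], [1, 0, 1, 0])
volume_error = mean_absolute_error([0.30, 0.50], [0.40, 0.50])
```

The example also hints at why the two measures can diverge: two small lesions of equal volume that sit in slightly different voxels give a volume error near zero but a low Dice index.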
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unless otherwise stated, 25 simulation replicates were generated in each scenario. Model Ga is used for inferences given true IBD and Model Gb is used for inferences from inferred IBD. The value of r is assumed known for all inferences, whereas μ, ϵ and N(g), g ≥ 0, are targets of inference.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pandemics such as Covid-19 pose tremendous public health communication challenges in promoting protective behaviours, vaccination, and educating the public about risks. Segmenting audiences based on attitudes and behaviours is a means to increase the precision and potential effectiveness of such communication. The present study reports on such an audience segmentation effort for the population of England, sponsored by the United Kingdom Health Security Agency (UKHSA) and involving a collaboration of market research and academic experts. A cross-sectional online survey was conducted between 4 and 24 January 2022 with 5525 respondents (5178 used in our analyses) in England using a market research opt-in panel. An additional 105 telephone interviews were conducted to sample persons without online or smartphone access. Respondents were quota sampled to be demographically representative. The primary analytic technique was k-means cluster analysis, supplemented with other techniques including multi-dimensional scaling and use of respondent-standardized as well as sample-standardized data when necessary to address differences in response set for some groups of respondents. Identified segments were profiled against demographic, behavioural self-report, attitudinal, and communication channel variables, with differences by segment tested for statistical significance. Seven segments were identified, including distinctly different groups of persons who tended toward a high level of compliance and several that were relatively low in compliance. The segments were characterized by distinctive patterns of demographics, attitudes, behaviours, trust in information sources, and communication channels preferred.
Segments were further validated by comparing the segmentation variable versus a set of demographic variables as predictors of reported protective behaviours in the past two weeks and of vaccine refusal; the demographics together had about one-quarter the effect size of the single seven-level segment variable. With respect to managerial implications, different communication strategies are suggested for each segment, illustrating advantages of rich segmentation descriptions for understanding public health communication audiences. Strengths and weaknesses of the methods used are discussed, to help guide future efforts.
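The abstract mentions respondent-standardized data as a way to handle differences in response set. One common reading of that idea is to z-score each respondent's answers against that respondent's own mean and spread, so that a person who uses only the top of a rating scale becomes comparable with one who uses only the bottom. The sketch below illustrates this reading with hypothetical ratings; it is an assumption about the general technique, not the study's documented procedure.

```python
from statistics import mean, pstdev


def standardize_within_respondent(ratings):
    """Z-score one respondent's ratings using their own mean and SD."""
    m = mean(ratings)
    sd = pstdev(ratings)
    if sd == 0:
        # Identical answers on every item carry no within-person contrast.
        return [0.0] * len(ratings)
    return [(value - m) / sd for value in ratings]


# Two hypothetical respondents with the same pattern but different scale use.
high_scale_user = standardize_within_respondent([5, 5, 4])
low_scale_user = standardize_within_respondent([2, 2, 1])
```

After standardization the two respondents have the same profile, so a cluster analysis run on these values groups them by the pattern of their answers instead of by how high they tend to rate everything.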