Demografy is a privacy-by-design AI platform for predicting customer demographics.
Core features:
- Demographic segmentation
- Demographic analytics
- API integration
- Data export
Key advantages:
- 100% coverage of lists
- Accuracy estimate before purchase
- GDPR compliance, as no sensitive data is required. Demografy can work with only first names or masked last names
Use cases:
- Actionable analytics about your customers to get demographic insights
- Appending missing demographic data to your records for customer segmentation and targeted marketing campaigns
- Enhanced personalization from knowing your customer better
Unlike traditional solutions, you don’t need to know or disclose your customers' or prospects' addresses, emails, or other sensitive information. You can even provide masked last names, keeping personal data in-house. This makes Demografy privacy by design and enables you to get 100% coverage of your audience, since all you need to know is names.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
The market for predictive analytics software was valued at **** billion U.S. dollars in 2020 and is forecasted to grow to ***** billion U.S. dollars by 2028. Predictive analytics are often used to analyze consumer behavior, and manage supply chains and business operations.
Demographic projections performed by the Statistics Forecasting for elementary, middle and high school level students.
As of 2019, forecasts suggest that the predictive analytics market will reach over *********** U.S. dollars in total revenue. By 2022 the market is expected to reach nearly ** billion dollars in annual revenue as an increasingly large number of businesses make use of predictive analytics techniques for everything from fraud detection to medical diagnosis.
Predictive analytics
The field of predictive analytics involves the use of various statistical methods and models within businesses to make predictions about a wide range of future outcomes. Predictive analytics is already one of the most widely adopted intelligent automation technologies in the world, with over ** percent of major enterprises deploying smart analytics that include predictive analytics. As business interactions around the world become increasingly digitalized, massive amounts of data are created, which can be evaluated with predictive analytics tools to give users a better understanding of market dynamics and underlying trends. Considering this, it is no surprise that predictive models rank as one of the top big data technology trends around the world.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The repository contains a machine-readable dataset that aggregates data from around 10 governmental and academic sources at the county level, for each county in the 50 states and Washington, D.C. It covers demographics, socioeconomics, healthcare, and education. In addition to county-level time series from the JHU CSSE COVID-19 dashboard (https://github.com/CSSEGISandData/COVID-19), the dataset contains multiple variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics, all in CSV format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the predictive performance indicators of three different models.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains information about adult income prediction. It includes the following columns:
workclass: The type of employment (e.g., Private, Self-emp-not-inc, Federal-gov, Local-gov)
fnlwgt: The number of people the census believes the entry represents
education: The highest level of education achieved
education-num: The numeric representation of the previous column
marital-status: The marital status of the individual
occupation: The occupation of the individual
relationship: The relationship of the individual to their household
race: The race of the individual
sex: The gender of the individual
capital-gain: The capital gains of the individual
capital-loss: The capital losses of the individual
hours-per-week: The number of hours the individual works per week
country: The native country of the individual
salary: The income level of the individual, which is the target variable to predict.
The goal of this dataset is to build a model that can accurately predict the income level of an individual based on the provided features.
Model-based analyses are common in phylogeographic inference because they parameterize processes such as population division, gene flow and expansion that are of interest to biologists. Approximate Bayesian Computation is a model-based approach that can be customized to any empirical system and used to calculate the relative posterior probability of several models, provided that suitable models can be identified for comparison. The question of how to identify suitable models is explored using data from Plethodon idahoensis, a salamander that inhabits the North American inland northwest temperate rainforest. First, we conduct an ABC analysis using five models suggested by previous research, calculate the relative posterior probabilities, and find that a simple model of population isolation has the best fit to the data (PP = 0.70). In contrast to this subjective choice of models to include in the analysis, we also specify models in a more objective manner by simulating prior distributions...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides detailed logs of healthcare appointment bookings, enriched with patient demographics, medical history, and communication records such as reminders. It enables comprehensive analysis of no-show risk factors, supports predictive modeling, and helps optimize scheduling efficiency in clinical settings.
http://rightsstatements.org/vocab/InC/1.0/
The Woods & Poole Economics, Inc. 2023 Complete Economic and Demographic Data Source contains some of the Woods & Poole Economics, Inc. regional data and projections for the U.S. and all regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), Designated Market Areas (DMAs), and counties for 1969 or 1970 or 1990 through 2060. The remainder of this introduction contains the technical description of the and Download. Chapter 1 is an overview of the 2023 projections. Please read "Technical Description of the 2023 Regional Projections and Database" (Chapter 2) for an explanation of data sources, data definitions, and forecast methods. Appendices to Chapter 2 define the geographic areas used by Woods & Poole.
The SCA’s comprehensive capital planning process includes developing and analyzing quality data, creating and updating the Department of Education’s Five-Year Capital Plans, and monitoring projects through completion. The SCA prioritizes capital projects to best meet the capacity and building improvements needs throughout the City. Additionally, the SCA assures that the Capital Plan aligns with New York State and City Department of Education mandates, academic initiatives, and budgetary resources. This is one of the most current published reports.
Data Source: CA Department of Finance, Demographic Research Unit
Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.
This data biography shares the how, who, what, where, when, and why about this dataset. We, the epidemiology team at Napa County Health and Human Services Agency, Public Health Division, created it to help you understand where the data we analyze and share comes from. If you have any further questions, we can be reached at epidemiology@countyofnapa.org.
Data dashboard featuring this data: Napa County Demographics https://data.countyofnapa.org/stories/s/bu3n-fytj
How was the data collected? Population projections use the following demographic balancing equation: Current Population = Previous Population + (Births - Deaths) + Net Migration
Previous Population: the starting point for the population projection estimates is the 2020 US Census, informed by the Population Estimates Program data.
Births and Deaths: birth and death totals came from the California Department of Public Health, Vital Statistics Branch, which maintains birth and death records for California.
Net Migration: multiple sources of administrative records were used to estimate net migration, including driver’s license address changes, IRS tax return data, Medicare and Medi-Cal enrollment, federal immigration reports, elementary school enrollments, and group quarters population.
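The balancing equation above can be sketched in a few lines of Python. This is a minimal illustration, not the Department of Finance's projection method: the function name and all input numbers are hypothetical, and the real projections break each component down by age, sex, and race/ethnicity.

```python
def project_population(previous, births, deaths, net_migration):
    """Apply the demographic balancing equation for one period."""
    return previous + (births - deaths) + net_migration


def project_series(start, yearly_components):
    """Roll the equation forward; each item is (births, deaths, net_migration)."""
    populations = [start]
    for births, deaths, net_migration in yearly_components:
        populations.append(
            project_population(populations[-1], births, deaths, net_migration)
        )
    return populations


# Hypothetical inputs for illustration only (not Napa County figures).
example = project_series(1000, [(20, 15, -3), (18, 16, 4)])
```

Each year's result becomes the "previous population" for the next year, which is why errors in the starting census count carry through the whole projection.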
Who was included and excluded from the data? Previous Population: The goal of the US Census is to reflect all populations residing in a given geographic area. Results of two analyses done by the US Census Bureau showed that the 2020 Census total population counts were consistent with recent counts despite the challenges added by the pandemic. However, some populations were undercounted (the Black or African American population, the American Indian or Alaska Native population living on a reservation, the Hispanic or Latino population, and people who reported being of Some Other Race), and some were overcounted (the Non-Hispanic White population and the Asian population). Children, especially children younger than 4, were also undercounted.
Births and Deaths: Birth records include all people who are born in California as well as births to California residents that happened out of state. Death records include people who died while in California, as well as deaths of California residents that occurred out of state. Because birth and death record data comes from a registration process, the demographic information provided may not be accurate or complete.
Net Migration: each of the multiple sources of administrative records that were used to estimate net migration include and exclude different groups. For details about methodology, see https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf.
Where was the data collected? Data is collected throughout California. This subset of data includes Napa County.
When was the data collected? This subset of Napa County data is from Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.
These 2019 baseline projections incorporate the latest historical population, birth, death, and migration data available as of July 1, 2020. Historical trends from 1990 through 2020 for births, deaths, and migration are examined. County populations by age, sex, and race/ethnicity are projected to 2060.
Why was the data collected? The population projections were prepared under the mandate of the California Government Code (Cal. Gov't Code § 13073, 13073.5).
Where can I learn more about this data? https://dof.ca.gov/Forecasting/Demographics/Projections/ https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/P3_Dictionary.txt https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the predictive performance indicators of the two models.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About
This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.
Features
Goal
The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean the data, understand it, and then apply machine learning models to make and evaluate predictions. This is a challenging task for this kind of data. The data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online advertisements.
https://spdx.org/licenses/CC0-1.0.html
Objective: To perform a longitudinal analysis of clinical features associated with Neurofibromatosis Type 1 (NF1) based on demographic and clinical characteristics, and to apply a machine learning strategy to determine the feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.
Methods: Using NF1 as a model system, we perform retrospective data analyses utilizing a manually-curated NF1 clinical registry and electronic health record (EHR) information, and develop machine-learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.
Results: Males and females were evenly represented in the cohort. White children were more likely to develop OPG (OR: 2.11, 95%CI: 1.11-4.00, p=0.02) relative to their non-white peers. Median age at diagnosis of OPG was 6.5 years (1.7-17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95%CI: 1.33-2.70, p<0.001), and earlier diagnosis in males relative to females was observed. The gradient boosting classification model predicted diagnosis of ADHD with an AUROC of 0.74, and predicted diagnosis of OPG with an AUROC of 0.82.
Conclusions: Using readily available clinical and EHR data, we successfully recapitulated several important and clinically-relevant patterns in NF1 semiology specifically based on demographic and clinical characteristics. Naïve machine learning techniques can be potentially used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.
Methods Patients and Data Description
This study was performed using retrospective clinical data extracted from two sources within the Washington University Neurofibromatosis (NF) Center. First, data were extracted from an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database had a clinical diagnosis of NF1 based on current National Institutes of Health Consensus Development Conference diagnostic criteria,9 and had been assessed over multiple visits from 2002 to 2016 for the presence of clinical features associated with NF1. Data points in this registry included demographic information, such as age, race, and sex, in addition to NF1-related clinical features and associated conditions, such as café-au-lait macules, skinfold freckling, cutaneous neurofibromas, Lisch nodules, OPG, hypertension, ADHD, and cognitive impairment. These data were maintained in a semi-structured format containing textual and binary fields, capturing each individual’s data over multiple clinical visits. From these data, clinical features and phenotypes were extracted using data manipulation, imputation, and text mining techniques. Data obtained from this NF1 clinical registry were converted to data tables, which captured each patient visit and the presence/absence of specific clinical features at each visit. Clinical features which were once marked as present were assumed to be present for all future visits, and missing data were assumed absent for that specific visit. Categorical variables are reported as frequencies and proportions, and compared using odds ratios (ORs). Continuously distributed traits, adhering to both conventional normality assumptions and homogeneity of variances, are reported as mean and standard deviations, and compared using analysis of variance methods. 
Non-parametric equivalents were used for data with non-normal distributions.
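The registry rule described above (a feature once marked present stays present at all later visits, and a missing value counts as absent for that visit) can be sketched as follows. This is an illustrative reading of the rule, not the authors' code; the function name and the True/False/None encoding are assumptions.

```python
def carry_forward_presence(visits):
    """Fill one patient's per-visit flags for a single clinical feature.

    `visits` is ordered by visit date; each entry is True (present),
    False (absent), or None (missing). Hypothetical encoding.
    """
    filled = []
    seen_present = False
    for flag in visits:
        # Once present, the feature is treated as present from then on.
        seen_present = seen_present or (flag is True)
        # Missing values fall through as absent unless already present.
        filled.append(seen_present)
    return filled


example = carry_forward_presence([None, False, True, None, False])
```

Note that under this rule a later "absent" entry never overrides an earlier "present" one, which suits features that do not resolve but would be a strong assumption for transient conditions.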
Clinical Feature Extraction from Clinical Registry and EHR
The NF1 Clinical Registry comprised string-based clinical feature values, such as ADHD, OPG, and asthma. From these data, we extracted 27 unique clinical features in addition to longitudinal data on the development of NF1-related clinical features and associated diagnoses. For each clinical feature, age at initial presentation and/or diagnosis was computed, and median age of occurrence was calculated for each sex. The exact age of presentation and/or diagnosis could not be definitively ascertained for any feature that was present at a child’s initial clinic visit. As such, we computed the age of diagnosis only for those clinical features for which we have at least one visit documenting feature absence prior to the manifestation of that feature.
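The age-of-diagnosis rule above can be sketched as below: an age is counted only when at least one earlier visit documents the feature as absent, and the median is then taken per sex. The function names, the `(age, present)` tuple layout, and the sample records are hypothetical and are not drawn from the registry.

```python
from statistics import median


def age_at_first_presentation(visits):
    """Return the age at the first visit with the feature present,
    or None if no earlier visit documented its absence.

    `visits` is a list of (age_years, present) tuples ordered by age.
    """
    absence_documented = False
    for age, present in visits:
        if present:
            return age if absence_documented else None
        absence_documented = True
    return None


def median_age_by_sex(patients):
    """`patients` is a list of (sex, visits) pairs; returns {sex: median age}."""
    ages = {}
    for sex, visits in patients:
        age = age_at_first_presentation(visits)
        if age is not None:
            ages.setdefault(sex, []).append(age)
    return {sex: median(values) for sex, values in ages.items()}


# Hypothetical records for illustration only.
patients = [
    ("F", [(3.0, False), (6.5, True)]),
    ("F", [(4.0, True)]),  # present at first visit: excluded
    ("M", [(2.0, False), (5.0, False), (8.0, True)]),
    ("M", [(1.0, False), (4.0, True)]),
]
result = median_age_by_sex(patients)
```

Excluding features already present at the first visit avoids treating the age at enrollment as the age of onset, at the cost of a smaller sample.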
Diagnosis codes from the EHR-derived data set were also extracted. Diagnosis codes were recorded as 15,890 unique ICD 9/10 codes. Given the large number of ICD 9/10 codes, a consistent, concept-level “roll up” of relevant codes to a single phenotype description was created by mapping the extracted ICD 9/10 values to phenome-wide association (PheWAS) codes called Phecodes, which have been demonstrated to better align with clinical disease compared to individual ICD codes.
Machine Learning Analyses
Using a combination of clinical features obtained from the NF1 Clinical Registry and EHR-derived data sets, we developed prediction models using a gradient boosting platform for identifying patients with specific NF1-related diagnoses to establish the usefulness of clinical history and documentation of clinical findings in predicting phenotypic variability of NF1. Initial analyses used a state-of-the-art classification algorithm, gradient boosting model, which uses a tree-based algorithm to produce a predictive model from an ensemble of weak predictive models. Gradient boosting model was selected, as it supports identifying importance of features used in the final prediction model. Subsequent analyses employed training each model for three different feature sets: (1) demographic features for all patients, including race, sex, and family history of NF1 [5 features]; (2) clinical features associated with NF1 [27 features] extracted from the NF1 Clinical Registry; and (3) diagnosis codes extracted from the EHR data, which were reduced to 50 Phecodes. Four-fold cross validation was then applied for the three models, and comparisons for the prediction accuracies of each model determined. Positive predictive value (PPV), F1 score and the area under the receiver operator characteristic (AUROC) curve were used as evaluation metrics. Scikit Learn, a machine learning library in Python, was employed to implement all analyses.
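The workflow described above can be sketched with scikit-learn roughly as follows. This is not the study's code. The data are synthetic, and the stratified folds, the 0.5 decision threshold, and the default hyperparameters are all assumptions. The source states only that a gradient boosting model, four-fold cross validation, and the PPV, F1, and AUROC metrics were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, precision_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def evaluate_gradient_boosting(X, y, n_splits=4, seed=0):
    """Cross-validate a gradient boosting classifier; return metrics and importances."""
    model = GradientBoostingClassifier(random_state=seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Out-of-fold probabilities: each sample is scored by a model
    # that did not see it during training.
    proba = cross_val_predict(model, X, y, cv=folds, method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)  # assumed threshold
    metrics = {
        "auroc": roc_auc_score(y, proba),
        "ppv": precision_score(y, pred, zero_division=0),  # PPV is precision
        "f1": f1_score(y, pred, zero_division=0),
    }
    # Refit on all data to read off feature importances.
    model.fit(X, y)
    return metrics, model.feature_importances_


# Synthetic stand-in for a 5-feature demographic set.
X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
metrics, importances = evaluate_gradient_boosting(X, y)
```

Running the same function on each of the three feature sets would give the per-set comparison the paragraph describes, and `feature_importances_` is the attribute that supports the feature-importance rationale given for choosing gradient boosting.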
Standard Protocol Approvals, Registrations, and Patient Consents
The NF1 Clinical Registry is an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database have a clinical diagnosis of NF1 based on current National Institutes of Health criteria and have provided informed consent for participation in the clinical registry. All data collection, usage and analysis for this study were approved by the Institutional Review Board (IRB) at the Washington University School of Medicine.
https://www.datainsightsmarket.com/privacy-policy
The Life and Health (L&H) Insurance industry is experiencing a rapid transformation driven by the increasing adoption of data analytics. The market, valued at $2647.3 million in 2025, is projected to grow at a Compound Annual Growth Rate (CAGR) of 9.2% from 2025 to 2033. This robust growth is fueled by several key factors. Firstly, the need for improved risk assessment and underwriting is pushing insurers to leverage advanced analytics for predictive modeling. This allows for more accurate pricing, reduced fraud, and better customer segmentation. Secondly, demographic profiling enabled by data analytics helps insurers tailor products and services to specific customer needs, leading to increased customer satisfaction and retention. Data visualization tools further enhance decision-making by providing clear and concise insights into complex datasets, facilitating better strategy development and operational efficiency. Finally, the rise of Insurtech companies and the increasing availability of sophisticated software solutions are accelerating the adoption of data analytics across the L&H insurance sector. The competitive landscape is shaped by a mix of established players like Deloitte, SAP AG, and IBM, alongside specialized Insurtech firms offering innovative data analytics solutions. The segmentation of the market reveals significant opportunities across various applications and types. Predictive analysis, demographic profiling, and data visualization are the most prominent application segments, reflecting the industry's focus on risk management, customer understanding, and improved operational efficiency. The service and software segments represent the primary delivery models for data analytics solutions. While North America currently holds a dominant market share, regions like Asia-Pacific are experiencing rapid growth, driven by increasing digitalization and a rising middle class with growing insurance needs. 
Regulatory changes promoting data sharing and increased customer data privacy awareness are likely to influence market dynamics in the coming years. The key challenges include data security concerns, the need for skilled data scientists, and the integration of legacy systems with new data analytics platforms. Successfully navigating these challenges will be crucial for insurers to fully capitalize on the transformative potential of data analytics.
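As a worked check on the growth figures above, the snippet below compounds the stated 2025 value at the stated CAGR. It assumes annual compounding over the eight years from 2025 to 2033; the function name is hypothetical, and the resulting 2033 value is an implication of the two stated figures, not a number given in the report.

```python
def project_value(base_value, annual_rate, years):
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1 + annual_rate) ** years


# Stated inputs: $2647.3 million in 2025, CAGR of 9.2% through 2033.
projected_2033 = project_value(2647.3, 0.092, 2033 - 2025)
growth_multiple = projected_2033 / 2647.3
```

Under these assumptions the market roughly doubles over the forecast period.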
The 2018 edition of Woods and Poole Complete U.S. Database provides annual historical data from 1970 (some variables begin in 1990) and annual projections to 2050 of population by race, sex, and age, employment by industry, earnings of employees by industry, personal income by source, households by income bracket and retail sales by kind of business. The Complete U.S. Database contains annual data for all economic and demographic variables for all geographic areas in the Woods & Poole database (the U.S. total, and all regions, states, counties, and CBSAs). The Complete U.S. Database has following components:
Demographic & Economic Desktop Data Files: There are 122 files covering demographic and economic data. The first 31 files (WP001.csv – WP031.csv) cover demographic data. The remaining files (WP032.csv – WP122.csv) cover economic data.
Demographic DDFs: Provide population data for the U.S., regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), and counties. Each variable is in a separate .csv file.
Variables: Total Population Population Age (breakdown: 0-4, 5-9, 10-15 etc. all the way to 85 & over) Median Age of Population White Population Population Native American Population Asian & Pacific Islander Population Hispanic Population, any Race Total Population Age (breakdown: 0-17, 15-17, 18-24, 65 & over) Male Population Female Population
Economic DDFs: The other files (WP032.csv – WP122.csv) provide employment and income data on: Total Employment (by industry) Total Earnings of Employees (by industry) Total Personal Income (by source) Household income (by brackets) Total Retail & Food Services Sales (by industry) Net Earnings Gross Regional Product Retail Sales per Household
Economic & Demographic Flat File: A single file for total number of people by single year of age (from 0 to 85 and over), race, and gender. It covers all U.S., regions, states, CSAs, MSAs and counties. Years of coverage: 1990 - 2050
Single Year of Age by Race and Gender: Separate files for number of people by single year of age (from 0 years to 85 years and over), race (White, Black, Native American, Asian American & Pacific Islander and Hispanic) and gender. Years of coverage: 1990 through 2050.
DATA AVAILABLE FOR 1970-2019; FORECASTS THROUGH 2050
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Diabetes Prediction Dataset is built for predicting diabetes and analyzing related risk factors. It contains various characteristics such as demographics, lifestyle, and clinical measurements, so it can be used to predict a patient's risk of developing diabetes.
2) Data Utilization
(1) Characteristics of the Diabetes Prediction Dataset:
• Key columns include a variety of clinical and lifestyle indicators related to diabetes, including age, gender, body mass index (BMI), blood pressure, blood sugar levels (Glucose), insulin, family history, and physical activity.
(2) The Diabetes Prediction Dataset can be used for:
• Machine Learning/Deep Learning Model Development: It can be used to develop classification models (logistic regression, decision tree, random forest, neural network, etc.) that predict the risk of developing diabetes based on patient characteristics.
• Data Analysis and Visualization: It is suitable for correlation analysis, risk factor derivation, and Exploratory Data Analysis (EDA) across many variables such as demographics, clinical figures, and lifestyle.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/cf070156-0e60-46d0-a31d-0b34d9b0dcd4 on 27 January 2022.
--- Dataset description provided by original source is as follows ---
Demographic projections performed by the Statistics Forecasting for elementary, middle and high school level students.
--- Original source retains full ownership of the source dataset ---