100+ datasets found

Demografy's Consumer Demographics Prediction SaaS
datarade.ai
.json, .csv
Updated Jun 4, 2021
Cite
Demografy (2021). Demografy's Consumer Demographics Prediction SaaS [Dataset]. https://datarade.ai/data-products/demografy-s-consumer-demographics-prediction-saas-demografy
Explore at:
Available download formats: .json, .csv
Dataset updated
Jun 4, 2021
Dataset authored and provided by
Demografy
Area covered
Moldova (Republic of), Italy, Czech Republic, Croatia, Luxembourg, Monaco, Sweden, Finland, Poland, Denmark
Description
Demografy is a privacy-by-design AI platform for predicting customer demographics.

Core features:
- Demographic segmentation
- Demographic analytics
- API integration
- Data export

Key advantages:
- 100% coverage of lists
- Accuracy estimate before purchase
- GDPR compliance, as no sensitive data is required; Demografy can work with only first names or masked last names

Use cases:
- Actionable analytics about your customers to get demographic insights
- Appending missing demographic data to your records for customer segmentation and targeted marketing campaigns
- Enhanced personalization from knowing your customers better

Unlike traditional solutions, you don’t need to know or disclose your customers' or prospects' addresses, emails, or other sensitive information. You can even provide masked last names, keeping personal data in-house. This makes Demografy private by design and enables 100% coverage of your audience, since all you need is names.
Is Demography Destiny? Application of Machine Learning Techniques to...
plos.figshare.com
figshare.com
docx
Updated Jun 3, 2023
Cite
Wei Luo; Thin Nguyen; Melanie Nichols; Truyen Tran; Santu Rana; Sunil Gupta; Dinh Phung; Svetha Venkatesh; Steve Allender (2023). Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset [Dataset]. http://doi.org/10.1371/journal.pone.0125602
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0125602
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Wei Luo; Thin Nguyen; Melanie Nichols; Truyen Tran; Santu Rana; Sunil Gupta; Dinh Phung; Svetha Venkatesh; Steve Allender
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often lack up-to-date data on a region's health outcomes. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and regularly updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separate validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
Global predictive analytics market value 2020 and 2028
statista.com
Updated Jun 26, 2025
Cite
Statista (2025). Global predictive analytics market value 2020 and 2028 [Dataset]. https://www.statista.com/statistics/1286871/predictive-analytics-market-size/
Explore at:
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2020
Area covered
Worldwide
Description
The market for predictive analytics software was valued at **** billion U.S. dollars in 2020 and is forecasted to grow to ***** billion U.S. dollars by 2028. Predictive analytics are often used to analyze consumer behavior, and manage supply chains and business operations.
Statistical Forecasting Demographic Projection Report - Enrollment...
s.cnmilf.com
data.cityofnewyork.us
+3more
Updated Sep 2, 2023
Cite
data.cityofnewyork.us (2023). Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/statistical-forecasting-demographic-projection-report-enrollment-projections-new-york-city
Explore at:
Dataset updated
Sep 2, 2023
Dataset provided by
data.cityofnewyork.us
Area covered
New York
Description
Demographic projections prepared by Statistical Forecasting for elementary, middle, and high school students.
Predictive analytics market forecast worldwide 2016-2022
statista.com
Updated Jun 26, 2025
Cite
Statista (2025). Predictive analytics market forecast worldwide 2016-2022 [Dataset]. https://www.statista.com/statistics/819415/worldwide-predictive-analytics-market-size/
Explore at:
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017
Area covered
Worldwide
Description
As of 2019, forecasts suggest that the predictive analytics market will reach over *********** U.S. dollars in total revenue. By 2022 the market is expected to reach nearly ** billion dollars in annual revenue as an increasingly large number of businesses make use of predictive analytics techniques for everything from fraud detection to medical diagnosis.

Predictive analytics

The field of predictive analytics involves the use of various statistical methods and models within businesses to make predictions about a wide range of future outcomes. Predictive analytics is already one of the most widely adopted intelligent automation technologies in the world, with over ** percent of major enterprises deploying smart analytics that include predictive analytics. As business interactions around the world become increasingly digitalized, massive amounts of data are created, which can be evaluated through predictive analytics tools to give users a better understanding of market dynamics and underlying trends. Considering this, it is no surprise that predictive models rank as one of the top big data technology trends around the world.
County-level Socioeconomic Data for Predictive Modeling of Epidemiological...
catalog.midasnetwork.us
csv for excel
Updated Jul 30, 2024
Cite
MIDAS Coordination Center (2024). County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects [Dataset]. https://catalog.midasnetwork.us/collection/19
Explore at:
Available download formats: csv for excel
Dataset updated
Jul 30, 2024
Dataset authored and provided by
MIDAS Coordination Center
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Time period covered
Jan 22, 2020 - Sep 13, 2020
Variables measured
disease, COVID-19, behavior, pathogen, case counts, Homo sapiens, host organism, age-stratified, mortality data, phenotypic sex, and 13 more
Dataset funded by
National Institute of General Medical Sciences
Description
The repository contains a machine-readable dataset that aggregates relevant data from around 10 governmental and academic sources at the county level for each county in the 50 states and Washington, D.C., including data on counties, demographics, socioeconomics, healthcare, and education. In addition to county-level time series from the JHU CSSE COVID-19 dashboard (https://github.com/CSSEGISandData/COVID-19), the dataset contains multiple variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics in CSV format.
Statistics of the predictive performance indicators of three different...
plos.figshare.com
xls
Updated Jun 25, 2025
Cite
Jin Wang; Shihan Ma; Qing Lv; Qiang Li (2025). Statistics of the predictive performance indicators of three different models. [Dataset]. http://doi.org/10.1371/journal.pone.0320298.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320298.t005
Dataset updated
Jun 25, 2025
Dataset provided by
PLOS (http://plos.org/)
Authors
Jin Wang; Shihan Ma; Qing Lv; Qiang Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics of the predictive performance indicators of three different models.
Adult Income Prediction Classification
kaggle.com
Updated Dec 13, 2024
Cite
Sathyam A (2024). Adult Income Prediction Classification [Dataset]. https://www.kaggle.com/datasets/isathyam31/adult-income-prediction-classification
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 13, 2024
Dataset provided by
Kaggle
Authors
Sathyam A
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset contains information about adult income prediction. It includes the following columns:

workclass: The type of employment (e.g., Private, Self-emp-not-inc, Federal-gov, Local-gov)

fnlwgt: The number of people the census believes the entry represents

education: The highest level of education achieved

education-num: The numeric representation of the previous column

marital-status: The marital status of the individual

occupation: The occupation of the individual

relationship: The relationship of the individual to their household

race: The race of the individual

sex: The gender of the individual

capital-gain: The capital gains of the individual

capital-loss: The capital losses of the individual

hours-per-week: The number of hours the individual works per week

country: The native country of the individual

salary: The income level of the individual, which is the target variable to predict.

The goal of this dataset is to build a model that can accurately predict the income level of an individual based on the provided features.
Data from: Model choice for phylogeographic inference using a large set of...
search.dataone.org
zenodo.org
+1more
Updated Apr 2, 2025
Cite
Tara A. Pelletier; Bryan C. Carstens (2025). Model choice for phylogeographic inference using a large set of models [Dataset]. http://doi.org/10.5061/dryad.8kq65
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.8kq65
Dataset updated
Apr 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
Tara A. Pelletier; Bryan C. Carstens
Time period covered
Jan 1, 2014
Description
Model-based analyses are common in phylogeographic inference because they parameterize processes such as population division, gene flow and expansion that are of interest to biologists. Approximate Bayesian Computation is a model-based approach that can be customized to any empirical system and used to calculate the relative posterior probability of several models, provided that suitable models can be identified for comparison. The question of how to identify suitable models is explored using data from Plethodon idahoensis, a salamander that inhabits the North American inland northwest temperate rainforest. First, we conduct an ABC analysis using five models suggested by previous research, calculate the relative posterior probabilities, and find that a simple model of population isolation has the best fit to the data (PP = 0.70). In contrast to this subjective choice of models to include in the analysis, we also specify models in a more objective manner by simulating prior distributions...
Healthcare Appointment No-Show Prediction
gomask.ai
csv
Updated Jul 21, 2025
Cite
GoMask.ai (2025). Healthcare Appointment No-Show Prediction [Dataset]. https://gomask.ai/marketplace/datasets/healthcare-appointment-no-show-prediction
Explore at:
Available download formats: csv (Unknown)
Dataset updated
Jul 21, 2025
Dataset provided by
GoMask.ai
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
no_show, clinic_id, department, patient_id, clinic_name, patient_age, provider_id, address_city, has_diabetes, scheduled_by, and 27 more
Description
This dataset provides detailed logs of healthcare appointment bookings, enriched with patient demographics, medical history, and communication records such as reminders. It enables comprehensive analysis of no-show risk factors, supports predictive modeling, and helps optimize scheduling efficiency in clinical settings.
Complete Economic and Demographic Data Source (CEDDS) 2023
aura.american.edu
Updated Feb 10, 2025
Cite
Woods & Poole Economics, Inc. (2025). Complete Economic and Demographic Data Source (CEDDS) 2023 [Dataset]. http://doi.org/10.57912/23561643.v1
Explore at:
Unique identifier
https://doi.org/10.57912/23561643.v1
Dataset updated
Feb 10, 2025
Dataset provided by
Authors
Woods & Poole Economics, Inc.
License
http://rightsstatements.org/vocab/InC/1.0/
Description
The Woods & Poole Economics, Inc. 2023 Complete Economic and Demographic Data Source contains some of the Woods & Poole Economics, Inc. regional data and projections for the U.S. and all regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), Designated Market Areas (DMAs), and counties for 1969 or 1970 or 1990 through 2060. The remainder of this introduction contains the technical description of the and Download. Chapter 1 is an overview of the 2023 projections. Please read "Technical Description of the 2023 Regional Projections and Database" (Chapter 2) for an explanation of data sources, data definitions, and forecast methods. Appendices to Chapter 2 define the geographic areas used by Woods & Poole.
Demographic Projection Report - Enrollment Projections - New York City...
catalog.data.gov
data.cityofnewyork.us
+2more
Updated Feb 2, 2024
Cite
data.cityofnewyork.us (2024). Demographic Projection Report - Enrollment Projections - New York City Public Schools prepared by Statistical Forecasting [Dataset]. https://catalog.data.gov/dataset/demographic-projection-report-enrollment-projections-new-york-city-public-schools-prepared
Explore at:
Dataset updated
Feb 2, 2024
Dataset provided by
data.cityofnewyork.us
Area covered
New York
Description
The SCA’s comprehensive capital planning process includes developing and analyzing quality data, creating and updating the Department of Education’s Five-Year Capital Plans, and monitoring projects through completion. The SCA prioritizes capital projects to best meet the capacity and building improvements needs throughout the City. Additionally, the SCA assures that the Capital Plan aligns with New York State and City Department of Education mandates, academic initiatives, and budgetary resources. This is one of the most current published reports.
Population Projections for Napa County
data.countyofnapa.org
application/rdfxml +5
Updated Aug 10, 2023
Cite
California Department of Finance (2023). Population Projections for Napa County [Dataset]. https://data.countyofnapa.org/w/sjku-zj9t/default?cur=5lvCEgbTfgE&from=i57KEYaw4ON
Explore at:
Available download formats: json, csv, xml, application/rssxml, application/rdfxml, tsv
Dataset updated
Aug 10, 2023
Dataset authored and provided by
California Department of Finance (https://dof.ca.gov/)
Area covered
Napa County
Description
Data Source: CA Department of Finance, Demographic Research Unit

Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.

This data biography shares the how, who, what, where, when, and why about this dataset. We, the epidemiology team at Napa County Health and Human Services Agency, Public Health Division, created it to help you understand where the data we analyze and share comes from. If you have any further questions, we can be reached at epidemiology@countyofnapa.org.

Data dashboard featuring this data: Napa County Demographics https://data.countyofnapa.org/stories/s/bu3n-fytj

How was the data collected? Population projections use the following demographic balancing equation: Current Population = Previous Population + (Births - Deaths) + Net Migration

Previous Population: the starting point for the population projection estimates is the 2020 US Census, informed by the Population Estimates Program data.

Births and Deaths: birth and death totals came from the California Department of Public Health, Vital Statistics Branch, which maintains birth and death records for California.

Net Migration: multiple sources of administrative records were used to estimate net migration, including driver’s license address changes, IRS tax return data, Medicare and Medi-Cal enrollment, federal immigration reports, elementary school enrollments, and group quarters population.
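The balancing equation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `project_population` and all input figures are hypothetical and are not taken from Report P-3 or from the Department of Finance methodology.

```python
def project_population(previous, births, deaths, net_migration):
    """Apply the demographic balancing equation for a single period.

    Current Population = Previous Population + (Births - Deaths) + Net Migration
    """
    return previous + (births - deaths) + net_migration


# Hypothetical inputs for illustration only (not real Napa County figures).
current = project_population(
    previous=100_000,    # starting population
    births=1_200,        # births during the period
    deaths=900,          # deaths during the period
    net_migration=-150,  # negative means more people left than arrived
)
print(current)
```

Projecting several years ahead would mean feeding each period's result back in as the next period's previous population, with births, deaths, and migration estimated separately for each period.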

Who was included and excluded from the data? Previous Population: The goal of the US Census is to reflect all populations residing in a given geographic area. Results of two analyses done by the US Census Bureau showed that the 2020 Census total population counts were consistent with recent counts despite the challenges added by the pandemic. However, some populations were undercounted (the Black or African American population, the American Indian or Alaska Native population living on a reservation, the Hispanic or Latino population, and people who reported being of Some Other Race), and some were overcounted (the Non-Hispanic White population and the Asian population). Children, especially children younger than 4, were also undercounted.

Births and Deaths: Birth records include all people who are born in California as well as births to California residents that happened out of state. Death records include people who died while in California, as well as deaths of California residents that occurred out of state. Because birth and death record data comes from a registration process, the demographic information provided may not be accurate or complete.

Net Migration: each of the multiple sources of administrative records that were used to estimate net migration include and exclude different groups. For details about methodology, see https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf.

Where was the data collected?  Data is collected throughout California. This subset of data includes Napa County.

When was the data collected? This subset of Napa County data is from Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.

These 2019 baseline projections incorporate the latest historical population, birth, death, and migration data available as of July 1, 2020. Historical trends from 1990 through 2020 for births, deaths, and migration are examined. County populations by age, sex, and race/ethnicity are projected to 2060.

Why was the data collected?  The population projections were prepared under the mandate of the California Government Code (Cal. Gov't Code § 13073, 13073.5).

Where can I learn more about this data?
- https://dof.ca.gov/Forecasting/Demographics/Projections/
- https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/P3_Dictionary.txt
- https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf
Statistics of the predictive performance indicators of the two models.
plos.figshare.com
xls
Updated Jun 25, 2025
Cite
Jin Wang; Shihan Ma; Qing Lv; Qiang Li (2025). Statistics of the predictive performance indicators of the two models. [Dataset]. http://doi.org/10.1371/journal.pone.0320298.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0320298.t004
Dataset updated
Jun 25, 2025
Dataset provided by
PLOS ONE
Authors
Jin Wang; Shihan Ma; Qing Lv; Qiang Li
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics of the predictive performance indicators of the two models.
📣 Ad Click Prediction Dataset
kaggle.com
Updated Sep 7, 2024
Cite
Ciobanu Marius (2024). 📣 Ad Click Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/ad-click-prediction-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 7, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ciobanu Marius
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
About

This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.

Features

id: Unique identifier for each user.

full_name: User's name formatted as "UserX" for anonymity.

age: Age of the user (ranging from 18 to 64 years).

gender: The gender of the user (categorized as Male, Female, or Non-Binary).

device_type: The type of device used by the user when viewing the ad (Mobile, Desktop, Tablet).

ad_position: The position of the ad on the webpage (Top, Side, Bottom).

browsing_history: The user's browsing activity prior to seeing the ad (Shopping, News, Entertainment, Education, Social Media).

time_of_day: The time when the user viewed the ad (Morning, Afternoon, Evening, Night).

click: The target label indicating whether the user clicked on the ad (1 for a click, 0 for no click).

Goal

The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean the data, understand it, and then apply machine learning models to make and evaluate predictions. This is a challenging task for this kind of data. The data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online advertisements.
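As one possible starting point for the cleaning step, the sketch below turns a single record into a numeric feature vector using the category values listed under Features. The helper `encode_row`, the sample record, and the choice to encode missing values as zeros are assumptions made for illustration; they are not part of the dataset or its documentation.

```python
# Category values as listed in the feature descriptions above.
CATEGORIES = {
    "gender": ["Male", "Female", "Non-Binary"],
    "device_type": ["Mobile", "Desktop", "Tablet"],
    "ad_position": ["Top", "Side", "Bottom"],
    "browsing_history": ["Shopping", "News", "Entertainment", "Education", "Social Media"],
    "time_of_day": ["Morning", "Afternoon", "Evening", "Night"],
}


def encode_row(row):
    """One-hot encode the categorical columns and prepend the age.

    A missing or unrecognized category yields an all-zero group, and a
    missing age becomes 0.0. Both are simplifying assumptions; a real
    pipeline might impute values instead.
    """
    age = row.get("age")
    features = [float(age) if age is not None else 0.0]
    for column, values in CATEGORIES.items():
        features.extend(1.0 if row.get(column) == value else 0.0 for value in values)
    return features


# Hypothetical record with a missing browsing_history value.
sample = {
    "age": 34,
    "gender": "Female",
    "device_type": "Mobile",
    "ad_position": "Top",
    "browsing_history": None,
    "time_of_day": "Night",
}
vector = encode_row(sample)
print(len(vector), vector)
```

The resulting vectors, paired with the click column as the label, could then be passed to any binary classifier.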
Data from: Predictive modeling for clinical features associated with...
data.niaid.nih.gov
search.dataone.org
+2more
zip
Updated Mar 10, 2022
Cite
Philip Payne; Stephanie Morris; Aditi Gupta; Seunghwan Kim; Randi Foraker; David Gutmann (2022). Predictive modeling for clinical features associated with Neurofibromatosis Type 1 [Dataset]. http://doi.org/10.5061/dryad.nvx0k6drn
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.nvx0k6drn
Dataset updated
Mar 10, 2022
Dataset provided by
Washington University in St. Louis
Authors
Philip Payne; Stephanie Morris; Aditi Gupta; Seunghwan Kim; Randi Foraker; David Gutmann
License
https://spdx.org/licenses/CC0-1.0.html
Description
Objective: To perform a longitudinal analysis of clinical features associated with Neurofibromatosis Type 1 (NF1) based on demographic and clinical characteristics, and to apply a machine learning strategy to determine the feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.

Methods: Using NF1 as a model system, we perform retrospective data analyses utilizing a manually-curated NF1 clinical registry and electronic health record (EHR) information, and develop machine-learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.

Results: Males and females were evenly represented in the cohort. White children were more likely to develop OPG (OR: 2.11, 95%CI: 1.11-4.00, p=0.02) relative to their non-white peers. Median age at diagnosis of OPG was 6.5 years (1.7-17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95%CI: 1.33-2.70, p<0.001), and earlier diagnosis in males relative to females was observed. The gradient boosting classification model predicted diagnosis of ADHD with an AUROC of 0.74, and predicted diagnosis of OPG with an AUROC of 0.82.
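For readers unfamiliar with how figures such as "OR: 2.11, 95%CI: 1.11-4.00" are typically derived, the sketch below computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table. The description does not state which interval method the authors used, so the Wald interval is an assumption, and the counts are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
estimate, ci_low, ci_high = odds_ratio_with_ci(a=20, b=80, c=10, d=90)
print(f"OR: {estimate:.2f}, 95%CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a difference between groups that is statistically significant at roughly the 5% level.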

Conclusions: Using readily available clinical and EHR data, we successfully recapitulated several important and clinically-relevant patterns in NF1 semiology specifically based on demographic and clinical characteristics. Naïve machine learning techniques can be potentially used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.

Methods

Patients and Data Description

This study was performed using retrospective clinical data extracted from two sources within the Washington University Neurofibromatosis (NF) Center. First, data were extracted from an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database had a clinical diagnosis of NF1 based on current National Institutes of Health Consensus Development Conference diagnostic criteria,9 and had been assessed over multiple visits from 2002 to 2016 for the presence of clinical features associated with NF1. Data points in this registry included demographic information, such as age, race, and sex, in addition to NF1-related clinical features and associated conditions, such as café-au-lait macules, skinfold freckling, cutaneous neurofibromas, Lisch nodules, OPG, hypertension, ADHD, and cognitive impairment. These data were maintained in a semi-structured format containing textual and binary fields, capturing each individual’s data over multiple clinical visits. From these data, clinical features and phenotypes were extracted using data manipulation, imputation, and text mining techniques. Data obtained from this NF1 clinical registry were converted to data tables, which captured each patient visit and the presence/absence of specific clinical features at each visit. Clinical features which were once marked as present were assumed to be present for all future visits, and missing data were assumed absent for that specific visit. Categorical variables are reported as frequencies and proportions, and compared using odds ratios (ORs). Continuously distributed traits, adhering to both conventional normality assumptions and homogeneity of variances, are reported as mean and standard deviations, and compared using analysis of variance methods. 
Non-parametric equivalents were used for data with non-normative distributions.
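The rule that a feature once marked present is treated as present at all later visits, while missing entries are treated as absent, can be sketched as follows. This is one reading of the description above, not the authors' code; the function name `carry_forward` and the sample visit history are hypothetical.

```python
def carry_forward(visits):
    """Propagate a feature's presence across chronologically ordered visits.

    visits: list of True (present), False (absent), or None (missing).
    Returns a list of booleans in which missing values count as absent
    until the feature has been observed, after which it stays present.
    """
    seen = False
    result = []
    for value in visits:
        seen = seen or bool(value)  # None and False both count as absent
        result.append(seen)
    return result


# Hypothetical visit history for one patient and one clinical feature.
history = [None, False, True, None, False]
print(carry_forward(history))
```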

Clinical Feature Extraction from Clinical Registry and EHR

The NF1 Clinical Registry comprised string-based clinical feature values, such as ADHD, OPG, and asthma. From these data, we extracted 27 unique clinical features in addition to longitudinal data on the development of NF1-related clinical features and associated diagnoses. For each clinical feature, age at initial presentation and/or diagnosis was computed, and median age of occurrence was calculated for each sex. The exact age of presentation and/or diagnosis could not be definitively ascertained for any feature that was present at a child’s initial clinic visit. As such, we computed the age of diagnosis only for those clinical features for which we have at least one visit documenting feature absence prior to the manifestation of that feature.

Diagnosis codes from the EHR-derived data set were also extracted. Diagnosis codes were recorded as 15,890 unique ICD 9/10 codes. Given the large number of ICD 9/10 codes, a consistent, concept-level “roll up” of relevant codes to a single phenotype description was created by mapping the extracted ICD 9/10 values to phenome-wide association (PheWAS) codes called Phecodes, which have been demonstrated to better align with clinical disease compared to individual ICD codes.

Machine Learning Analyses

Using a combination of clinical features obtained from the NF1 Clinical Registry and EHR-derived data sets, we developed prediction models using a gradient boosting platform for identifying patients with specific NF1-related diagnoses to establish the usefulness of clinical history and documentation of clinical findings in predicting phenotypic variability of NF1. Initial analyses used a state-of-the-art classification algorithm, the gradient boosting model, which uses a tree-based algorithm to produce a predictive model from an ensemble of weak predictive models. The gradient boosting model was selected because it supports identifying the importance of features used in the final prediction model. Subsequent analyses involved training a model for each of three different feature sets: (1) demographic features for all patients, including race, sex, and family history of NF1 [5 features]; (2) clinical features associated with NF1 [27 features] extracted from the NF1 Clinical Registry; and (3) diagnosis codes extracted from the EHR data, which were reduced to 50 Phecodes. Four-fold cross validation was then applied to the three models, and their prediction accuracies were compared. Positive predictive value (PPV), F1 score, and the area under the receiver operating characteristic (AUROC) curve were used as evaluation metrics. Scikit-learn, a machine learning library in Python, was employed to implement all analyses.
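To make the evaluation setup concrete, the sketch below shows a four-fold index split and the three metrics named above, written in plain Python so that the definitions are visible. It is not the authors' scikit-learn code: the helper names, the interleaved fold assignment, the 0.5 decision threshold, and the labels and scores are all assumptions for illustration.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def ppv_and_f1(y_true, y_pred):
    """Positive predictive value (precision) and F1 score for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return ppv, f1


def auroc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(ppv_and_f1(y_true, y_pred), auroc(y_true, scores))
for train_idx, test_idx in k_fold_indices(len(y_true)):
    print(train_idx, test_idx)
```

In practice each fold's training indices would be used to fit the model and the held-out indices to compute the metrics, with the per-fold results then averaged.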

Standard Protocol Approvals, Registrations, and Patient Consents

The NF1 Clinical Registry is an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database have a clinical diagnosis of NF1 based on current National Institutes of Health criteria and have provided informed consent for participation in the clinical registry. All data collection, usage and analysis for this study were approved by the Institutional Review Board (IRB) at the Washington University School of Medicine.
Data Analytics in L & H Insurance Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 2, 2025
Cite
Data Insights Market (2025). Data Analytics in L & H Insurance Report [Dataset]. https://www.datainsightsmarket.com/reports/data-analytics-in-l-h-insurance-1430368
Available download formats: pdf, doc, ppt
Dataset updated
May 2, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Life and Health (L&H) Insurance industry is experiencing a rapid transformation driven by the increasing adoption of data analytics. The market, valued at $2,647.3 million in 2025, is projected to grow at a Compound Annual Growth Rate (CAGR) of 9.2% from 2025 to 2033. This robust growth is fueled by several key factors. Firstly, the need for improved risk assessment and underwriting is pushing insurers to leverage advanced analytics for predictive modeling. This allows for more accurate pricing, reduced fraud, and better customer segmentation. Secondly, demographic profiling enabled by data analytics helps insurers tailor products and services to specific customer needs, leading to increased customer satisfaction and retention. Data visualization tools further enhance decision-making by providing clear and concise insights into complex datasets, facilitating better strategy development and operational efficiency. Finally, the rise of Insurtech companies and the increasing availability of sophisticated software solutions are accelerating the adoption of data analytics across the L&H insurance sector. The competitive landscape is shaped by a mix of established players like Deloitte, SAP AG, and IBM, alongside specialized Insurtech firms offering innovative data analytics solutions.

The segmentation of the market reveals significant opportunities across various applications and types. Predictive analysis, demographic profiling, and data visualization are the most prominent application segments, reflecting the industry's focus on risk management, customer understanding, and improved operational efficiency. The service and software segments represent the primary delivery models for data analytics solutions. While North America currently holds a dominant market share, regions like Asia-Pacific are experiencing rapid growth, driven by increasing digitalization and a rising middle class with growing insurance needs. Regulatory changes promoting data sharing and increased customer data privacy awareness are likely to influence market dynamics in the coming years. The key challenges include data security concerns, the need for skilled data scientists, and the integration of legacy systems with new data analytics platforms. Successfully navigating these challenges will be crucial for insurers to fully capitalize on the transformative potential of data analytics.
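To make the growth figures concrete, the sketch below compounds the stated 2025 market value at the stated 9.2% CAGR. It assumes plain annual compounding over the eight years from 2025 to 2033; the report does not say how its own projection was computed, and the function and variable names are illustrative only.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` at annual rate `cagr` for `years` years."""
    return base * (1 + cagr) ** years


BASE_YEAR = 2025
BASE_VALUE_MUSD = 2647.3  # market value in million USD, from the description
CAGR = 0.092              # 9.2% per year, from the description

# Year-by-year values under the annual-compounding assumption.
projection = {
    year: project_value(BASE_VALUE_MUSD, CAGR, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2034)
}

for year, value in projection.items():
    print(f"{year}: {value:,.1f} million USD")
```

Under these assumptions the 2033 figure comes out at a little over $5.3 billion, roughly double the 2025 value.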
Woods & Poole Complete US Database
search.dataone.org
Updated Mar 6, 2024
Cite
Woods & Poole (2024). Woods & Poole Complete US Database [Dataset]. http://doi.org/10.7910/DVN/ZCPMU6
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/ZCPMU6
Dataset updated
Mar 6, 2024
Dataset provided by
Harvard Dataverse
Authors
Woods & Poole
Time period covered
Jan 1, 1970 - Jan 1, 2050
Description
The 2018 edition of Woods and Poole Complete U.S. Database provides annual historical data from 1970 (some variables begin in 1990) and annual projections to 2050 of population by race, sex, and age, employment by industry, earnings of employees by industry, personal income by source, households by income bracket and retail sales by kind of business. The Complete U.S. Database contains annual data for all economic and demographic variables for all geographic areas in the Woods & Poole database (the U.S. total, and all regions, states, counties, and CBSAs).

The Complete U.S. Database has the following components:

Demographic & Economic Desktop Data Files (DDFs): There are 122 files covering demographic and economic data. The first 31 files (WP001.csv – WP031.csv) cover demographic data. The remaining files (WP032.csv – WP122.csv) cover economic data.

Demographic DDFs: Provide population data for the U.S., regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), and counties. Each variable is in a separate .csv file. Variables:
- Total Population
- Population Age (breakdown: 0-4, 5-9, 10-15 etc. all the way to 85 & over)
- Median Age of Population
- White Population
- Black Population
- Native American Population
- Asian & Pacific Islander Population
- Hispanic Population, any Race
- Total Population Age (breakdown: 0-17, 15-17, 18-24, 65 & over)
- Male Population
- Female Population

Economic DDFs: The other files (WP032.csv – WP122.csv) provide employment and income data on:
- Total Employment (by industry)
- Total Earnings of Employees (by industry)
- Total Personal Income (by source)
- Household income (by brackets)
- Total Retail & Food Services Sales (by industry)
- Net Earnings
- Gross Regional Product
- Retail Sales per Household

Economic & Demographic Flat File: A single file for total number of people by single year of age (from 0 to 85 and over), race, and gender. It covers all U.S., regions, states, CSAs, MSAs and counties. Years of coverage: 1990 - 2050

Single Year of Age by Race and Gender: Separate files for number of people by single year of age (from 0 years to 85 years and over), race (White, Black, Native American, Asian American & Pacific Islander and Hispanic) and gender. Years of coverage: 1990 through 2050.

DATA AVAILABLE FOR 1970-2019; FORECASTS THROUGH 2050
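The file numbering described above (WP001.csv through WP122.csv, with the first 31 files demographic and the rest economic) lends itself to a small lookup helper. The sketch below only encodes that naming rule; the helper names `ddf_filename` and `ddf_category` are hypothetical, and nothing here reads or assumes the internal layout of the actual files.

```python
FIRST_FILE = 1
LAST_DEMOGRAPHIC_FILE = 31   # WP001.csv - WP031.csv are demographic
LAST_FILE = 122              # WP032.csv - WP122.csv are economic


def ddf_filename(index: int) -> str:
    """Return the Desktop Data File name for a 1-based file index."""
    if not FIRST_FILE <= index <= LAST_FILE:
        raise ValueError(f"file index must be in 1..{LAST_FILE}, got {index}")
    return f"WP{index:03d}.csv"


def ddf_category(index: int) -> str:
    """Classify a file index as 'demographic' or 'economic'."""
    ddf_filename(index)  # reuse the range check
    return "demographic" if index <= LAST_DEMOGRAPHIC_FILE else "economic"


demographic_files = [
    ddf_filename(i) for i in range(FIRST_FILE, LAST_FILE + 1)
    if ddf_category(i) == "demographic"
]
economic_files = [
    ddf_filename(i) for i in range(FIRST_FILE, LAST_FILE + 1)
    if ddf_category(i) == "economic"
]

print(len(demographic_files), "demographic files,", len(economic_files), "economic files")
```

A helper like this is mainly useful for batch-loading one group of files without hard-coding each name.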
Data from: Diabetes Prediction Dataset
cubig.ai
Updated Jun 22, 2025
Cite
CUBIG (2025). Diabetes Prediction Dataset [Dataset]. https://cubig.ai/store/products/489/diabetes-prediction-dataset
Dataset updated
Jun 22, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Diabetes Prediction Dataset is built for predicting diabetes and analyzing related risk factors. It contains demographic, lifestyle, and clinical measurements that can be used to predict a patient's risk of developing diabetes.

2) Data Utilization
(1) Characteristics of the Diabetes Prediction Dataset:
• Key columns cover clinical and lifestyle indicators related to diabetes, including age, gender, body mass index (BMI), blood pressure, blood sugar level (glucose), insulin, family history, and physical activity.
(2) Uses of the Diabetes Prediction Dataset:
• Machine Learning/Deep Learning Model Development: It can be used to develop classification models (logistic regression, decision tree, random forest, neural network, etc.) that predict the risk of developing diabetes based on patient characteristics.
• Data Analysis and Visualization: It is suitable for correlation analysis, risk factor derivation, and exploratory data analysis (EDA) across variables such as demographics, clinical measurements, and lifestyle.
‘Statistical Forecasting Demographic Projection Report - Enrollment...
analyst-2.ai
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-statistical-forecasting-demographic-projection-report-enrollment-projections-new-york-city-public-schools-46f8/9554f28f/?iid=004-863&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York
Description
Analysis of ‘Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/cf070156-0e60-46d0-a31d-0b34d9b0dcd4 on 27 January 2022.

--- Dataset description provided by original source is as follows ---

Demographic projections performed by the Statistics Forecasting for elementary, middle and high school level students.

--- Original source retains full ownership of the source dataset ---
