100+ datasets found
  1. Demografy's Consumer Demographics Prediction SaaS

    • datarade.ai
    .json, .csv
    Updated Jun 4, 2021
    Cite
    Demografy (2021). Demografy's Consumer Demographics Prediction SaaS [Dataset]. https://datarade.ai/data-products/demografy-s-consumer-demographics-prediction-saas-demografy
    Available download formats: .json, .csv
    Dataset updated
    Jun 4, 2021
    Dataset authored and provided by
    Demografy
    Area covered
    Moldova (Republic of), Italy, Czech Republic, Croatia, Luxembourg, Monaco, Sweden, Finland, Poland, Denmark
    Description

    Demografy is a privacy by design customer demographics prediction AI platform.

    Core features:
    • Demographic segmentation
    • Demographic analytics
    • API integration
    • Data export

    Key advantages:
    • 100% coverage of lists
    • Accuracy estimate before purchase
    • GDPR compliance, as no sensitive data is required. Demografy can work with only first names or masked last names

    Use cases:
    • Actionable analytics about your customers to get demographic insights
    • Appending missing demographic data to your records for customer segmentation and targeted marketing campaigns
    • Enhanced personalization by knowing your customers better

    Unlike traditional solutions, you don’t need to know and disclose your customer or prospect addresses, emails or other sensitive information. You can provide even masked last names keeping personal data in-house. This makes Demografy privacy by design and enables you to get 100% coverage of your audience since all you need to know is names.

  2. Is Demography Destiny? Application of Machine Learning Techniques to...

    • plos.figshare.com
    • figshare.com
    docx
    Updated Jun 3, 2023
    Cite
    Wei Luo; Thin Nguyen; Melanie Nichols; Truyen Tran; Santu Rana; Sunil Gupta; Dinh Phung; Svetha Venkatesh; Steve Allender (2023). Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset [Dataset]. http://doi.org/10.1371/journal.pone.0125602
    Available download formats: docx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Wei Luo; Thin Nguyen; Melanie Nichols; Truyen Tran; Santu Rana; Sunil Gupta; Dinh Phung; Svetha Venkatesh; Steve Allender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.

  3. Global predictive analytics market value 2020 and 2028

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Global predictive analytics market value 2020 and 2028 [Dataset]. https://www.statista.com/statistics/1286871/predictive-analytics-market-size/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2020
    Area covered
    Worldwide
    Description

    The market for predictive analytics software was valued at **** billion U.S. dollars in 2020 and is forecasted to grow to ***** billion U.S. dollars by 2028. Predictive analytics are often used to analyze consumer behavior, and manage supply chains and business operations.

  4. Statistical Forecasting Demographic Projection Report - Enrollment...

    • s.cnmilf.com
    • data.cityofnewyork.us
    • +3more
    Updated Sep 2, 2023
    Cite
    data.cityofnewyork.us (2023). Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/statistical-forecasting-demographic-projection-report-enrollment-projections-new-york-city
    Dataset updated
    Sep 2, 2023
    Dataset provided by
    data.cityofnewyork.us
    Area covered
    New York
    Description

    Demographic projections prepared by Statistical Forecasting for elementary, middle, and high school students.

  5. Predictive analytics market forecast worldwide 2016-2022

    • statista.com
    Updated Jun 26, 2025
    Cite
    Statista (2025). Predictive analytics market forecast worldwide 2016-2022 [Dataset]. https://www.statista.com/statistics/819415/worldwide-predictive-analytics-market-size/
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    As of 2019, forecasts suggest that the predictive analytics market will reach over *********** U.S. dollars in total revenue. By 2022, the market is expected to reach nearly ** billion dollars in annual revenue as an increasingly large number of businesses make use of predictive analytics techniques for everything from fraud detection to medical diagnosis.

    Predictive analytics

    The field of predictive analytics involves the use of various statistical methods and models within businesses to make predictions about a wide range of future outcomes. Predictive analytics is already one of the most widely adopted intelligent automation technologies in the world, with over ** percent of major enterprises deploying smart analytics that include predictive analytics. As business interactions around the world become increasingly digitalized, massive amounts of data are created, which can be evaluated with predictive analytics tools to give users a better understanding of market dynamics and underlying trends. Considering this, it is no surprise that predictive models rank as one of the top big data technology trends around the world.

  6. County-level Socioeconomic Data for Predictive Modeling of Epidemiological...

    • catalog.midasnetwork.us
    csv for excel
    Updated Jul 30, 2024
    Cite
    MIDAS Coordination Center (2024). County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects [Dataset]. https://catalog.midasnetwork.us/collection/19
    Available download formats: csv for excel
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    MIDAS Coordination Center
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Time period covered
    Jan 22, 2020 - Sep 13, 2020
    Variables measured
    disease, COVID-19, behavior, pathogen, case counts, Homo sapiens, host organism, age-stratified, mortality data, phenotypic sex, and 13 more
    Dataset funded by
    National Institute of General Medical Sciences
    Description

    The repository contains a machine-readable dataset that aggregates data from around 10 governmental and academic sources at the county level, covering every county in the 50 states and Washington, D.C. It includes data on demographics, socioeconomics, healthcare, and education for each county. In addition to county-level time series from the JHU CSSE COVID-19 dashboard (https://github.com/CSSEGISandData/COVID-19), the dataset contains multiple variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics, all in CSV format.

  7. Statistics of the predictive performance indicators of three different...

    • plos.figshare.com
    xls
    Updated Jun 25, 2025
    Cite
    Jin Wang; Shihan Ma; Qing Lv; Qiang Li (2025). Statistics of the predictive performance indicators of three different models. [Dataset]. http://doi.org/10.1371/journal.pone.0320298.t005
    Available download formats: xls
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jin Wang; Shihan Ma; Qing Lv; Qiang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of the predictive performance indicators of three different models.

  8. Adult Income Prediction Classification

    • kaggle.com
    Updated Dec 13, 2024
    Cite
    Sathyam A (2024). Adult Income Prediction Classification [Dataset]. https://www.kaggle.com/datasets/isathyam31/adult-income-prediction-classification
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    Kaggle
    Authors
    Sathyam A
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains information about adult income prediction. It includes the following columns:

    • workclass: The type of employment (e.g., Private, Self-emp-not-inc, Federal-gov, Local-gov)
    • fnlwgt: The number of people the census believes the entry represents
    • education: The highest level of education achieved
    • education-num: The numeric representation of the previous column
    • marital-status: The marital status of the individual
    • occupation: The occupation of the individual
    • relationship: The relationship of the individual to their household
    • race: The race of the individual
    • sex: The gender of the individual
    • capital-gain: The capital gains of the individual
    • capital-loss: The capital losses of the individual
    • hours-per-week: The number of hours the individual works per week
    • country: The native country of the individual
    • salary: The income level of the individual, which is the target variable to predict.

    The goal of this dataset is to build a model that can accurately predict the income level of an individual based on the provided features.

  9. Data from: Model choice for phylogeographic inference using a large set of...

    • search.dataone.org
    • zenodo.org
    • +1more
    Updated Apr 2, 2025
    Cite
    Tara A. Pelletier; Bryan C. Carstens (2025). Model choice for phylogeographic inference using a large set of models [Dataset]. http://doi.org/10.5061/dryad.8kq65
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Tara A. Pelletier; Bryan C. Carstens
    Time period covered
    Jan 1, 2014
    Description

    Model-based analyses are common in phylogeographic inference because they parameterize processes such as population division, gene flow and expansion that are of interest to biologists. Approximate Bayesian Computation is a model-based approach that can be customized to any empirical system and used to calculate the relative posterior probability of several models, provided that suitable models can be identified for comparison. The question of how to identify suitable models is explored using data from Plethodon idahoensis, a salamander that inhabits the North American inland northwest temperate rainforest. First, we conduct an ABC analysis using five models suggested by previous research, calculate the relative posterior probabilities, and find that a simple model of population isolation has the best fit to the data (PP = 0.70). In contrast to this subjective choice of models to include in the analysis, we also specify models in a more objective manner by simulating prior distributions...

  10. Healthcare Appointment No-Show Prediction

    • gomask.ai
    csv
    Updated Jul 21, 2025
    Cite
    GoMask.ai (2025). Healthcare Appointment No-Show Prediction [Dataset]. https://gomask.ai/marketplace/datasets/healthcare-appointment-no-show-prediction
    Available download formats: csv (Unknown)
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    GoMask.ai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    no_show, clinic_id, department, patient_id, clinic_name, patient_age, provider_id, address_city, has_diabetes, scheduled_by, and 27 more
    Description

    This dataset provides detailed logs of healthcare appointment bookings, enriched with patient demographics, medical history, and communication records such as reminders. It enables comprehensive analysis of no-show risk factors, supports predictive modeling, and helps optimize scheduling efficiency in clinical settings.

  11. Complete Economic and Demographic Data Source (CEDDS) 2023

    • aura.american.edu
    Updated Feb 10, 2025
    Cite
    Woods & Poole Economics, Inc. (2025). Complete Economic and Demographic Data Source (CEDDS) 2023 [Dataset]. http://doi.org/10.57912/23561643.v1
    Dataset updated
    Feb 10, 2025
    Dataset provided by
    Authors
    Woods & Poole Economics, Inc.
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    The Woods & Poole Economics, Inc. 2023 Complete Economic and Demographic Data Source contains some of the Woods & Poole Economics, Inc. regional data and projections for the U.S. and all regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), Designated Market Areas (DMAs), and counties for 1969 or 1970 or 1990 through 2060. The remainder of this introduction contains the technical description of the and Download. Chapter 1 is an overview of the 2023 projections. Please read "Technical Description of the 2023 Regional Projections and Database" (Chapter 2) for an explanation of data sources, data definitions, and forecast methods. Appendices to Chapter 2 define the geographic areas used by Woods & Poole.

  12. Demographic Projection Report - Enrollment Projections - New York City...

    • catalog.data.gov
    • data.cityofnewyork.us
    • +2more
    Updated Feb 2, 2024
    Cite
    data.cityofnewyork.us (2024). Demographic Projection Report - Enrollment Projections - New York City Public Schools prepared by Statistical Forecasting [Dataset]. https://catalog.data.gov/dataset/demographic-projection-report-enrollment-projections-new-york-city-public-schools-prepared
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    data.cityofnewyork.us
    Area covered
    New York
    Description

    The SCA’s comprehensive capital planning process includes developing and analyzing quality data, creating and updating the Department of Education’s Five-Year Capital Plans, and monitoring projects through completion. The SCA prioritizes capital projects to best meet the capacity and building improvements needs throughout the City. Additionally, the SCA assures that the Capital Plan aligns with New York State and City Department of Education mandates, academic initiatives, and budgetary resources. This is one of the most current published reports.

  13. Population Projections for Napa County

    • data.countyofnapa.org
    application/rdfxml +5
    Updated Aug 10, 2023
    Cite
    California Department of Finance (2023). Population Projections for Napa County [Dataset]. https://data.countyofnapa.org/w/sjku-zj9t/default?cur=5lvCEgbTfgE&from=i57KEYaw4ON
    Available download formats: json, csv, xml, application/rssxml, application/rdfxml, tsv
    Dataset updated
    Aug 10, 2023
    Dataset authored and provided by
    California Department of Finance (https://dof.ca.gov/)
    Area covered
    Napa County
    Description

    Data Source: CA Department of Finance, Demographic Research Unit

    Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.

    This data biography shares the how, who, what, where, when, and why about this dataset. We, the epidemiology team at Napa County Health and Human Services Agency, Public Health Division, created it to help you understand where the data we analyze and share comes from. If you have any further questions, we can be reached at epidemiology@countyofnapa.org.

    Data dashboard featuring this data: Napa County Demographics https://data.countyofnapa.org/stories/s/bu3n-fytj

    How was the data collected? Population projections use the following demographic balancing equation: Current Population = Previous Population + (Births - Deaths) + Net Migration

    Previous Population: the starting point for the population projection estimates is the 2020 US Census, informed by the Population Estimates Program data.

    Births and Deaths: birth and death totals came from the California Department of Public Health, Vital Statistics Branch, which maintains birth and death records for California.

    Net Migration: multiple sources of administrative records were used to estimate net migration, including driver’s license address changes, IRS tax return data, Medicare and Medi-Cal enrollment, federal immigration reports, elementary school enrollments, and group quarters population.
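
    The balancing equation above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample figures are hypothetical and are not taken from the Department of Finance methodology, which projects populations by age, sex, and race/ethnicity rather than as a single total.

```python
def project_population(previous, births, deaths, net_migration):
    """Apply the demographic balancing equation for one period."""
    return previous + (births - deaths) + net_migration


def project_series(start, yearly_components):
    """Chain the equation over several periods.

    yearly_components is an iterable of (births, deaths, net_migration)
    tuples, one per period. The values used below are made up.
    """
    population = start
    projected = []
    for births, deaths, net_migration in yearly_components:
        population = project_population(population, births, deaths, net_migration)
        projected.append(population)
    return projected


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(project_series(1000, [(20, 15, -3), (10, 12, 5)]))
```

    Each period's result becomes the "Previous Population" of the next period, which is why errors in the starting census count carry through the whole projection.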

    Who was included and excluded from the data? Previous Population: The goal of the US Census is to reflect all populations residing in a given geographic area. Results of two analyses done by the US Census Bureau showed that the 2020 Census total population counts were consistent with recent counts despite the challenges added by the pandemic. However, some populations were undercounted (the Black or African American population, the American Indian or Alaska Native population living on a reservation, the Hispanic or Latino population, and people who reported being of Some Other Race), and some were overcounted (the Non-Hispanic White population and the Asian population). Children, especially children younger than 4, were also undercounted.

    Births and Deaths: Birth records include all people who are born in California as well as births to California residents that happened out of state. Death records include people who died while in California, as well as deaths of California residents that occurred out of state. Because birth and death record data comes from a registration process, the demographic information provided may not be accurate or complete.

    Net Migration: each of the multiple sources of administrative records that were used to estimate net migration include and exclude different groups. For details about methodology, see https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf.

    Where was the data collected?  Data is collected throughout California. This subset of data includes Napa County.

    When was the data collected? This subset of Napa County data is from Report P-3: Population Projections, California, 2010-2060 (Baseline 2019 Population Projections; Vintage 2020 Release). Sacramento: California. July 2021.

    These 2019 baseline projections incorporate the latest historical population, birth, death, and migration data available as of July 1, 2020. Historical trends from 1990 through 2020 for births, deaths, and migration are examined. County populations by age, sex, and race/ethnicity are projected to 2060.

    Why was the data collected?  The population projections were prepared under the mandate of the California Government Code (Cal. Gov't Code § 13073, 13073.5).

    Where can I learn more about this data? https://dof.ca.gov/Forecasting/Demographics/Projections/ https://dof.ca.gov/wp-content/uploads/sites/352/Forecasting/Demographics/Documents/P3_Dictionary.txt https://dof.ca.gov/wp-content/uploads/sites/352/2023/07/Projections_Methodology.pdf

  14. Statistics of the predictive performance indicators of the two models.

    • plos.figshare.com
    xls
    Updated Jun 25, 2025
    Cite
    Jin Wang; Shihan Ma; Qing Lv; Qiang Li (2025). Statistics of the predictive performance indicators of the two models. [Dataset]. http://doi.org/10.1371/journal.pone.0320298.t004
    Available download formats: xls
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Jin Wang; Shihan Ma; Qing Lv; Qiang Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of the predictive performance indicators of the two models.

  15. 📣 Ad Click Prediction Dataset

    • kaggle.com
    Updated Sep 7, 2024
    Cite
    Ciobanu Marius (2024). 📣 Ad Click Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/marius2303/ad-click-prediction-dataset
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ciobanu Marius
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    About

    This dataset provides insights into user behavior and online advertising, specifically focusing on predicting whether a user will click on an online advertisement. It contains user demographic information, browsing habits, and details related to the display of the advertisement. This dataset is ideal for building binary classification models to predict user interactions with online ads.

    Features

    • id: Unique identifier for each user.
    • full_name: User's name formatted as "UserX" for anonymity.
    • age: Age of the user (ranging from 18 to 64 years).
    • gender: The gender of the user (categorized as Male, Female, or Non-Binary).
    • device_type: The type of device used by the user when viewing the ad (Mobile, Desktop, Tablet).
    • ad_position: The position of the ad on the webpage (Top, Side, Bottom).
    • browsing_history: The user's browsing activity prior to seeing the ad (Shopping, News, Entertainment, Education, Social Media).
    • time_of_day: The time when the user viewed the ad (Morning, Afternoon, Evening, Night).
    • click: The target label indicating whether the user clicked on the ad (1 for a click, 0 for no click).

    Goal

    The objective of this dataset is to predict whether a user will click on an online ad based on their demographics, browsing behavior, the context of the ad's display, and the time of day. You will need to clean the data, understand it, and then apply machine learning models to make and evaluate predictions. This is a challenging task for this kind of data. The data can be used to improve ad targeting strategies, optimize ad placement, and better understand user interaction with online advertisements.

  16. Data from: Predictive modeling for clinical features associated with...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2more
    zip
    Updated Mar 10, 2022
    Cite
    Philip Payne; Stephanie Morris; Aditi Gupta; Seunghwan Kim; Randi Foraker; David Gutmann (2022). Predictive modeling for clinical features associated with Neurofibromatosis Type 1 [Dataset]. http://doi.org/10.5061/dryad.nvx0k6drn
    Available download formats: zip
    Dataset updated
    Mar 10, 2022
    Dataset provided by
    Washington University in St. Louis
    Authors
    Philip Payne; Stephanie Morris; Aditi Gupta; Seunghwan Kim; Randi Foraker; David Gutmann
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To perform a longitudinal analysis of clinical features associated with Neurofibromatosis Type 1 (NF1) based on demographic and clinical characteristics, and to apply a machine learning strategy to determine the feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.

    Methods: Using NF1 as a model system, we perform retrospective data analyses utilizing a manually-curated NF1 clinical registry and electronic health record (EHR) information, and develop machine-learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.

    Results: Males and females were evenly represented in the cohort. White children were more likely to develop OPG (OR: 2.11, 95%CI: 1.11-4.00, p=0.02) relative to their non-white peers. Median age at diagnosis of OPG was 6.5 years (1.7-17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95%CI: 1.33-2.70, p<0.001), and earlier diagnosis in males relative to females was observed. The gradient boosting classification model predicted diagnosis of ADHD with an AUROC of 0.74, and predicted diagnosis of OPG with an AUROC of 0.82.

    Conclusions: Using readily available clinical and EHR data, we successfully recapitulated several important and clinically-relevant patterns in NF1 semiology specifically based on demographic and clinical characteristics. Naïve machine learning techniques can be potentially used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.

    Methods Patients and Data Description

    This study was performed using retrospective clinical data extracted from two sources within the Washington University Neurofibromatosis (NF) Center. First, data were extracted from an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database had a clinical diagnosis of NF1 based on current National Institutes of Health Consensus Development Conference diagnostic criteria,9 and had been assessed over multiple visits from 2002 to 2016 for the presence of clinical features associated with NF1. Data points in this registry included demographic information, such as age, race, and sex, in addition to NF1-related clinical features and associated conditions, such as café-au-lait macules, skinfold freckling, cutaneous neurofibromas, Lisch nodules, OPG, hypertension, ADHD, and cognitive impairment. These data were maintained in a semi-structured format containing textual and binary fields, capturing each individual’s data over multiple clinical visits. From these data, clinical features and phenotypes were extracted using data manipulation, imputation, and text mining techniques. Data obtained from this NF1 clinical registry were converted to data tables, which captured each patient visit and the presence/absence of specific clinical features at each visit. Clinical features which were once marked as present were assumed to be present for all future visits, and missing data were assumed absent for that specific visit. Categorical variables are reported as frequencies and proportions, and compared using odds ratios (ORs). Continuously distributed traits, adhering to both conventional normality assumptions and homogeneity of variances, are reported as mean and standard deviations, and compared using analysis of variance methods. 
    Non-parametric equivalents were used for data with non-normal distributions.
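
    As an illustration of how an odds ratio and its 95% confidence interval can be obtained from a 2x2 table, the sketch below uses the common Wald (log-scale) approximation. The description does not state which interval method the authors used, so this is an assumption, and the counts in the example are hypothetical rather than taken from the NF1 cohort.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(20, 80, 10, 90))
```

    An interval that excludes 1 corresponds to a difference in odds that is statistically significant at roughly the 5% level.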

    Clinical Feature Extraction from Clinical Registry and EHR

    The NF1 Clinical Registry comprised string-based clinical feature values, such as ADHD, OPG, and asthma. From these data, we extracted 27 unique clinical features in addition to longitudinal data on the development of NF1-related clinical features and associated diagnoses. For each clinical feature, age at initial presentation and/or diagnosis was computed, and median age of occurrence was calculated for each sex. The exact age of presentation and/or diagnosis could not be definitively ascertained for any feature that was present at a child’s initial clinic visit. As such, we computed the age of diagnosis only for those clinical features for which we have at least one visit documenting feature absence prior to the manifestation of that feature.
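
    The two rules described above (a feature once marked present is carried forward to later visits, and an age at diagnosis is computed only when an earlier visit shows the feature as absent) can be sketched as follows. The data layout and function names are hypothetical, and treating a missing entry as absent follows the rule stated in the data description rather than any code released by the authors.

```python
def carry_forward(visits):
    """Fill in feature status per visit.

    visits: list of (age_years, present), where present is True, False,
    or None (missing). Missing is treated as absent unless the feature
    was already marked present at an earlier visit.
    """
    filled = []
    seen = False
    for age, present in sorted(visits, key=lambda v: v[0]):
        seen = seen or bool(present)
        filled.append((age, seen))
    return filled


def age_at_diagnosis(visits):
    """Return the age at first documented presence, or None.

    None is returned when the feature never appears, or when it is
    already present at the first visit, because the onset age cannot
    then be determined.
    """
    filled = carry_forward(visits)
    if not filled or filled[0][1]:
        return None
    for age, present in filled:
        if present:
            return age
    return None


if __name__ == "__main__":
    # Hypothetical visit history for one patient and one feature.
    print(age_at_diagnosis([(2.0, False), (4.5, True), (6.0, None)]))
```

    Excluding features that are present at the first visit avoids understating the uncertainty in onset age, at the cost of using fewer patients per feature.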

    Diagnosis codes from the EHR-derived data set were also extracted. Diagnosis codes were recorded as 15,890 unique ICD 9/10 codes. Given the large number of ICD 9/10 codes, a consistent, concept-level “roll up” of relevant codes to a single phenotype description was created by mapping the extracted ICD 9/10 values to phenome-wide association (PheWAS) codes called Phecodes, which have been demonstrated to better align with clinical disease compared to individual ICD codes.

    Machine Learning Analyses

    Using a combination of clinical features obtained from the NF1 Clinical Registry and EHR-derived data sets, we developed prediction models on a gradient boosting platform to identify patients with specific NF1-related diagnoses, and thereby to establish the usefulness of clinical history and documented clinical findings in predicting the phenotypic variability of NF1. Initial analyses used a state-of-the-art classification algorithm, the gradient boosting model, which uses a tree-based algorithm to produce a predictive model from an ensemble of weak predictive models. The gradient boosting model was selected because it supports identifying the importance of the features used in the final prediction model. In subsequent analyses, a model was trained for each of three feature sets: (1) demographic features for all patients, including race, sex, and family history of NF1 [5 features]; (2) clinical features associated with NF1 [27 features] extracted from the NF1 Clinical Registry; and (3) diagnosis codes extracted from the EHR data, which were reduced to 50 Phecodes. Four-fold cross-validation was then applied to the three models, and their prediction accuracies were compared. Positive predictive value (PPV), F1 score, and the area under the receiver operating characteristic (AUROC) curve were used as evaluation metrics. scikit-learn, a machine learning library in Python, was used to implement all analyses.
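A workflow of this shape can be sketched with scikit-learn as shown below. It is an illustration under stated assumptions rather than the study's code: the data are synthetic (generated with `make_classification`), the hyperparameters are arbitrary, and the use of stratified, shuffled folds is an assumption, since the source specifies only four-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for one feature set (5 features, binary diagnosis label).
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Four folds, as in the text; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# PPV is precision for the positive class; AUROC is the "roc_auc" scorer.
scores = cross_validate(
    model, X, y, cv=cv,
    scoring={"ppv": "precision", "f1": "f1", "auroc": "roc_auc"},
)
mean_scores = {name: scores[f"test_{name}"].mean() for name in ("ppv", "f1", "auroc")}

# Feature importances come from a model fitted on all the data.
model.fit(X, y)
importances = model.feature_importances_
```

Repeating the same call once per feature set (demographic, clinical, Phecode) would give three sets of mean scores that can be compared side by side.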

    Standard Protocol Approvals, Registrations, and Patient Consents

    The NF1 Clinical Registry is an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children’s Hospital. All individuals included in this database have a clinical diagnosis of NF1 based on current National Institutes of Health criteria and have provided informed consent for participation in the clinical registry. All data collection, usage and analysis for this study were approved by the Institutional Review Board (IRB) at the Washington University School of Medicine.

  17. D

    Data Analytics in L & H Insurance Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 2, 2025
    Cite
    Data Insights Market (2025). Data Analytics in L & H Insurance Report [Dataset]. https://www.datainsightsmarket.com/reports/data-analytics-in-l-h-insurance-1430368
    Explore at:
    pdf, doc, ppt
    Available download formats
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Life and Health (L&H) Insurance industry is experiencing a rapid transformation driven by the increasing adoption of data analytics. The market, valued at $2647.3 million in 2025, is projected to grow at a Compound Annual Growth Rate (CAGR) of 9.2% from 2025 to 2033. This robust growth is fueled by several key factors. Firstly, the need for improved risk assessment and underwriting is pushing insurers to leverage advanced analytics for predictive modeling. This allows for more accurate pricing, reduced fraud, and better customer segmentation. Secondly, demographic profiling enabled by data analytics helps insurers tailor products and services to specific customer needs, leading to increased customer satisfaction and retention. Data visualization tools further enhance decision-making by providing clear and concise insights into complex datasets, facilitating better strategy development and operational efficiency. Finally, the rise of Insurtech companies and the increasing availability of sophisticated software solutions are accelerating the adoption of data analytics across the L&H insurance sector. The competitive landscape is shaped by a mix of established players like Deloitte, SAP AG, and IBM, alongside specialized Insurtech firms offering innovative data analytics solutions.

    The segmentation of the market reveals significant opportunities across various applications and types. Predictive analysis, demographic profiling, and data visualization are the most prominent application segments, reflecting the industry's focus on risk management, customer understanding, and improved operational efficiency. The service and software segments represent the primary delivery models for data analytics solutions. While North America currently holds a dominant market share, regions like Asia-Pacific are experiencing rapid growth, driven by increasing digitalization and a rising middle class with growing insurance needs. Regulatory changes promoting data sharing and increased customer data privacy awareness are likely to influence market dynamics in the coming years. The key challenges include data security concerns, the need for skilled data scientists, and the integration of legacy systems with new data analytics platforms. Successfully navigating these challenges will be crucial for insurers to fully capitalize on the transformative potential of data analytics.
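The growth figures quoted above imply an end-of-period market size that can be worked out with the standard compound-growth formula. The sketch below assumes eight annual compounding periods (2025 to 2033) and takes the 2025 valuation as the base; the report itself does not state the 2033 value, and the function name `project_value` is chosen only for illustration.

```python
def project_value(base_value, annual_rate, years):
    """Compound a base value at a constant annual growth rate."""
    return base_value * (1 + annual_rate) ** years


# 2025 base of $2647.3 million growing at a 9.2% CAGR through 2033.
projected_2033 = project_value(2647.3, 0.092, 2033 - 2025)
```

Under these assumptions the market roughly doubles over the period, to about $5.35 billion.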

  18. d

    Woods & Poole Complete US Database

    • search.dataone.org
    Updated Mar 6, 2024
    Cite
    Woods & Poole (2024). Woods & Poole Complete US Database [Dataset]. http://doi.org/10.7910/DVN/ZCPMU6
    Explore at:
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Woods & Poole
    Time period covered
    Jan 1, 1970 - Jan 1, 2050
    Description

    The 2018 edition of Woods and Poole Complete U.S. Database provides annual historical data from 1970 (some variables begin in 1990) and annual projections to 2050 of population by race, sex, and age, employment by industry, earnings of employees by industry, personal income by source, households by income bracket and retail sales by kind of business. The Complete U.S. Database contains annual data for all economic and demographic variables for all geographic areas in the Woods & Poole database (the U.S. total, and all regions, states, counties, and CBSAs). The Complete U.S. Database has the following components:

    Demographic & Economic Desktop Data Files: There are 122 files covering demographic and economic data. The first 31 files (WP001.csv – WP031.csv) cover demographic data. The remaining files (WP032.csv – WP122.csv) cover economic data.

    Demographic DDFs: Provide population data for the U.S., regions, states, Combined Statistical Areas (CSAs), Metropolitan Statistical Areas (MSAs), Micropolitan Statistical Areas (MICROs), Metropolitan Divisions (MDIVs), and counties. Each variable is in a separate .csv file. Variables: - Total Population - Population Age (breakdown: 0-4, 5-9, 10-15 etc. all the way to 85 & over) - Median Age of Population - White Population - Black Population - Native American Population - Asian & Pacific Islander Population - Hispanic Population, any Race - Total Population Age (breakdown: 0-17, 15-17, 18-24, 65 & over) - Male Population - Female Population

    Economic DDFs: The other files (WP032.csv – WP122.csv) provide employment and income data on: - Total Employment (by industry) - Total Earnings of Employees (by industry) - Total Personal Income (by source) - Household income (by brackets) - Total Retail & Food Services Sales (by industry) - Net Earnings - Gross Regional Product - Retail Sales per Household

    Economic & Demographic Flat File: A single file for total number of people by single year of age (from 0 to 85 and over), race, and gender. It covers all U.S., regions, states, CSAs, MSAs and counties. Years of coverage: 1990 - 2050

    Single Year of Age by Race and Gender: Separate files for number of people by single year of age (from 0 years to 85 years and over), race (White, Black, Native American, Asian American & Pacific Islander and Hispanic) and gender. Years of coverage: 1990 through 2050.

    DATA AVAILABLE FOR 1970-2019; FORECASTS THROUGH 2050
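The file layout described above (WP001.csv through WP031.csv demographic, WP032.csv through WP122.csv economic) lends itself to a small lookup when working with the files programmatically. The helper names below are hypothetical, and the sketch assumes the file names are zero-padded to three digits throughout, as the quoted examples suggest.

```python
def ddf_filename(file_number):
    """Build a Desktop Data File name such as WP001.csv."""
    return f"WP{file_number:03d}.csv"


def ddf_category(file_number):
    """Classify a file number using the ranges given in the description."""
    if 1 <= file_number <= 31:
        return "demographic"
    if 32 <= file_number <= 122:
        return "economic"
    raise ValueError(f"file number out of range: {file_number}")


# Catalog of all 122 files mapped to their category.
catalog = {ddf_filename(n): ddf_category(n) for n in range(1, 123)}
demographic_files = [name for name, kind in catalog.items() if kind == "demographic"]
```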

  19. c

    Data from: Diabetes Prediction Dataset

    • cubig.ai
    Updated Jun 22, 2025
    Cite
    CUBIG (2025). Diabetes Prediction Dataset [Dataset]. https://cubig.ai/store/products/489/diabetes-prediction-dataset
    Explore at:
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Diabetes Prediction Dataset is a dataset built for the purpose of predicting diabetes and analyzing related risk factors. It contains various characteristics such as demographics, lifestyle, and clinical measurements, so it can be used to predict a patient's risk of developing diabetes.

    2) Data Utilization (1) Characteristics of the Diabetes Prediction Dataset: • Key columns include a variety of clinical and lifestyle indicators related to diabetes, including age, gender, body mass index (BMI), blood pressure, blood sugar levels (Glucose), insulin, family history, and physical activity. (2) The Diabetes Prediction Dataset can be used for: • Machine Learning/Deep Learning Model Development: It can be used to develop classification models (logistic regression, decision tree, random forest, neural network, etc.) that predict the risk of developing diabetes based on patient characteristics. • Data Analysis and Visualization: It is suitable for correlation analysis, risk factor derivation, and Exploratory Data Analysis (EDA) across many variables such as demographics, clinical figures, and lifestyle.

  20. A

    ‘Statistical Forecasting Demographic Projection Report - Enrollment...

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-statistical-forecasting-demographic-projection-report-enrollment-projections-new-york-city-public-schools-46f8/9554f28f/?iid=004-863&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)
    https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York
    Description

    Analysis of ‘Statistical Forecasting Demographic Projection Report - Enrollment Projections - New York City Public Schools’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/cf070156-0e60-46d0-a31d-0b34d9b0dcd4 on 27 January 2022.

    --- Dataset description provided by original source is as follows ---

    Demographic projections performed by the Statistics Forecasting for elementary, middle and high school level students.

    --- Original source retains full ownership of the source dataset ---

