100+ datasets found
  1. Data from: One-Step Estimator Paths for Concave Regularization

    • tandf.figshare.com
    • search.datacite.org
    zip
    Updated Jun 2, 2023
    Cite
    Matt Taddy (2023). One-Step Estimator Paths for Concave Regularization [Dataset]. http://doi.org/10.6084/m9.figshare.3485525.v5
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Matt Taddy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The statistics literature of the past 15 years has established many favorable properties for sparse diminishing-bias regularization: techniques that can roughly be understood as providing estimation under penalty functions spanning the range of concavity between ℓ0 and ℓ1 norms. However, lasso ℓ1-regularized estimation remains the standard tool for industrial Big Data applications because of its minimal computational cost and the presence of easy-to-apply rules for penalty selection. In response, this article proposes a simple new algorithm framework that requires no more computation than a lasso path: the path of one-step estimators (POSE) does ℓ1 penalized regression estimation on a grid of decreasing penalties, but adapts coefficient-specific weights to decrease as a function of the coefficient estimated in the previous path step. This provides sparse diminishing-bias regularization at no extra cost over the fastest lasso algorithms. Moreover, our gamma lasso implementation of POSE is accompanied by a reliable heuristic for the fit degrees of freedom, so that standard information criteria can be applied in penalty selection. We also provide novel results on the distance between weighted-ℓ1 and ℓ0 penalized predictors; this allows us to build intuition about POSE and other diminishing-bias regularization schemes. The methods and results are illustrated in extensive simulations and in application of logistic regression to evaluating the performance of hockey players. Supplementary materials for this article are available online.
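The path of one-step estimators described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the weight rule `1 / (1 + gamma * |beta|)`, the squared-error loss without an intercept, and every function name below are choices made for the example.

```python
# Hypothetical sketch of a path of one-step estimators (POSE).
# Assumptions: squared-error loss, no intercept, and coefficient weights
# w_j = 1 / (1 + gamma * |beta_j|) taken from the previous path step.

def soft_threshold(z, t):
    """Soft-thresholding operator used by l1 coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, beta, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def pose_path(X, y, lambdas, gamma=1.0):
    """Fit a weighted-l1 regression at each penalty in a decreasing grid.

    The weights for each step are computed from the previous step's
    coefficients, so larger coefficients are penalized less (diminishing bias).
    With gamma = 0 every weight is 1 and this reduces to a plain lasso path.
    """
    beta = [0.0] * len(X[0])
    path = []
    for lam in lambdas:
        weights = [1.0 / (1.0 + gamma * abs(b)) for b in beta]
        beta = weighted_lasso_cd(X, y, lam, weights, beta)  # warm start
        path.append(list(beta))
    return path
```

Each step reuses the previous solution as a warm start, so the whole path costs about as much as an ordinary lasso path, which is the point the abstract makes.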

  2. Data from: Statistical Mapping of PFOA and PFOS in Groundwater throughout...

    • acs.figshare.com
    xlsx
    Updated Oct 24, 2024
    Cite
    Bumjun Park; Hyunseung Kang; Christopher Zahasky (2024). Statistical Mapping of PFOA and PFOS in Groundwater throughout the Contiguous United States [Dataset]. http://doi.org/10.1021/acs.est.4c05616.s002
    Available download formats: xlsx
    Dataset updated
    Oct 24, 2024
    Dataset provided by
    ACS Publications
    Authors
    Bumjun Park; Hyunseung Kang; Christopher Zahasky
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    United States, Contiguous United States
    Description

    Per- and polyfluoroalkyl substances (PFAS) are synthetic chemicals that are increasingly being detected in groundwater. The negative health consequences associated with human exposure to PFAS make it essential to quantify the distribution of PFAS in groundwater systems. Mapping PFAS distributions is particularly challenging because a national patchwork of testing and reporting requirements has resulted in sparse and spatially biased data. In this analysis, an inhomogeneous Poisson process (IPP) modeling approach is adopted from ecological statistics to continuously map PFAS distributions in groundwater across the contiguous United States. The model is trained on a unique data set of 8910 PFAS groundwater measurements, using combined concentrations of two PFAS analytes. The IPP model predictions are compared with results from random forest models to highlight the robustness of this statistical modeling approach on sparse data sets. This analysis provides a new approach to not only map PFAS contamination in groundwater but also prioritize future sampling efforts.

  3. Data from: Sparse Principal Component Analysis with Preserved Sparsity...

    • researchdata.edu.au
    Updated 2019
    Cite
    Inge Koch; Navid Shokouhi; Abd-Krim Seghouane; Mathematics and Statistics (2019). Sparse Principal Component Analysis with Preserved Sparsity Pattern [Dataset]. http://doi.org/10.24433/CO.4593141.V1
    Dataset updated
    2019
    Dataset provided by
    The University of Western Australia
    Code Ocean
    Authors
    Inge Koch; Navid Shokouhi; Abd-Krim Seghouane; Mathematics and Statistics
    Description

    MATLAB code + demo to reproduce results for "Sparse Principal Component Analysis with Preserved Sparsity". This code calculates the principal loading vectors for any given high-dimensional data matrix. The advantage of this method over existing sparse-PCA methods is that it can produce principal loading vectors with the same sparsity pattern for any number of principal components. Please see Readme.md for more information.

  4. simulated data used in supplement for 70% sparsity

    • figshare.com
    application/gzip
    Updated Dec 6, 2019
    Cite
    shuang jiang (2019). simulated data used in supplement for 70% sparsity [Dataset]. http://doi.org/10.6084/m9.figshare.9876125.v1
    Available download formats: application/gzip
    Dataset updated
    Dec 6, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    shuang jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    New simulated data used in the supplement for comparing model performance in a sparse setting

  5. Data from: Adaptation to visual sparsity enhances responses to isolated...

    • search.dataone.org
    • datadryad.org
    Updated Oct 21, 2025
    Cite
    Tong Gou; Catherine Matulis; Damon Clark (2025). Adaptation to visual sparsity enhances responses to isolated stimuli [Dataset]. http://doi.org/10.5061/dryad.t1g1jwtbs
    Dataset updated
    Oct 21, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Tong Gou; Catherine Matulis; Damon Clark
    Description

    Sensory systems adapt their response properties to the statistics of their inputs. For instance, visual systems adapt to low-order statistics like mean and variance to encode stimuli efficiently or to facilitate specific downstream computations. However, it remains unclear how other statistical features affect sensory adaptation. Here, we explore how Drosophila’s visual motion circuits adapt to stimulus sparsity, a measure of the signal’s intermittency not captured by low-order statistics alone. Early visual neurons in both ON and OFF pathways alter their responses dramatically with stimulus sparsity, responding positively to both light and dark sparse stimuli but linearly to dense stimuli. These changes extend to downstream ON and OFF direction-selective neurons, which are activated by sparse stimuli of both polarities, but respond with opposite signs to light and dark regions of dense stimuli. Thus, sparse stimuli activate both ON and OFF pathways, recruiting a larger fraction of the ...

    This dataset contains all experimental data necessary to create figures in Tong et al. (2024), as well as scripts to analyze them. The scripts are written in MATLAB 2021b and use some functions from the Statistics and Machine Learning Toolbox.

    # Data from: Adaptation to visual sparsity enhances responses to isolated stimuli

    https://doi.org/10.5061/dryad.t1g1jwtbs

    Description of the data and file structure

    This repository contains the data and analysis scripts for the paper "Adaptation to visual sparsity enhances responses to isolated stimuli."

    • sparsity_Dryad_upload.zip: Complete dataset archive containing all data and scripts. Unzip it, and all the main analysis and plotting can be reproduced using the provided scripts and data.

    Folder structure

    • scripts/: MATLAB scripts for data analysis and figure generation
    • data/: .mat data files required by the scripts
    • utilities/: helper functions used by the analysis scripts

    scripts/: Each script reproduces the indicated figure panels from the paper (and supplements) by loading the appropriate data from data/

    • fig1_naturalScene.m: Fig 1A‑B, S1A‑D
    • fig1_Mi1GC6f_flash.m: Fig 1G‑N, S2F‑L
    • `fig2_con...,
  6. Data from: Global Statistics Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2020
    Cite
    Nick Williams (2020). Global Statistics Dataset [Dataset]. https://www.kaggle.com/nickwilliams/gapminder-dataset
    Available download formats: zip (391779 bytes)
    Dataset updated
    Sep 16, 2020
    Authors
    Nick Williams
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    What can we do to better understand our world? Look at the data!

    Content

    This dataset provides statistics on each country around the world across a span of over two hundred years (although the data is more sparse the further back you go!), as well as predictions for the future. Although there are many statistics available at https://www.gapminder.org/data/, I've selected the more commonly used metrics to load here.

    Acknowledgements

    I would like to acknowledge Hans Rosling, Ola Rosling, Anna Rosling Rönnlund, and the Gapminder Organization for providing the data and the inspiration to work with it.

    Inspiration

    My inspiration to work with this dataset stems from reading Factfulness by Hans Rosling and the Gapminder team.

  7. UN Data: Environment Statistics: Waste

    • data.wu.ac.at
    .dbf, .prj, .sbn +5
    Updated Apr 11, 2014
    Cite
    World Wide Human Geography Data Working Group (2014). UN Data: Environment Statistics: Waste [Dataset]. https://data.wu.ac.at/odso/data_gov/N2Y5NzM0Y2QtY2FkZi00NTk3LThiYmUtOGM3MDk0OTc4NDA3
    Available download formats: xml, .dbf, zip, .sbx, .shx, .sbn, .shp, .prj
    Dataset updated
    Apr 11, 2014
    Dataset provided by
    World Wide Human Geography Data Working Group
    Area covered
    United Nations
    Description

    The Environment Statistics Database contains selected water and waste statistics by country. Statistics on water and waste are based on official statistics supplied by national statistical offices and/or ministries of environment (or equivalent institutions) in countries in response to the biennial UNSD/UNEP Questionnaire on Environment Statistics. They were complemented by data on EU and OECD member and partner countries from OECD and Eurostat. Environment statistics is still in an early stage of development in many countries, and data are often sparse. The statistics selected here are those of relatively good quality and geographic coverage. The online database currently covers the years 1990 and 1995 to 2009. For information on definitions, data quality and other important metadata, please check UNSD Environmental Indicator tables. Last update in UNdata: 19 Sep 2011. Next update in UNdata: Jul 2013.

  8. Panama PA: Prevalence of Overweight: Weight for Height: % of Children Under...

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). Panama PA: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/panama/social-health-statistics/pa-prevalence-of-overweight-weight-for-height--of-children-under-5-modeled-estimate
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    Panama
    Description

    Panama PA: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data was reported at 10.900 % in 2024. This records a decrease from the previous number of 11.100 % for 2023. Panama PA: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 10.900 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 11.500 % in 2019 and a record low of 8.300 % in 2000. Panama PA: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Panama – Table PA.World Bank.WDI: Social: Health Statistics. Prevalence of overweight children is the percentage of children under age 5 whose weight for height is more than two standard deviations above the median for the international reference population of the corresponding age as established by the WHO's 2006 Child Growth Standards.;UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME).;Weighted average;Once considered only a high-income economy problem, overweight children have become a growing concern in developing countries. Research shows an association between childhood obesity and a high prevalence of diabetes, respiratory disease, high blood pressure, and psychosocial and orthopedic disorders (de Onis and Blössner 2003). Childhood obesity is associated with a higher chance of obesity, premature death, and disability in adulthood. In addition to increased future risks, obese children experience breathing difficulties and increased risk of fractures, hypertension, early markers of cardiovascular disease, insulin resistance, and psychological effects. Children in low- and middle-income countries are more vulnerable to inadequate nutrition before birth and in infancy and early childhood. 
Many of these children are exposed to high-fat, high-sugar, high-salt, calorie-dense, micronutrient-poor foods, which tend to be lower in cost than more nutritious foods. These dietary patterns, in conjunction with low levels of physical activity, result in sharp increases in childhood obesity, while under-nutrition continues. Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.

  9. Jamaica JM: Prevalence of Stunting: Height for Age: % of Children Under 5,...

    • ceicdata.com
    Updated Jun 15, 2017
    Cite
    CEICdata.com (2017). Jamaica JM: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/jamaica/social-health-statistics/jm-prevalence-of-stunting-height-for-age--of-children-under-5-modeled-estimate
    Dataset updated
    Jun 15, 2017
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    Jamaica
    Description

    Jamaica JM: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data was reported at 6.900 % in 2024. This records an increase from the previous number of 6.800 % for 2023. Jamaica JM: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 6.400 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 7.200 % in 2000 and a record low of 6.100 % in 2012. Jamaica JM: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Jamaica – Table JM.World Bank.WDI: Social: Health Statistics. Prevalence of stunting is the percentage of children under age 5 whose height for age is more than two standard deviations below the median for the international reference population ages 0-59 months. For children up to two years old height is measured by recumbent length. For older children height is measured by stature while standing. The data are based on the WHO's 2006 Child Growth Standards.;UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME).;Weighted average;Undernourished children have lower resistance to infection and are more likely to die from common childhood ailments such as diarrheal diseases and respiratory infections. Frequent illness saps the nutritional status of those who survive, locking them into a vicious cycle of recurring sickness and faltering growth (UNICEF). Being even mildly underweight increases the risk of death and inhibits cognitive development in children. And it perpetuates the problem across generations, as malnourished women are more likely to have low-birth-weight babies. Stunting, or being below median height for age, is often used as a proxy for multifaceted deprivation and as an indicator of long-term changes in malnutrition. 
Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.

  10. NCHS - Drug Poisoning Mortality by County: United States

    • data.virginia.gov
    • healthdata.gov
    • +4 more
    csv, json, rdf, xsl
    Updated Apr 21, 2025
    Cite
    Centers for Disease Control and Prevention (2025). NCHS - Drug Poisoning Mortality by County: United States [Dataset]. https://data.virginia.gov/dataset/nchs-drug-poisoning-mortality-by-county-united-states
    Available download formats: json, rdf, xsl, csv
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered
    United States
    Description

    This dataset contains model-based county estimates for drug-poisoning mortality.

    Deaths are classified using the International Classification of Diseases, Tenth Revision (ICD–10). Drug-poisoning deaths are defined as having ICD–10 underlying cause-of-death codes X40–X44 (unintentional), X60–X64 (suicide), X85 (homicide), or Y10–Y14 (undetermined intent).

    Estimates are based on the National Vital Statistics System multiple cause-of-death mortality files (1). Age-adjusted death rates (deaths per 100,000 U.S. standard population for 2000) are calculated using the direct method. Populations used for computing death rates for 2011–2016 are postcensal estimates based on the 2010 U.S. census. Rates for census years are based on populations enumerated in the corresponding censuses. Rates for noncensus years before 2010 are revised using updated intercensal population estimates and may differ from rates previously published.
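The direct method mentioned above weights each age group's death rate by that group's share of a standard population. A minimal sketch follows; the function name and the two-group numbers in the usage note are hypothetical and are not taken from the dataset.

```python
def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Direct age standardization.

    deaths[i] and population[i] are the observed counts for age group i;
    standard_population[i] is the size of that age group in the chosen
    standard (for example, the 2000 U.S. standard population).
    Returns the age-adjusted rate per `per` people.
    """
    total_standard = sum(standard_population)
    rate = 0.0
    for d, n, s in zip(deaths, population, standard_population):
        age_specific_rate = d / n        # crude rate within the age group
        weight = s / total_standard      # that group's share of the standard
        rate += age_specific_rate * weight
    return rate * per
```

For illustration, two made-up age groups with 10 deaths among 100,000 people and 20 deaths among 50,000 people, weighted equally by the standard population, give `age_adjusted_rate([10, 20], [100_000, 50_000], [1, 1])`, which is about 25 deaths per 100,000.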

    Death rates for some states and years may be low due to a high number of unresolved pending cases or misclassification of ICD–10 codes for unintentional poisoning as R99, “Other ill-defined and unspecified causes of mortality” (2). For example, this issue is known to affect New Jersey in 2009 and West Virginia in 2005 and 2009 but also may affect other years and other states. Drug poisoning death rates may be underestimated in those instances.

    Smoothed county age-adjusted death rates (deaths per 100,000 population) were obtained according to methods described elsewhere (3–5). Briefly, two-stage hierarchical models were used to generate empirical Bayes estimates of county age-adjusted death rates due to drug poisoning for each year. These annual county-level estimates “borrow strength” across counties to generate stable estimates of death rates where data are sparse due to small population size (3,5). Estimates for 1999-2015 have been updated, and may differ slightly from previously published estimates. Differences are expected to be minimal, and may result from different county boundaries used in this release (see below) and from the inclusion of an additional year of data. Previously published estimates can be found here for comparison.(6) Estimates are unavailable for Broomfield County, Colorado, and Denali County, Alaska, before 2003 (7,8). Additionally, Clifton Forge County, Virginia only appears on the mortality files prior to 2003, while Bedford City, Virginia was added to Bedford County in 2015 and no longer appears in the mortality file in 2015. These counties were therefore merged with adjacent counties where necessary to create a consistent set of geographic units across the time period. County boundaries are largely consistent with the vintage 2005-2007 bridged-race population file geographies, with the modifications noted previously (7,8).
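The idea of "borrowing strength" can be illustrated with a much simpler shrinkage estimator than the two-stage hierarchical models cited above. The sketch below is a swapped-in gamma-Poisson style example, not the NCHS method: `prior_strength` is a hypothetical tuning value (a pseudo-population), and the pooled rate stands in for the prior mean.

```python
def shrink_rates(deaths, population, prior_strength):
    """Shrink each county's crude rate toward the pooled rate.

    Equivalent to the posterior mean of a Poisson rate under a gamma prior
    whose mean is the pooled rate and whose weight equals `prior_strength`
    people. Small counties move most; large counties barely move.
    """
    pooled = sum(deaths) / sum(population)
    return [
        (d + prior_strength * pooled) / (n + prior_strength)
        for d, n in zip(deaths, population)
    ]
```

A county of 1,000 people with zero recorded deaths would get a crude rate of 0, which is unstable rather than informative; after shrinkage its estimate sits close to the pooled rate, while a county of 100,000 people keeps an estimate close to its own crude rate.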

    REFERENCES

    1. National Center for Health Statistics. National Vital Statistics System: Mortality data. Available from: http://www.cdc.gov/nchs/deaths.htm.

    2. CDC. CDC Wonder: Underlying cause of death 1999–2016. Available from: http://wonder.cdc.gov/wonder/help/ucd.html.

    3. Rossen LM, Khan D, Warner M. Trends and geographic patterns in drug-poisoning death rates in the U.S., 1999–2009. Am J Prev Med 45(6):e19–25. 2013.

    4. Rossen LM, Khan D, Warner M. Hot spots in mortality from drug poisoning in the United States, 2007–2009. Health Place 26:14–20. 2014.

    5. Rossen LM, Khan D, Hamilton B, Warner M. Spatiotemporal variation in selected health outcomes from the National Vital Statistics System. Presented at: 2015 National Conference on Health Statistics, August 25, 2015, Bethesda, MD. Available from: http://www.cdc.gov/nchs/ppt/nchs2015/Rossen_Tuesday_WhiteOak_BB3.pdf.

    6. Rossen LM, Bastian B, Warner M, and Khan D. NCHS – Drug Poisoning Mortality by County: United States, 1999-2015. Available from: https://data.cdc.gov/NCHS/NCHS-Drug-Poisoning-Mortality-by-County-United-Sta/pbkm-d27e.

    7. National Center for Health Statistics. County geog

  11. Data_Sheet_1_A Phylogeny-Regularized Sparse Regression Model for Predictive...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Jian Xiao; Li Chen; Yue Yu; Xianyang Zhang; Jun Chen (2023). Data_Sheet_1_A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data.pdf [Dataset]. http://doi.org/10.3389/fmicb.2018.03112.s001
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Jian Xiao; Li Chen; Yue Yu; Xianyang Zhang; Jun Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could be potentially used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One important characteristic of the microbial community data is the phylogenetic tree that relates all the microbial taxa based on their evolutionary history. The phylogenetic tree is an informative prior for more efficient prediction since the microbial community changes are usually not randomly distributed on the tree but tend to occur in clades at varying phylogenetic depths (clustered signal). Although community-wide changes are possible for some conditions, it is also likely that the community changes are only associated with a small subset of “marker” taxa (sparse signal). Unfortunately, predictive models of microbial community data taking into account both the sparsity and the tree structure remain under-developed. In this paper, we propose a predictive framework to exploit sparse and clustered microbiome signals using a phylogeny-regularized sparse regression model. Our approach is motivated by evolutionary theory, where a natural correlation structure among microbial taxa exists according to the phylogenetic relationship. A novel phylogeny-based smoothness penalty is proposed to smooth the coefficients of the microbial taxa with respect to the phylogenetic tree. Using simulated and real datasets, we show that our method achieves better prediction performance than competing sparse regression methods for sparse and clustered microbiome signals.

  12. Eritrea ER: Prevalence of Stunting: Height for Age: % of Children Under 5,...

    • ceicdata.com
    Updated Mar 17, 2023
    Cite
    CEICdata.com (2023). Eritrea ER: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/eritrea/social-health-statistics/er-prevalence-of-stunting-height-for-age--of-children-under-5-modeled-estimate
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    Eritrea
    Description

    Eritrea ER: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data was reported at 48.000 % in 2024. This records a decrease from the previous number of 48.100 % for 2023. Eritrea ER: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 49.000 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 51.700 % in 2013 and a record low of 43.400 % in 2000. Eritrea ER: Prevalence of Stunting: Height for Age: % of Children Under 5, Modeled Estimate data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Eritrea – Table ER.World Bank.WDI: Social: Health Statistics. Prevalence of stunting is the percentage of children under age 5 whose height for age is more than two standard deviations below the median for the international reference population ages 0-59 months. For children up to two years old height is measured by recumbent length. For older children height is measured by stature while standing. The data are based on the WHO's 2006 Child Growth Standards.;UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME).;Weighted average;Undernourished children have lower resistance to infection and are more likely to die from common childhood ailments such as diarrheal diseases and respiratory infections. Frequent illness saps the nutritional status of those who survive, locking them into a vicious cycle of recurring sickness and faltering growth (UNICEF). Being even mildly underweight increases the risk of death and inhibits cognitive development in children. And it perpetuates the problem across generations, as malnourished women are more likely to have low-birth-weight babies. Stunting, or being below median height for age, is often used as a proxy for multifaceted deprivation and as an indicator of long-term changes in malnutrition. 
Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.

  13. NCHS - Teen Birth Rates for Age Group 15-19 in the United States by County

    • s.cnmilf.com
    • healthdata.gov
    • +4 more
    Updated Mar 16, 2022
    Cite
    Centers for Disease Control and Prevention (2022). NCHS - Teen Birth Rates for Age Group 15-19 in the United States by County [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/nchs-teen-birth-rates-for-age-group-15-19-in-the-united-states-by-county
    Dataset updated
    Mar 16, 2022
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered
    United States
    Description

    This data set contains estimated teen birth rates for age group 15–19 (expressed per 1,000 females aged 15–19) by county and year.

    DEFINITIONS

    Estimated teen birth rate: Model-based estimates of teen birth rates for age group 15–19 (expressed per 1,000 females aged 15–19) for a specific county and year. Estimated county teen birth rates were obtained using the methods described elsewhere (1,2,3,4). These annual county-level teen birth estimates “borrow strength” across counties and years to generate accurate estimates where data are sparse due to small population size (1,2,3,4). The inferential method uses information, including the estimated teen birth rates from neighboring counties across years and the associated explanatory variables, to provide a stable estimate of the county teen birth rate.

    Median teen birth rate: The middle value of the estimated teen birth rates for the age group 15–19 for counties in a state.

    Bayesian credible intervals: A range of values within which there is a 95% probability that the actual teen birth rate will fall, based on the observed teen births data and the model.

    NOTES

    Data on the number of live births for women aged 15–19 years were extracted from the National Center for Health Statistics’ (NCHS) National Vital Statistics System birth data files for 2003–2015 (5). Population estimates were extracted from the files containing intercensal and postcensal bridged-race population estimates provided by NCHS. For each year, the July population estimates were used, with the exception of the year of the decennial census, 2010, for which the April estimates were used.

    Hierarchical Bayesian space–time models were used to generate hierarchical Bayesian estimates of county teen birth rates for each year during 2003–2015 (1,2,3,4). The Bayesian analogue of the frequentist confidence interval is defined as the Bayesian credible interval. A 100(1-α)% Bayesian credible interval for an unknown parameter vector θ and observed data vector y is a subset C of parameter space Φ such that $1-\alpha \le P(C \mid y) = \int_C p(\theta \mid y)\,d\theta$, where integration is performed over the set C and is replaced by summation for discrete components of θ. The probability that θ lies in C given the observed data y is at least (1-α) (6).

    County borders in Alaska changed, and new counties were formed and others were merged, during 2003–2015. These changes were reflected in the population files but not in the natality files. For this reason, two counties in Alaska were collapsed so that the birth and population counts were comparable. Additionally, Kalawao County, a remote island county in Hawaii, recorded no births, and census estimates indicated a denominator of 0 (i.e., no females between the ages of 15 and 19 years residing in the county from 2003 through 2015). For this reason, Kalawao County was removed from the analysis. Also, Bedford City, Virginia, was added to Bedford County in 2015 and no longer appears in the mortality file in 2015. For consistency, Bedford City was merged with Bedford County, Virginia, for the entire 2003–2015 period. Final analysis was conducted on 3,137 counties for each year from 2003 through 2015. County boundaries are consistent with the vintage 2005–2007 bridged-race population file geographies (7).

    SOURCES

    National Center for Health Statistics. Vital statistics data available online, Natality all-county files. Hyattsville, MD. Published annually. For details about file release and access policy, see NCHS data release and access policy for micro-data and compressed vital statistics files, available from: http://www.cdc.gov/nchs/nvss/dvs_data_release.htm. For natality public-use files, see vital statistics data available online, available from: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm.

    National Center for Health Statistics. U.S. Census populations with bridged race categories. Estimated population data available. Postcensal and intercensal files. Hyattsville, MD
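
    The credible-interval definition in the notes above can be illustrated with a minimal sketch. It assumes posterior draws are already available (for example from an MCMC sampler) and computes an equal-tailed interval, which is one common choice of the set C and not necessarily the one used for this data set. The function name `equal_tailed_interval` and the simulated draws are hypothetical and are not taken from the CDC analysis.

```python
import random


def equal_tailed_interval(samples, alpha=0.05):
    """Return (lower, upper) empirical quantiles cutting alpha/2 from each tail."""
    ordered = sorted(samples)
    n = len(ordered)
    lo_idx = round((alpha / 2) * n)           # index of the lower cut-off
    hi_idx = round((1 - alpha / 2) * n) - 1   # index of the upper cut-off
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical posterior draws for one county's rate (per 1,000); not real data.
rng = random.Random(0)
draws = [rng.gauss(30.0, 2.0) for _ in range(10_000)]

lower, upper = equal_tailed_interval(draws, alpha=0.05)

# Share of posterior draws that fall inside the interval (close to 1 - alpha).
coverage = sum(lower <= d <= upper for d in draws) / len(draws)
```

    Here `coverage` approximates P(C | y) for the interval C = [lower, upper], which is the quantity the definition bounds below by 1-α.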

  14. Data from: Bayesian inference for high-dimensional nonstationary Gaussian...

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 5, 2023
    Cite
    Mark D. Risser; Daniel Turek (2023). Bayesian inference for high-dimensional nonstationary Gaussian processes [Dataset]. http://doi.org/10.6084/m9.figshare.12849569.v1
    Available download formats: pdf
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Mark D. Risser; Daniel Turek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In spite of the diverse literature on nonstationary spatial modelling and approximate Gaussian process (GP) methods, there are no general approaches for conducting fully Bayesian inference for moderately sized nonstationary spatial data sets on a personal laptop. For statisticians and data scientists who wish to conduct posterior inference and prediction with appropriate uncertainty quantification, the lack of such approaches and software is a limitation. Here, we develop methodology for implementing formal Bayesian inference for a general class of nonstationary GPs. Our novel approach uses pre-existing frameworks for characterizing nonstationarity in a new way while utilizing modern GP likelihood approximations. Posterior sampling is implemented using flexible MCMC methods, with nonstationary posterior prediction conducted as a post-processing step. We demonstrate our novel methods on two data sets, ranging from several hundred to several thousand locations. All of our methods are implemented in the freely available BayesNSGP software package for R.

  15. Descriptive of our cohort. For categorical variables: counts of cases...

    • plos.figshare.com
    xls
    Updated Aug 26, 2025
    Cite
    Juan Carlos Espinosa-Moreno; Fernando García-García; Naia Mas-Bilbao; Susana García-Gutiérrez; María José Legarreta-Olabarrieta; Dae-Jin Lee (2025). Descriptive of our cohort. For categorical variables: counts of cases (percentage). For numerical: median ( – percentiles). [Dataset]. http://doi.org/10.1371/journal.pone.0322101.t002
    Available download formats: xls
    Dataset updated
    Aug 26, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Juan Carlos Espinosa-Moreno; Fernando García-García; Naia Mas-Bilbao; Susana García-Gutiérrez; María José Legarreta-Olabarrieta; Dae-Jin Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive of our cohort. For categorical variables: counts of cases (percentage). For numerical: median ( – percentiles).

  16. Georgia GE: Prevalence of Overweight: Weight for Height: % of Children Under...

    • ceicdata.com
    Cite
    CEICdata.com, Georgia GE: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/georgia/social-health-statistics/ge-prevalence-of-overweight-weight-for-height--of-children-under-5-modeled-estimate
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    Georgia
    Description

    Georgia GE: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data was reported at 4.100 % in 2024. This records a decrease from the previous number of 4.400 % for 2023. Georgia GE: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 13.700 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 20.400 % in 2004 and a record low of 4.100 % in 2024. Georgia GE: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Georgia – Table GE.World Bank.WDI: Social: Health Statistics. Prevalence of overweight children is the percentage of children under age 5 whose weight for height is more than two standard deviations above the median for the international reference population of the corresponding age as established by the WHO's 2006 Child Growth Standards. Source: UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME). Aggregation method: weighted average. Once considered only a high-income economy problem, overweight children have become a growing concern in developing countries. Research shows an association between childhood obesity and a high prevalence of diabetes, respiratory disease, high blood pressure, and psychosocial and orthopedic disorders (de Onis and Blössner 2003). Childhood obesity is associated with a higher chance of obesity, premature death, and disability in adulthood. In addition to increased future risks, obese children experience breathing difficulties and increased risk of fractures, hypertension, early markers of cardiovascular disease, insulin resistance, and psychological effects. Children in low- and middle-income countries are more vulnerable to inadequate nutrition before birth and in infancy and early childhood. Many of these children are exposed to high-fat, high-sugar, high-salt, calorie-dense, micronutrient-poor foods, which tend to be lower in cost than more nutritious foods. These dietary patterns, in conjunction with low levels of physical activity, result in sharp increases in childhood obesity, while under-nutrition continues. Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.

  17. Dominican Republic DO: Prevalence of Overweight: Weight for Height: % of...

    • ceicdata.com
    Updated Sep 15, 2017
    Cite
    CEICdata.com (2017). Dominican Republic DO: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/dominican-republic/social-health-statistics/do-prevalence-of-overweight-weight-for-height--of-children-under-5-modeled-estimate
    Dataset updated
    Sep 15, 2017
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2011 - Dec 1, 2022
    Area covered
    Dominican Republic
    Description

    Dominican Republic DO: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data was reported at 7.500 % in 2024. This stayed constant from the previous number of 7.500 % for 2023. Dominican Republic DO: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 7.500 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 7.700 % in 2013 and a record low of 7.000 % in 2000. Dominican Republic DO: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Dominican Republic – Table DO.World Bank.WDI: Social: Health Statistics. Prevalence of overweight children is the percentage of children under age 5 whose weight for height is more than two standard deviations above the median for the international reference population of the corresponding age as established by the WHO's 2006 Child Growth Standards. Source: UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME). Aggregation method: weighted average. Once considered only a high-income economy problem, overweight children have become a growing concern in developing countries. Research shows an association between childhood obesity and a high prevalence of diabetes, respiratory disease, high blood pressure, and psychosocial and orthopedic disorders (de Onis and Blössner 2003). Childhood obesity is associated with a higher chance of obesity, premature death, and disability in adulthood. In addition to increased future risks, obese children experience breathing difficulties and increased risk of fractures, hypertension, early markers of cardiovascular disease, insulin resistance, and psychological effects. Children in low- and middle-income countries are more vulnerable to inadequate nutrition before birth and in infancy and early childhood. Many of these children are exposed to high-fat, high-sugar, high-salt, calorie-dense, micronutrient-poor foods, which tend to be lower in cost than more nutritious foods. These dietary patterns, in conjunction with low levels of physical activity, result in sharp increases in childhood obesity, while under-nutrition continues. Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.

  18. Data from: Bayesian multiple logistic regression for case-control GWAS

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Dec 31, 2018
    Cite
    Schunkert, Heribert; Zeng, Lingyao; Banerjee, Saikat; Söding, Johannes (2018). Bayesian multiple logistic regression for case-control GWAS [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000618114
    Dataset updated
    Dec 31, 2018
    Authors
    Schunkert, Heribert; Zeng, Lingyao; Banerjee, Saikat; Söding, Johannes
    Description

    Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.
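
    The remark above that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The sketch below is illustrative only and is not part of B-LORE: it compares the logistic function with its first-order Taylor expansion around 0, where the predicted risk is 0.5. The function names are hypothetical.

```python
import math


def logistic(x):
    """Standard logistic function mapping a linear predictor to a risk."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_approx(x):
    """First-order Taylor expansion of the logistic function around x = 0."""
    return 0.5 + x / 4.0


def approx_error(x):
    """Absolute gap between the logistic function and its linearization."""
    return abs(logistic(x) - linear_approx(x))


# Print the gap at a few values of the linear predictor.
for x in (0.0, 0.2, 1.0, 3.0):
    print(f"x={x:.1f}  risk={logistic(x):.4f}  linear={linear_approx(x):.4f}  gap={approx_error(x):.4f}")
```

    The gap is negligible while predicted risks stay close to 0.5 and grows quickly once they move toward 0 or 1, which is the regime where the abstract expects the logistic model to help.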

  19. Rural Urban Classification of Cambridgeshire - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Feb 18, 2019
    Cite
    ckan.publishing.service.gov.uk (2019). Rural Urban Classification of Cambridgeshire - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/rural-urban-classification-of-cambridgeshire1
    Dataset updated
    Feb 18, 2019
    Dataset provided by
    CKAN (https://ckan.org/)
    Area covered
    Cambridgeshire
    Description

    The 2011 rural-urban classification was released in August 2013. It is a revised version of the classification produced after the 2001 Census, but with additional detail in the urban domain. The product was sponsored by a cross-Government working group comprising Department for Environment, Food and Rural Affairs, Department of the Communities and Local Government, Office for National Statistics and the Welsh Government. The data is available at three geographical levels:

    Output Area

    Output areas are treated as ‘urban’ if they were allocated to a 2011 built-up area with a population of 10,000 or more. The urban domain is then further sub-divided into three broad morphological types based on the predominant settlement component. As with the previous version of the classification, the remaining ‘rural’ output areas are grouped into three broad morphological types based on the predominant settlement component. The classification also categorises output areas based on context – i.e. whether the wider surrounding area of a given output area is sparsely populated or less sparsely populated.

    • Urban: Major Conurbation (A1)
    • Urban: Minor Conurbation (B1)
    • Urban: City and Town (C1)
    • Urban: City and Town in a Sparse Setting (C2)
    • Rural: Town and Fringe (D1)
    • Rural: Town and Fringe in a Sparse Setting (D2)
    • Rural: Village (E1)
    • Rural: Village in a Sparse Setting (E2)
    • Rural: Hamlets and Isolated Dwellings (F1)
    • Rural: Hamlets and Isolated Dwellings in a Sparse Setting (F2)

    Lower Layer Super Output Areas (LSOA)

    The 2011 rural-urban classification of lower layer super output areas was released in August 2013. It is a revised version of the classification produced after the 2001 Census, but with additional detail in the urban domain. This product was sponsored by a cross-Government working group comprising Department for Environment, Food and Rural Affairs, Department of the Communities and Local Government, Office for National Statistics and the Welsh Government. The classification at LSOA level is built from the RUC at OA level (the most detailed version of the classification). Assignments of LSOA to urban or rural categories are made by reference to the category to which the majority of their constituent OA are assigned. In the RUC at OA level, output areas are treated as ‘urban’ if they were allocated to a 2011 built-up area with a population of 10,000 or more. The urban domain is then further sub-divided into three broad morphological types based on the predominant settlement component. As with the previous version of the classification, the remaining ‘rural’ output areas are grouped into three broad morphological types based on the predominant settlement component. At the LSOA scale settlement form is less homogenous than at OA level and so there are just two rural settlement types. The classification also categorises output areas based on context – i.e. whether the wider surrounding area of a given output area is sparsely populated or less sparsely populated.

    • Urban: Major Conurbation (A1)
    • Urban: Minor Conurbation (B1)
    • Urban: City and Town (C1)
    • Urban: City and Town in a Sparse Setting (C2)
    • Rural Town and Fringe (D1)
    • Rural Town and Fringe in a Sparse Setting (D2)
    • Rural Village and Dispersed (E1)
    • Rural Village and Dispersed in a Sparse Setting (E2)

    Middle Layer Super Output Areas (MSOA)

    The 2011 rural-urban classification of middle layer super output areas was released in August 2013. It is a revised version of the classification produced after the 2001 Census, but with additional detail in the urban domain. This product was sponsored by a cross-Government working group comprising Department for Environment, Food and Rural Affairs, Department of the Communities and Local Government, Office for National Statistics and the Welsh Government. The classification at MSOA level is built from the RUC at OA level (the most detailed version of the classification). Assignments of MSOA to urban or rural categories are made by reference to the category to which the majority of their constituent OA are assigned. In the RUC at OA level, output areas are treated as ‘urban’ if they were allocated to a 2011 built-up area with a population of 10,000 or more. The urban domain is then further sub-divided into three broad morphological types based on the predominant settlement component. As with the previous version of the classification, the remaining ‘rural’ output areas are grouped into three broad morphological types based on the predominant settlement component. At the MSOA scale settlement form is less homogenous than at OA level and so there are just two rural settlement types. The classification also categorises output areas based on context – i.e. whether the wider surrounding area of a given output area is sparsely populated or less sparsely populated.

    • Urban: Major Conurbation (A1)
    • Urban: Minor Conurbation (B1)
    • Urban: City and Town (C1)
    • Urban: City and Town in a Sparse Setting (C2)
    • Rural Town and Fringe (D1)
    • Rural Town and Fringe in a Sparse Setting (D2)
    • Rural Village and Dispersed (E1)
    • Rural Village and Dispersed in a Sparse Setting (E2)
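
    The majority rule described above for assigning an LSOA or MSOA to the urban or rural category can be sketched as follows. This is a minimal illustration, not the official ONS procedure: the function name `majority_category` is hypothetical, the input is assumed to be a plain list of OA-level labels, and returning `None` on a tie is an assumption because the description does not say how ties are resolved.

```python
from collections import Counter


def majority_category(oa_categories):
    """Return the category held by most constituent output areas.

    Returns None for an empty list or a tie (tie handling is an assumption;
    the source text does not describe it).
    """
    ranked = Counter(oa_categories).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single category holds the most output areas
    return ranked[0][0]


# Hypothetical LSOA made of five output areas, three of them urban.
example_lsoa = ["urban", "urban", "rural", "urban", "rural"]
result = majority_category(example_lsoa)
```

    In this example `result` is `"urban"`, since three of the five constituent output areas carry that label.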

  20. Ecuador EC: Prevalence of Overweight: Weight for Height: % of Children Under...

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). Ecuador EC: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate [Dataset]. https://www.ceicdata.com/en/ecuador/social-health-statistics/ec-prevalence-of-overweight-weight-for-height--of-children-under-5-modeled-estimate
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2009 - Dec 1, 2020
    Area covered
    Ecuador
    Description

    Ecuador EC: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data was reported at 4.700 % in 2024. This records a decrease from the previous number of 5.200 % for 2023. Ecuador EC: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data is updated yearly, averaging 6.000 % from Dec 2000 (Median) to 2024, with 25 observations. The data reached an all-time high of 7.200 % in 2016 and a record low of 3.800 % in 2000. Ecuador EC: Prevalence of Overweight: Weight for Height: % of Children Under 5, Modeled Estimate data remains in active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Ecuador – Table EC.World Bank.WDI: Social: Health Statistics. Prevalence of overweight children is the percentage of children under age 5 whose weight for height is more than two standard deviations above the median for the international reference population of the corresponding age as established by the WHO's 2006 Child Growth Standards. Source: UNICEF, WHO, World Bank: Joint child Malnutrition Estimates (JME). Aggregation method: weighted average. Once considered only a high-income economy problem, overweight children have become a growing concern in developing countries. Research shows an association between childhood obesity and a high prevalence of diabetes, respiratory disease, high blood pressure, and psychosocial and orthopedic disorders (de Onis and Blössner 2003). Childhood obesity is associated with a higher chance of obesity, premature death, and disability in adulthood. In addition to increased future risks, obese children experience breathing difficulties and increased risk of fractures, hypertension, early markers of cardiovascular disease, insulin resistance, and psychological effects. Children in low- and middle-income countries are more vulnerable to inadequate nutrition before birth and in infancy and early childhood. Many of these children are exposed to high-fat, high-sugar, high-salt, calorie-dense, micronutrient-poor foods, which tend to be lower in cost than more nutritious foods. These dietary patterns, in conjunction with low levels of physical activity, result in sharp increases in childhood obesity, while under-nutrition continues. Estimates are modeled estimates produced by the JME. Primary data sources of the anthropometric measurements are national surveys. These surveys are administered sporadically, resulting in sparse data for many countries. Furthermore, the trend of the indicators over time is usually not a straight line and varies by country. Tracking the current level and progress of indicators helps determine if countries are on track to meet certain thresholds, such as those indicated in the SDGs. Thus the JME developed statistical models and produced the modeled estimates.
