100+ datasets found
  1. Data for individual samples.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Jul 8, 2021
    Cite
    McGinty, Ryan C.; Fukagawa, Naomi K.; Couture, Garret; Phillips, Katherine M.; Pehrsson, Pamela R.; McKillop, Kyle (2021). Data for individual samples. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000868830
    Explore at:
    Dataset updated
    Jul 8, 2021
    Authors
    McGinty, Ryan C.; Fukagawa, Naomi K.; Couture, Garret; Phillips, Katherine M.; Pehrsson, Pamela R.; McKillop, Kyle
    Description

    Results for analyzed components in individual banana samples. (XLSX)

  2. Sample 2026 Iowa Individual Affordable Care Act Premiums

    • catalog.data.gov
    • data.iowa.gov
    • +1 more
    Updated Oct 11, 2025
    + more versions
    Cite
    data.iowa.gov (2025). Sample 2026 Iowa Individual Affordable Care Act Premiums [Dataset]. https://catalog.data.gov/dataset/sample-2025-iowa-individual-affordable-care-act-premiums
    Explore at:
    Dataset updated
    Oct 11, 2025
    Dataset provided by
    data.iowa.gov
    Area covered
    Iowa
    Description

    This dataset provides sample premium information for individual ACA-compliant health insurance plans available to Iowans for 2026 based on age, rating area and metal level. These are premiums for individuals, not families. Explore and drill into the data using the 2026 Sample Premium Explorer. Please note that not every plan ID is available in every county. On or after November 1, 2025, please go to www.healthcare.gov to determine if your plan is available in the county you reside in.

  3. Data from: An approach to estimate short-term, long-term, and reaction norm...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1 more
    zip
    Updated Jun 24, 2016
    Cite
    Yimen G. Araya-Ajoy; Kimberley J. Mathot; Niels J. Dingemanse (2016). An approach to estimate short-term, long-term, and reaction norm repeatability [Dataset]. http://doi.org/10.5061/dryad.37c1m
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 24, 2016
    Dataset provided by
    Max Planck Institute for Ornithology
    Authors
    Yimen G. Araya-Ajoy; Kimberley J. Mathot; Niels J. Dingemanse
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Evolutionary ecologists increasingly study reaction norms that are expressed repeatedly within the same individual's lifetime. For example, foragers continuously alter anti-predator vigilance in response to moment-to-moment changes in predation risk. Variation in this form of plasticity occurs both among and within individuals. Among-individual variation in plasticity (individual by environment interaction or I×E) is commonly studied; by contrast, despite increasing interest in its evolution and ecology, within-individual variation in phenotypic plasticity is not. We outline a study design based on repeated measures and a multi-level extension of random regression models that enables quantification of variation in reaction norms at different hierarchical levels (such as among- and within-individuals). The approach enables the calculation of repeatability of reaction norm intercepts (average phenotype) and slopes (level of phenotypic plasticity); these indices are not specific to measurement or scaling and are readily comparable across data sets. The proposed study design also enables calculation of repeatability at different temporal scales (such as short- and long-term repeatability) thereby answering calls for the development of approaches enabling scale-dependent repeatability calculations. We introduce a simulation package in the R statistical language to assess power, imprecision and bias for multi-level random regression that may be utilised for realistic datasets (unequal sample sizes across individuals, missing data, etc). We apply the idea to a worked example to illustrate its utility. We conclude that consideration of multi-level variation in reaction norms deepens our understanding of the hierarchical structuring of labile characters and helps reveal the biology in heterogeneous patterns of within-individual variance that would otherwise remain ‘unexplained’ residual variance.
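
    As a rough illustration of the repeatability indices mentioned above: repeatability is conventionally the share of total variance that lies among individuals. The sketch below assumes only that conventional ratio. It is not the authors' R simulation package, and the variance components are hypothetical values chosen for the example.

```python
def repeatability(var_among: float, var_within: float) -> float:
    """Share of total variance that is due to among-individual differences."""
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_among / total


# Hypothetical variance components, for illustration only (not from the dataset).
r_intercept = repeatability(var_among=0.6, var_within=0.4)  # average phenotype
r_slope = repeatability(var_among=0.1, var_within=0.3)      # plasticity
print(r_intercept, r_slope)
```

    Because the index is a ratio of variances, it does not depend on the measurement scale, which is why the description calls it comparable across data sets.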

  4. Example calculation of distributional consistency (DC) using the parameters...

    • figshare.com
    xls
    Updated May 31, 2023
    Cite
    Joel P. Heath; William A. Montevecchi; Daniel Esler (2023). Example calculation of distributional consistency (DC) using the parameters from the example data in Table 1. [Dataset]. http://doi.org/10.1371/journal.pone.0044353.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joel P. Heath; William A. Montevecchi; Daniel Esler
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example calculation of distributional consistency (DC) using the parameters from the example data in Table 1.

  5. Meta data and supporting documentation

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The R code is available on line here: https://github.com/warrenjl/SpGPCW.

    Format:

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Description and permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary)

    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, OXFORD, UK, 1-30, (2019).
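
    The weekly standardization described above (subtract that week's median, divide by that week's IQR) can be sketched as follows. This is a minimal illustration, not the authors' R code: the function name, the simulated raw exposures, and the quartile method (the default of Python's statistics.quantiles) are all assumptions. Only the names z, n and m follow the data dictionary.

```python
import random
import statistics


def standardize_by_week(z):
    """Column-wise: subtract each week's median and divide by that week's IQR."""
    columns = list(zip(*z))  # one tuple of exposures per week
    scaled_columns = []
    for week in columns:
        median = statistics.median(week)
        q1, _, q3 = statistics.quantiles(week, n=4)  # quartile cut points
        iqr = q3 - q1
        scaled_columns.append([(value - median) / iqr for value in week])
    # Transpose back to one row per individual.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical raw exposures: n individuals by m weeks (not the real data).
rng = random.Random(0)
n, m = 200, 5
z_raw = [[rng.gammavariate(2.0, 3.0) for _ in range(m)] for _ in range(n)]
z = standardize_by_week(z_raw)
```

    After this step every week has median 0 and IQR 1, so the original medians and IQRs cannot be recovered from z alone, which is the confidentiality point the description makes.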

  6. w

    Family Life Survey 1997, IFLS2 - Indonesia

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated Sep 26, 2013
    + more versions
    Cite
    RAND Corporation (2013). Family Life Survey 1997, IFLS2 - Indonesia [Dataset]. https://microdata.worldbank.org/index.php/catalog/1046
    Explore at:
    Dataset updated
    Sep 26, 2013
    Dataset provided by
    University of California, Los Angeles
    RAND Corporation
    Time period covered
    1997 - 1998
    Area covered
    Indonesia
    Description

    Abstract

    By the middle of the 1990s, Indonesia had enjoyed over three decades of remarkable social, economic, and demographic change and was on the cusp of joining the middle-income countries. Per capita income had risen more than fifteenfold since the early 1960s, from around US$50 to more than US$800. Increases in educational attainment and decreases in fertility and infant mortality over the same period reflected impressive investments in infrastructure.

    In the late 1990s the economic outlook began to change as Indonesia was gripped by the economic crisis that affected much of Asia. In 1998 the rupiah collapsed, the economy went into a tailspin, and gross domestic product contracted by an estimated 12-15%, a decline rivaling the magnitude of the Great Depression.

    The general trend of several decades of economic progress followed by a few years of economic downturn masks considerable variation across the archipelago in the degree both of economic development and of economic setbacks related to the crisis. In part this heterogeneity reflects the great cultural and ethnic diversity of Indonesia, which in turn makes it a rich laboratory for research on a number of individual- and household-level behaviors and outcomes that interest social scientists.

    The Indonesia Family Life Survey is designed to provide data for studying behaviors and outcomes. The survey contains a wealth of information collected at the individual and household levels, including multiple indicators of economic and non-economic well-being: consumption, income, assets, education, migration, labor market outcomes, marriage, fertility, contraceptive use, health status, use of health care and health insurance, relationships among co-resident and non-resident family members, processes underlying household decision-making, transfers among family members and participation in community activities. In addition to individual- and household-level information, the IFLS provides detailed information from the communities in which IFLS households are located and from the facilities that serve residents of those communities. These data cover aspects of the physical and social environment, infrastructure, employment opportunities, food prices, access to health and educational facilities, and the quality and prices of services available at those facilities. By linking data from IFLS households to data from their communities, users can address many important questions regarding the impact of policies on the lives of the respondents, as well as document the effects of social, economic, and environmental change on the population.

    The Indonesia Family Life Survey complements and extends the existing survey data available for Indonesia, and for developing countries in general, in a number of ways.

    First, relatively few large-scale longitudinal surveys are available for developing countries. IFLS is the only large-scale longitudinal survey available for Indonesia. Because data are available for the same individuals from multiple points in time, IFLS affords an opportunity to understand the dynamics of behavior, at the individual, household and family and community levels. In IFLS1 7,224 households were interviewed, and detailed individual-level data were collected from over 22,000 individuals. In IFLS2, 94.4% of IFLS1 households were re-contacted (interviewed or died). In IFLS3 the re-contact rate was 95.3% of IFLS1 households. Indeed nearly 91% of IFLS1 households are complete panel households in that they were interviewed in all three waves, IFLS1, 2 and 3. These re-contact rates are as high as or higher than most longitudinal surveys in the United States and Europe. High re-interview rates were obtained in part because we were committed to tracking and interviewing individuals who had moved or split off from the origin IFLS1 households. High re-interview rates contribute significantly to data quality in a longitudinal survey because they lessen the risk of bias due to nonrandom attrition in studies using the data.

    Second, the multipurpose nature of IFLS instruments means that the data support analyses of interrelated issues not possible with single-purpose surveys. For example, the availability of data on household consumption together with detailed individual data on labor market outcomes, health outcomes and on health program availability and quality at the community level means that one can examine the impact of income on health outcomes, but also whether health in turn affects incomes.

    Third, IFLS collected both current and retrospective information on most topics. With data from multiple points of time on current status and an extensive array of retrospective information about the lives of respondents, analysts can relate dynamics to events that occurred in the past. For example, changes in labor outcomes in recent years can be explored as a function of earlier decisions about schooling and work.

    Fourth, IFLS collected extensive measures of health status, including self-reported measures of general health status, morbidity experience, and physical assessments conducted by a nurse (height, weight, head circumference, blood pressure, pulse, waist and hip circumference, hemoglobin level, lung capacity, and time required to repeatedly rise from a sitting position). These data provide a much richer picture of health status than is typically available in household surveys. For example, the data can be used to explore relationships between socioeconomic status and an array of health outcomes.

    Fifth, in all waves of the survey, detailed data were collected about respondents' communities and public and private facilities available for their health care and schooling. The facility data can be combined with household and individual data to examine the relationship between, for example, access to health services (or changes in access) and various aspects of health care use and health status.

    Sixth, because the waves of IFLS span the period from several years before the economic crisis hit Indonesia, to just prior to it hitting, to one year and then three years after, extensive research can be carried out regarding the living conditions of Indonesian households during this very tumultuous period. In sum, the breadth and depth of the longitudinal information on individuals, households, communities, and facilities make IFLS data a unique resource for scholars and policymakers interested in the processes of economic development.

    Geographic coverage

    National coverage

    Analysis unit

    • Communities
    • Facilities
    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Because it is a longitudinal survey, the IFLS2 drew its sample from IFLS1. The IFLS1 sampling scheme stratified on provinces and urban/rural location, then randomly sampled within these strata. Provinces were selected to maximize representation of the population, capture the cultural and socioeconomic diversity of Indonesia, and be cost-effective to survey given the size and terrain of the country. For mainly cost-effectiveness reasons, 14 provinces were excluded. The resulting sample included 13 of Indonesia's 27 provinces containing 83% of the population: four provinces on Sumatra (North Sumatra, West Sumatra, South Sumatra, and Lampung), all five of the Javanese provinces (DKI Jakarta, West Java, Central Java, DI Yogyakarta, and East Java), and four provinces covering the remaining major island groups (Bali, West Nusa Tenggara, South Kalimantan, and South Sulawesi). Within each of the 13 provinces, enumeration areas (EAs) were randomly chosen from a nationally representative sample frame used in the 1993 SUSENAS, a socioeconomic survey of about 60,000 households. The IFLS randomly selected 321 enumeration areas in the 13 provinces, oversampling urban EAs and EAs in smaller provinces to facilitate urban-rural and Javanese-non-Javanese comparisons.

    Household Survey

    Within a selected EA, households were randomly selected based upon 1993 SUSENAS listings obtained from regional BPS office. A household was defined as a group of people whose members reside in the same dwelling and share food from the same cooking pot (the standard BPS definition). Twenty households were selected from each urban EA, and 30 households were selected from each rural EA. This strategy minimized expensive travel between rural EAs while balancing the costs of correlations among households. For IFLS1 a total of 7,730 households were sampled to obtain a final sample size goal of 7,000 completed households. This strategy was based on BPS experience of about 90% completion rates. In fact, IFLS1 exceeded that target and interviews were conducted with 7,224 households in late 1993 and early 1994.
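
    The sample-size arithmetic in the paragraph above can be checked directly. The three figures come from the text; the variable names are only illustrative, and treating "about 90%" as exactly 0.90 is an assumption.

```python
sampled_households = 7730        # households drawn for IFLS1
expected_completion_rate = 0.90  # "about 90%" per BPS experience
interviewed_households = 7224    # households actually interviewed

# Completed households expected under the assumed completion rate.
expected_completed = sampled_households * expected_completion_rate

# Completion rate actually achieved.
realized_rate = interviewed_households / sampled_households

print(round(expected_completed), f"{realized_rate:.1%}")
```

    Under these assumptions the design anticipated roughly 6,957 completed households, close to the 7,000 goal, and the realized completion rate was a little over 93%.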

    In IFLS1 it was determined to be too costly to interview all household members, so a sampling scheme was used to randomly select several members within a household to provide detailed individual information. IFLS1 conducted detailed interviews with the following household members:

    • the household head and his/her spouse
    • two randomly selected children of the head and spouse age 0 to 14
    • an individual age 50 or older and his/her spouse, randomly selected from remaining members
    • for a randomly selected 25% of the households, an individual age 15 to 49 and his/her spouse, randomly selected from remaining members.
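
    The within-household selection rules above might be sketched as follows. This is a simplified illustration rather than the IFLS field procedure: the record layout (keys id, age, role) is hypothetical, and the spouses of the randomly selected older and prime-age individuals are left out for brevity.

```python
import random


def select_respondents(members, rng):
    """Pick respondents from one household.

    members: list of dicts with hypothetical keys 'id', 'age' and 'role',
    where role is one of 'head', 'spouse', 'child', 'other'.
    """
    # Rule 1: the household head and his/her spouse.
    selected = [m for m in members if m["role"] in ("head", "spouse")]

    # Rule 2: up to two randomly selected children of the head, age 0 to 14.
    children = [m for m in members if m["role"] == "child" and m["age"] <= 14]
    selected += rng.sample(children, min(2, len(children)))

    remaining = [m for m in members if m not in selected]

    # Rule 3: one randomly selected remaining member age 50 or older.
    older = [m for m in remaining if m["age"] >= 50]
    if older:
        pick = rng.choice(older)
        selected.append(pick)
        remaining.remove(pick)

    # Rule 4: in a random 25% of households, one remaining member age 15 to 49.
    if rng.random() < 0.25:
        adults = [m for m in remaining if 15 <= m["age"] <= 49]
        if adults:
            selected.append(rng.choice(adults))

    return selected


# Hypothetical household, for illustration only.
household = [
    {"id": 1, "age": 45, "role": "head"},
    {"id": 2, "age": 40, "role": "spouse"},
    {"id": 3, "age": 12, "role": "child"},
    {"id": 4, "age": 8, "role": "child"},
    {"id": 5, "age": 3, "role": "child"},
    {"id": 6, "age": 67, "role": "other"},
    {"id": 7, "age": 30, "role": "other"},
]
chosen = select_respondents(household, random.Random(42))
```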

    IFLS2 Recontact Protocols

    In IFLS2 our goal was to relocate and reinterview the 7,224 households interviewed in 1993. If no members of the household were found in the 1993 interview location, we asked local residents (including an informant identified by the household in 1993) where the household had gone. If the household was thought to be within any of the 13 IFLS provinces, the household was tracked to the new location and if

  7. R code dataset derivation centralized.

    • plos.figshare.com
    txt
    Updated Nov 14, 2024
    + more versions
    Cite
    Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau (2024). R code dataset derivation centralized. [Dataset]. http://doi.org/10.1371/journal.pone.0312697.s011
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Romain Jégou; Camille Bachot; Charles Monteil; Eric Boernert; Jacek Chmiel; Mathieu Boucher; David Pau
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Methods

    The objective of this project was to determine the capability of a federated analysis approach using DataSHIELD to maintain the level of results of a classical centralized analysis in a real-world setting. This research was carried out on an anonymous synthetic longitudinal real-world oncology cohort randomly split into three local databases, mimicking three healthcare organizations, stored in a federated data platform integrating DataSHIELD. No individual data were transferred: statistics were calculated in parallel within each healthcare organization, and only summary statistics (aggregates) were provided back to the federated data analyst. Descriptive statistics, survival analysis, regression models and correlation were first performed on the centralized approach and then reproduced on the federated approach. The results were then compared between the two approaches.

    Results

    The cohort was split into three samples (N1 = 157 patients, N2 = 94 and N3 = 64), and 11 derived variables and four types of analyses were generated. All analyses were successfully reproduced using DataSHIELD, except for one descriptive variable due to data disclosure limitation in the federated environment, showing the good capability of DataSHIELD. For descriptive statistics, exactly equivalent results were found for the federated and centralized approaches, except for some differences in position measures. Estimates of univariate regression models were similar, with a loss of accuracy observed for multivariate models due to source database variability.

    Conclusion

    Our project showed a practical implementation and use case of a real-world federated approach using DataSHIELD. The capability and accuracy of common data manipulation and analysis were satisfactory, and the flexibility of the tool enabled the production of a variety of analyses while preserving the privacy of individual data. The DataSHIELD forum was also a practical source of information and support. To find the right balance between privacy and accuracy of the analysis, privacy requirements should be established before the analysis starts, and a data quality review of the participating healthcare organizations should be carried out.
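
    To illustrate why some federated results can match the centralized ones exactly, the sketch below pools a mean from per-site aggregates only. It is plain Python, not the DataSHIELD API. The three site sizes are taken from the description, while the site means and the function name are hypothetical.

```python
def pooled_mean(site_summaries):
    """Combine per-site (count, sum) aggregates into one overall mean."""
    total_n = sum(n for n, _ in site_summaries)
    total_sum = sum(s for _, s in site_summaries)
    return total_sum / total_n


site_sizes = [157, 94, 64]       # N1, N2, N3 from the description
site_means = [61.0, 58.5, 63.2]  # hypothetical per-site means (e.g., age)

# Each site shares only its count and its sum, never individual rows.
site_summaries = [(n, n * mean) for n, mean in zip(site_sizes, site_means)]
federated = pooled_mean(site_summaries)
print(sum(site_sizes), round(federated, 3))
```

    A mean can be rebuilt exactly from counts and sums. Position measures such as medians cannot be rebuilt from such aggregates, which may be one reason the description reports differences for them.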

  8. Data Analyses and Sample Size Planning for Intensive Longitudinal...

    • scidb.cn
    Updated Feb 25, 2025
    Cite
    Liuyue; Liu Hongyun (2025). Data Analyses and Sample Size Planning for Intensive Longitudinal Intervention Studies with Dynamic Structural Equation Modeling [Dataset]. http://doi.org/10.57760/sciencedb.psych.00506
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Liuyue; Liu Hongyun
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Intensive longitudinal interventions (ILIs) have gained prominence as powerful tools for treating and preventing mental and behavioral disorders (Heron & Smyth, 2010). However, most studies analyze ILI data using traditional methods like ANOVA or linear mixed models, which overlook individual differences and the autocorrelation structure inherent in time series data (Hedeker et al., 2008). Moreover, existing methods typically assess intervention effects based solely on changes in the mean level of key variables (e.g., anxiety). This study demonstrates how to model ILI data within the framework of dynamic structural equation modeling (DSEM) to evaluate intervention effects across three dimensions: mean, autoregression, and individual intra-variation (IIV), for two intervention designs: non-randomized single-arm trial (NST) and randomized control trial (RCT). We conducted two simulation studies to investigate sample size recommendations for DSEM in ILI studies, considering both statistical power and accuracy in parameter estimation (AIPE). Additionally, we compared the two designs based on type I error rate in a separate simulation. Finally, we illustrated sample size planning using data from a pre-ILI study focused on reducing appearance anxiety.

    Simulation Studies 1 and 2 investigated the power and AIPE across varying sample sizes, as well as the required sample size for both NST and RCT designs. The effect sizes of intervention effects for mean, autoregression and IIV were fixed at the medium level. Two factors regarding sample size were manipulated: number of participants (N = 30, 60, 100, 150, 200, 300, 400) and number of time-points (T = 10, 20, 40, 60, 80, 100). The data-generating models and fitted models were identical, with analysis conducted using Mplus 8.10 and Bayesian estimation. Model performance was assessed in terms of convergence rate, power and AIPE for intervention effects, as well as bias in the standard errors of the intervention effects. Simulation Study 3 assessed the type I error rate for both designs when changes in the control group were different from zero, indicating a change (on average) due to time. Last, the empirical study conducted sample size planning based on a pre-study aimed at reducing appearance anxiety using an ILI design.

    The results are as follows. First, there were no convergence issues under any of the conditions. Second, power increased and the width of the credible intervals decreased as either N or T increased. However, a minimum of 60 participants was required to achieve adequate power (i.e., ). The relative bias in intervention effect was generally small, with two exceptions: in the NST design, the intervention effects on autoregression and IIV were underestimated when the number of time-points was low (i.e., T=10 or 20), while in the RCT design, the intervention effect on mean was underestimated when sample sizes at both levels were small (i.e., N=30 or 60, T=10). Bias in the standard error was also minimal across conditions. Third, a credible interval width contours plot could be applied to recommend sample sizes in DSEM. The sample size requirements based on power and AIPE were different under NST design and RCT design, with RCT requiring larger samples due to the addition of a control group. Fourth, when a natural change (on average) occurred between pre- and post-intervention phases, the NST design led to inflated type I error rates compared to the RCT design, particularly with larger sample sizes.

    In conclusion, we first recommend using DSEM to analyze ILI data, as it better captures intervention effects on mean, autoregression, and IIV. Second, practitioners should select either the NST or RCT design based on theoretical and empirical considerations. While the RCT design controls for confounding factors like time-related changes in mean, it requires a larger sample size. NST designs were usually conducted before large RCTs with relatively small samples, especially for rare participants. Finally, choosing the true parameters for the data-generating model was crucial in sample size planning using a Monte Carlo method. We suggest deriving these parameters from pre-studies, similar empirical studies or meta-analysis when possible, as many parameters (i.e., those regarding fixed effects and random effects) must be set in DSEM. If no prior information is available, we suggest following the procedures outlined in this study.

    This database includes the code for data generation and analysis in the simulation studies, and the data, code and results for the empirical example.
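
    The simulation design described above crosses the participant and time-point levels. A small sketch of that grid, assuming a full factorial crossing (the description does not say so explicitly) and using illustrative variable names:

```python
from itertools import product

participants = [30, 60, 100, 150, 200, 300, 400]  # N levels from the description
time_points = [10, 20, 40, 60, 80, 100]           # T levels from the description

# One (N, T) pair per simulation condition, assuming a fully crossed design.
conditions = list(product(participants, time_points))

# Total observations generated per replication in each condition.
observations = {(n, t): n * t for n, t in conditions}

print(len(conditions), observations[(60, 10)])
```

    This only enumerates the conditions. The DSEM model fitting itself was done in Mplus according to the description and is not reproduced here.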

  9. Replication Data for: A Three-Year Mixed Methods Study of Undergraduates’...

    • dataverse.no
    • dataverse.azure.uit.no
    • +2 more
    Updated Oct 8, 2024
    Cite
    Ellen Nierenberg; Ellen Nierenberg (2024). Replication Data for: A Three-Year Mixed Methods Study of Undergraduates’ Information Literacy Development: Knowing, Doing, and Feeling [Dataset]. http://doi.org/10.18710/SK0R1N
    Explore at:
    Available download formats: txt(21865), txt(19475), csv(55030), txt(14751), txt(26578), txt(16861), txt(28211), pdf(107685), pdf(657212), txt(12082), txt(16243), text/x-fixed-field(55030), pdf(65240), txt(8172), pdf(634629), txt(31896), application/x-spss-sav(51476), txt(4141), pdf(91121), application/x-spss-sav(31612), txt(35011), txt(23981), text/x-fixed-field(15653), txt(25369), txt(17935), csv(15653)
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    DataverseNO
    Authors
    Ellen Nierenberg; Ellen Nierenberg
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Aug 8, 2019 - Jun 10, 2022
    Area covered
    Norway
    Description

    This data set contains the replication data and supplements for the article "Knowing, Doing, and Feeling: A three-year, mixed-methods study of undergraduates’ information literacy development." The survey data is from two samples:

    • cross-sectional sample (different students at the same point in time)
    • longitudinal sample (the same students at different points in time)

    Surveys were distributed via Qualtrics during the students' first and sixth semesters. Quantitative and qualitative data were collected and used to describe students' IL development over 3 years. Statistics from the quantitative data were analyzed in SPSS. The qualitative data was coded and analyzed thematically in NVivo. The qualitative, textual data is from semi-structured interviews with sixth-semester students in psychology at UiT, both focus groups and individual interviews. All data were collected as part of the contact author's PhD research on information literacy (IL) at UiT.

    The following files are included in this data set:

    1. A README file which explains the quantitative data files. (2 file formats: .txt, .pdf)
    2. The consent form for participants (in Norwegian). (2 file formats: .txt, .pdf)
    3. Six data files with survey results from UiT psychology undergraduate students for the cross-sectional (n=209) and longitudinal (n=56) samples, in 3 formats (.dat, .csv, .sav). The data was collected in Qualtrics from fall 2019 to fall 2022.
    4. Interview guide for 3 focus group interviews. File format: .txt
    5. Interview guides for 7 individual interviews - first round (n=4) and second round (n=3). File format: .txt
    6. The 21-item IL test (Tromsø Information Literacy Test = TILT), in English and Norwegian. TILT is used for assessing students' knowledge of three aspects of IL: evaluating sources, using sources, and seeking information. The test is multiple choice, with four alternative answers for each item. This test is a "KNOW-measure," intended to measure what students know about information literacy. (2 file formats: .txt, .pdf)
    7. Survey questions related to interest - specifically students' interest in being or becoming information literate - in 3 parts (all in English and Norwegian): a) information and questions about the 4 phases of interest; b) interest questionnaire with 26 items in 7 subscales (Tromsø Interest Questionnaire - TRIQ); c) survey questions about IL and interest, need, and intent. (2 file formats: .txt, .pdf)
    8. Information about the assignment-based measures used to measure what students do in practice when evaluating and using sources. Students were evaluated with these measures in their first and sixth semesters. (2 file formats: .txt, .pdf)
    9. The Norwegian Centre for Research Data's (NSD) 2019 assessment of the notification form for personal data for the PhD research project. In Norwegian. (Format: .pdf)

  10. Historic US Census - 1910

    • redivis.com
    • stanford.redivis.com
    application/jsonl +7
    Updated Jan 10, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Historic US Census - 1910 [Dataset]. http://doi.org/10.57761/n3ks-0444
    Explore at:
    Available download formats: parquet, application/jsonl, stata, csv, avro, sas, arrow, spss
    Dataset updated
    Jan 10, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 1, 1910 - Dec 31, 1910
    Description

    Abstract

    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household data.

    Before Manuscript Submission

    All manuscripts (and other items you'd like to publish) must be submitted to

    phsdatacore@stanford.edu for approval prior to journal submission.

    We will check your cell sizes and citations.

    For more information about how to cite PHS and PHS datasets, please visit:

    https://phsdocs.developerhub.io/need-help/citing-phs-data-core

    Documentation

    Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

    In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
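
    The mother's-age-at-birth calculation mentioned above can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the field names (`household_id`, `person_no`, `mother_no`, `age`) and the records are hypothetical, and because census ages are recorded in completed years the result is only approximate.

```python
# Hypothetical person records; field names are illustrative assumptions,
# not the variable names used in the 1910 census files.
persons = [
    {"household_id": 1, "person_no": 1, "age": 34, "mother_no": None},
    {"household_id": 1, "person_no": 2, "age": 10, "mother_no": 1},
    {"household_id": 1, "person_no": 3, "age": 4, "mother_no": 1},
    {"household_id": 2, "person_no": 1, "age": 52, "mother_no": None},
    {"household_id": 2, "person_no": 2, "age": 25, "mother_no": 1},
]


def mothers_age_at_birth(records):
    """Return (household_id, child person_no, mother's approximate age at birth)."""
    # Index every person by (household, person number) so that a child's
    # mother pointer can be resolved within the same household.
    by_key = {(r["household_id"], r["person_no"]): r for r in records}
    result = []
    for child in records:
        if child["mother_no"] is None:
            continue  # no mother recorded in this household
        mother = by_key.get((child["household_id"], child["mother_no"]))
        if mother is None:
            continue  # pointer does not resolve; skip rather than guess
        result.append(
            (child["household_id"], child["person_no"], mother["age"] - child["age"])
        )
    return result


ages = mothers_age_at_birth(persons)
for household_id, person_no, age_at_birth in ages:
    print(household_id, person_no, age_at_birth)
```

    The lookup is keyed on the household as well as the person number because person numbers are only unique within a household in this sketch.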

    The historic US 1910 census data was collected in April 1910. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data was collected. Household members absent on the day of data collection were either listed to the household with the help of other household members or were scheduled for the last census subdivision.

    Section 2

    This dataset was created on 2020-01-10 23:47:27.924 by merging multiple datasets together. The source datasets for this version were:

    IPUMS 1910 households: The Integrated Public Use Microdata Series (IPUMS) Complete Count Data are historic individual and household census records and are a unique source for research on social and economic change.

    IPUMS 1910 persons: This dataset includes all individuals from the 1910 US census.

  11. Household Health Survey 2012-2013, Economic Research Forum (ERF)...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Jun 26, 2017
    + more versions
    Cite
    Central Statistical Organization (CSO) (2017). Household Health Survey 2012-2013, Economic Research Forum (ERF) Harmonization Data - Iraq [Dataset]. https://catalog.ihsn.org/index.php/catalog/6937
    Explore at:
    Dataset updated
    Jun 26, 2017
    Dataset provided by
    Economic Research Forum
    Central Statistical Organization (CSO)
    Kurdistan Regional Statistics Office (KRSO)
    Time period covered
    2012 - 2013
    Area covered
    Iraq
    Description

    Abstract

    The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.

    ----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:

    Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization, household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between CSO and WB, the Central Statistical Organization (CSO) and Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    The survey has six main objectives. These objectives are:

    1. Provide data for poverty analysis and measurement, and monitor, evaluate and update the implementation of the Poverty Reduction National Strategy issued in 2009.
    2. Provide a comprehensive data system to assess household social and economic conditions and prepare the indicators related to human development.
    3. Provide data that meet the needs and requirements of national accounts.
    4. Provide detailed indicators on consumption expenditure that support decision-making related to production, consumption, export and import.
    5. Provide detailed indicators on the sources of household and individual income.
    6. Provide the data necessary for formulating a new consumer price index number.

    The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.

    Geographic coverage

    National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.

    Analysis unit

    1- Household/family. 2- Individual/person.

    Universe

    The survey was carried out over a full year covering all governorates including those in Kurdistan Region.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    ----> Design:

    The sample size was 25,488 households for the whole of Iraq: 216 households in each of 118 districts, drawn as 2,832 clusters of 9 households each, distributed across districts and governorates in rural and urban areas.

    ----> Sample frame:

    The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size to reach 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to create a cluster, so the total sample was 25,488 households distributed across the governorates, 216 households in each district.

    ----> Sampling Stages:

    In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown by urban and rural and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed from the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit may be selected more than once; when necessary, two or more clusters may be drawn from the same enumeration unit.
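
    The two-stage design described above can be illustrated with a short sketch. It first checks that the published counts are consistent (118 districts × 24 clusters × 9 households), then shows one common way to implement systematic selection with probability proportional to size, followed by systematic equal-probability selection of households. The block sizes, function names, and random seed are assumptions made for illustration; this is not the survey team's actual procedure or code.

```python
import random

DISTRICTS = 118
CLUSTERS_PER_DISTRICT = 24
HOUSEHOLDS_PER_CLUSTER = 9

# Consistency check on the figures quoted in the sampling description.
total_clusters = DISTRICTS * CLUSTERS_PER_DISTRICT
total_households = total_clusters * HOUSEHOLDS_PER_CLUSTER


def systematic_pps(sizes, n, rng):
    """Stage 1: pick n unit indices with probability proportional to size.

    Walks a fixed step along the cumulative size scale from a random start.
    A unit larger than the step can be hit more than once, which is how a
    small district can contribute several clusters from the same unit.
    """
    step = sum(sizes) / n
    start = rng.random() * step
    chosen = []
    cum = 0.0  # total size of all units before position idx
    idx = 0
    for i in range(n):
        target = start + i * step
        while idx < len(sizes) - 1 and cum + sizes[idx] <= target:
            cum += sizes[idx]
            idx += 1
        chosen.append(idx)
    return chosen


def systematic_equal(n_items, n, rng):
    """Stage 2: pick n of n_items positions at a fixed interval (n_items >= n)."""
    step = n_items / n
    start = rng.random() * step
    return [min(int(start + i * step), n_items - 1) for i in range(n)]


rng = random.Random(0)
# Hypothetical district with 300 blocks of 50 to 400 households each.
block_sizes = [rng.randint(50, 400) for _ in range(300)]
blocks = systematic_pps(block_sizes, CLUSTERS_PER_DISTRICT, rng)
households = systematic_equal(block_sizes[blocks[0]], HOUSEHOLDS_PER_CLUSTER, rng)

# Hypothetical small district: only 10 blocks, so units must repeat.
small_district = systematic_pps([30] * 10, CLUSTERS_PER_DISTRICT, rng)

print(total_clusters, total_households)
print(blocks)
print(households)
```

    In the small-district case the 24 selections necessarily repeat some of the 10 blocks, which mirrors the note that a sampling unit may be selected more than once.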

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    ----> Preparation:

    The questionnaire of the 2006 survey was used as the basis for the 2012 questionnaire, to which many revisions were made. Two rounds of pre-tests were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.

    ----> Questionnaire Parts:

    The questionnaire consists of four parts, each with several sections:

    Part 1: Socio – Economic Data:
    - Section 1: Household Roster
    - Section 2: Emigration
    - Section 3: Food Rations
    - Section 4: housing
    - Section 5: education
    - Section 6: health
    - Section 7: Physical measurements
    - Section 8: job seeking and previous job

    Part 2: Monthly, Quarterly and Annual Expenditures:
    - Section 9: Expenditures on Non – Food Commodities and Services (past 30 days).
    - Section 10: Expenditures on Non – Food Commodities and Services (past 90 days).
    - Section 11: Expenditures on Non – Food Commodities and Services (past 12 months).
    - Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days).
    - Section 12, Table 1: Meals Had Within the Residential Unit.
    - Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members.

    Part 3: Income and Other Data:
    - Section 13: Job
    - Section 14: paid jobs
    - Section 15: Agriculture, forestry and fishing
    - Section 16: Household non – agricultural projects
    - Section 17: Income from ownership and transfers
    - Section 18: Durable goods
    - Section 19: Loans, advances and subsidies
    - Section 20: Shocks and strategy of dealing in the households
    - Section 21: Time use
    - Section 22: Justice
    - Section 23: Satisfaction in life
    - Section 24: Food consumption during past 7 days

    Part 4: Diary of Daily Expenditures: The expenditure diary is an essential component of this survey. It is left at the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), during 7 days. Two pages were allocated for recording the expenditures of each day, so the diary consists of 14 pages.

    Cleaning operations

    ----> Raw Data:

    Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
    1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
    2. Local Supervisor: Checks to make sure that questions have been correctly completed.
    3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values in addition to auditing some variables.
    4. World Bank consultants in coordination with the CSO data management team: the World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.

    ----> Harmonized Data:

    • The SPSS package is used to harmonize the Iraq Household Socio Economic Survey (IHSES) 2007 with Iraq Household Socio Economic Survey (IHSES) 2012.
    • The harmonization process starts with raw data files received from the Statistical Office.
    • A program is generated for each dataset to create harmonized variables.
    • Data is saved on the household and individual level, in SPSS and then converted to STATA, to be disseminated.

    Response rate

    The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).

  12. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    + more versions
    Cite
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Explore at:
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
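
    The allocation described above can be sketched in a few lines. With 8,000 households and 25 per enumeration area, 320 enumeration areas are needed; the sketch distributes them across strata in proportion to stratum size and then draws 25 households at random within one enumeration area. The stratum names and sizes, the largest-remainder rounding rule, and the enumeration-area size are assumptions for illustration; the R script distributed with the dataset may differ in all of these details.

```python
import random

SAMPLE_HOUSEHOLDS = 8000
HOUSEHOLDS_PER_EA = 25
N_EAS = SAMPLE_HOUSEHOLDS // HOUSEHOLDS_PER_EA  # enumeration areas to select


def allocate_proportional(stratum_sizes, n_total):
    """Split n_total across strata in proportion to size.

    Uses largest-remainder rounding (an assumption) so that the integer
    allocations add up exactly to n_total.
    """
    total = sum(stratum_sizes.values())
    quotas = {k: n_total * v / total for k, v in stratum_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n_total - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


# Hypothetical strata (province x urban/rural) with household counts.
strata = {
    ("province_1", "urban"): 410_000,
    ("province_1", "rural"): 655_000,
    ("province_2", "urban"): 120_000,
    ("province_2", "rural"): 890_000,
    ("province_3", "urban"): 75_000,
    ("province_3", "rural"): 350_000,
}

allocation = allocate_proportional(strata, N_EAS)

# Second stage: simple random sample of 25 households in one enumeration
# area, assuming (hypothetically) that it lists 180 households.
rng = random.Random(42)
ea_sample = rng.sample(range(180), HOUSEHOLDS_PER_EA)

for stratum, n_eas in allocation.items():
    print(stratum, n_eas)
```

    Largest-remainder rounding is used here only because it keeps the total fixed at 320; other rounding rules would serve the same purpose.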

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  13. Table_1_Bayesian, Likelihood-Free Modelling of Phenotypic Plasticity and...

    • frontiersin.figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Joao A.N. Filipe; Ilias Kyriazakis (2023). Table_1_Bayesian, Likelihood-Free Modelling of Phenotypic Plasticity and Variability in Individuals and Populations.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00727.s009
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Joao A.N. Filipe; Ilias Kyriazakis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is a paradigm shift from the traditional focus on the “average” individual towards the definition and analysis of trait variation within individual life-history and among individuals in populations. This is a result of increasing availability of individual phenotypic data. The shift allows the use of genetic and environment-driven variations to assess robustness to challenge, gain greater understanding of organismal biological processes, or deliver individual-targeted treatments or genetic selection. These consequences apply, in particular, to variation in ontogenetic growth. We propose an approach to parameterise mathematical models of individual traits (e.g., reaction norms, growth curves) that address two challenges: 1) Estimation of individual traits while making minimal assumptions about data distribution and correlation, addressed via Approximate Bayesian Computation (a form of nonparametric inference). We are motivated by the fact that available information on distribution of biological data is often less precise than assumed by conventional likelihood functions. 2) Scaling-up to population phenotype distributions while facilitating unbiased use of individual data; this is addressed via a probabilistic framework where population distributions build on separately-inferred individual distributions and individual-trait interpretability is preserved. The approach is tested against Bayesian likelihood-based inference, by fitting weight and energy intake growth models to animal data and normal- and skewed-distributed simulated data. i) Individual inferences were accurate and robust to changes in data distribution and sample size; in particular, median-based predictions were more robust than maximum- likelihood-based curves. These results suggest that the approach gives reliable inferences using few observations and monitoring resources. 
ii) At the population level, each individual contributed via a specific data distribution, and population phenotype estimates were not disproportionally influenced by outlier individuals. Indices measuring population phenotype variation can be derived for study comparisons. The approach offers an alternative for estimating trait variability in biological systems that may be reliable for various applications, for example, in genetics, health, and individualised nutrition, while using fewer assumptions and fewer empirical observations. In livestock breeding, the potentially greater accuracy of trait estimation (without specification of multitrait variance-covariance parameters) could lead to improved selection and to more decisive estimates of trait heritability.

  14. Data from: RESEARCH METHODOLOGY FOR NOVELTY TECHNOLOGY

    • scielo.figshare.com
    • search.datacite.org
    jpeg
    Updated May 31, 2023
    Cite
    P.C. Lai (2023). RESEARCH METHODOLOGY FOR NOVELTY TECHNOLOGY [Dataset]. http://doi.org/10.6084/m9.figshare.7482734.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 31, 2023
    Dataset provided by
    SciELO http://www.scielo.org/
    Authors
    P.C. Lai
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract This paper contributes to the existing literature by reviewing the research methodology and the literature review with the focus on potential applications for the novelty technology of the single platform E-payment. These included, but were not restricted to the subjects, population, sample size requirement, data collection method and measurement of variables, pilot study and statistical techniques for data analysis. The reviews will shed some light and potential applications for future researchers, students and others to conceptualize, operationalize and analyze the underlying research methodology to assist in the development of their research methodology.

  15. Data from: LINKAGES: An Individual-based Forest Ecosystem Biogeochemistry...

    • catalog.data.gov
    • gimi9.com
    • +6more
    Updated Sep 19, 2025
    + more versions
    Cite
    ORNL_DAAC (2025). LINKAGES: An Individual-based Forest Ecosystem Biogeochemistry Model [Dataset]. https://catalog.data.gov/dataset/linkages-an-individual-based-forest-ecosystem-biogeochemistry-model-fcca9
    Explore at:
    Dataset updated
    Sep 19, 2025
    Dataset provided by
    Oak Ridge National Laboratory Distributed Active Archive Center
    Description

    This model product contains the source codes for version 1 of the individual-based forest ecosystem biogeochemistry model LINKAGES and two subsequent versions as well as example input and output data. LINKAGES predicts long-term structure and dynamics of forest ecosystems as constrained by nitrogen availability, climate, and soil moisture. Model simulations compare favorably to field data from different geographic areas worldwide. LINKAGES, written in FORTRAN and provided in ASCII format, simulates birth, growth, and death of all trees greater than 1.43-cm dbh. Litter fall and decomposition are also simulated. Sunlight is the driving variable. Growing season degree days, soil water availability, and AET are calculated from precipitation, temperature, soil field moisture capacity, and wilting point. Decomposition and soil N availability are calculated from organic matter quantity and carbon chemistry, evapotranspiration, and degree of canopy closure. Light availability to each tree is a function of leaf biomass of taller trees. Degree days and availabilities of light and water constrain species reproduction. These variables plus soil N constrain tree growth and carbon accumulation in biomass. Tree death probability increases with age and slow growth. Leaf, root, and woody litter are returned to the soil at the end of each year to decay the following year. Climatic and forest data for eastern North America and New South Wales are provided as example model inputs. Modelers may use their own site data within any version of LINKAGES. Example model output is also provided.

  16. China CN: No of Household: One Person: Hunan

    • ceicdata.com
    Updated Dec 23, 2019
    Cite
    CEICdata.com (2019). China CN: No of Household: One Person: Hunan [Dataset]. https://www.ceicdata.com/en/china/population-sample-survey-no-of-household-one-person
    Explore at:
    Dataset updated
    Dec 23, 2019
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2012 - Dec 1, 2023
    Area covered
    China
    Variables measured
    Population
    Description

    CN: No of Household: One Person: Hunan data was reported at 3.680 Unit th in 2023. This records an increase from the previous number of 3.113 Unit th for 2022. CN: No of Household: One Person: Hunan data is updated yearly, averaging 2.272 Unit th from Dec 2002 (Median) to 2023, with 21 observations. The data reached an all-time high of 5,796.689 Unit th in 2020 and a record low of 1.320 Unit th in 2009. CN: No of Household: One Person: Hunan data remains active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Socio-Demographic – Table CN.GA: Population: Sample Survey: No of Household: One Person.

  17. pyhydroqc Sensor Data QC: Single Site Example

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Mar 8, 2022
    Cite
    Amber Spackman Jones (2022). pyhydroqc Sensor Data QC: Single Site Example [Dataset]. http://doi.org/10.4211/hs.92f393cbd06b47c398bdd2bbb86887ac
    Explore at:
    Available download formats: zip (1.5 MB)
    Dataset updated
    Mar 8, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2017 - Dec 31, 2017
    Area covered
    Description

    This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.

    This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.

    Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when they exceed a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:
    - Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.
    - Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.
    - Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.
    - Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.

    The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.

    The anomaly detection and correction workflow involves the following steps:
    1. Retrieving data
    2. Applying rules-based detection to screen data and apply initial corrections
    3. Identifying and correcting sensor drift and calibration (if applicable)
    4. Developing a model (i.e., ARIMA or LSTM)
    5. Applying model to make time series predictions
    6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
    7. Widening the window over which an anomaly is identified
    8. Aggregating detections resulting from multiple models
    9. Making corrections for anomalous events
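
    Steps 6 to 8 of the workflow can be illustrated with a small, library-free sketch. It does not use the pyhydroqc API; the function names, the fixed threshold, the window width, and the choice to aggregate by taking the union of detections are all assumptions made for illustration.

```python
def detect_anomalies(observed, predicted, threshold):
    """Step 6: flag points whose residual exceeds the threshold."""
    return [abs(o - p) > threshold for o, p in zip(observed, predicted)]


def widen(flags, width):
    """Step 7: extend each detection by `width` points on both sides."""
    n = len(flags)
    widened = [False] * n
    for i, flagged in enumerate(flags):
        if flagged:
            for j in range(max(0, i - width), min(n, i + width + 1)):
                widened[j] = True
    return widened


def aggregate(*flag_lists):
    """Step 8: a point is anomalous if any model flagged it (assumed rule)."""
    return [any(flags) for flags in zip(*flag_lists)]


# Hypothetical sensor observations and estimates from two models.
observed = [10.0, 10.2, 10.1, 15.0, 10.3, 10.2, 10.1]
model_a = [10.0, 10.1, 10.1, 10.2, 10.2, 10.2, 10.1]
model_b = [10.1, 10.2, 10.2, 10.2, 10.3, 10.2, 8.5]

flags_a = detect_anomalies(observed, model_a, threshold=1.0)
flags_b = detect_anomalies(observed, model_b, threshold=1.0)
widened_a = widen(flags_a, width=1)
combined = aggregate(widened_a, widen(flags_b, width=1))

print(flags_a)
print(widened_a)
print(combined)
```

    Widening is applied before aggregation here so that each model's detections cover the points adjacent to an event; the order is a design choice of this sketch, not a statement about the package.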

    Instructions to run the notebook through the CUAHSI JupyterHub:
    1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into CUAHSI JupyterHub using your HydroShare credentials.
    2. Select 'Python 3.8 - Scientific' as the server and click Start.
    3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
    4. Execute each cell in the code by clicking the Run button.

  18. High Frequency Phone Survey, Continuous Data Collection 2023 - Papua New...

    • microdata.pacificdata.org
    Updated Apr 30, 2025
    + more versions
    Cite
    William Seitz (2025). High Frequency Phone Survey, Continuous Data Collection 2023 - Papua New Guinea [Dataset]. https://microdata.pacificdata.org/index.php/catalog/877
    Explore at:
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    William Seitz
    Darian Naidoo
    Time period covered
    2023 - 2025
    Area covered
    Papua New Guinea
    Description

    Abstract

    Access to up-to-date socio-economic data is a widespread challenge in Papua New Guinea and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.

    For PNG, after five rounds of data collection from 2020-2022, a monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), covering topics including employment, income, food security, health, food prices, assets and well-being. This followed an initial pilot of the data collection from January 2023-March 2023. Data for April 2023-September 2023 were a repeated cross section, while October 2023 established the first month of a panel, which is ongoing as of March 2025. For each month, approximately 550-1000 households were interviewed. The sample is representative of urban and rural areas but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in PNG. There is one data file for household level data with a unique household ID, and separate files for individual level data within each household and for household food price data, both of which can be matched to the household file using the household ID. A unique individual ID within the household data can be used to track individuals over time within households.

    Geographic coverage

    Urban and rural areas of Papua New Guinea

    Analysis unit

    Household, Individual

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification from a large random sample of Digicel’s subscribers. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The resulting overall sample has a probability-based weighted design, with a proportionate stratification to achieve a proper geographical representation. More information on sampling for the cross-sectional monthly sample can be found in previous documentation for the PNG HFPS data.

    A monthly panel was established in October 2023 and is ongoing as of March 2025. In each subsequent round of data collection after October 2024, the survey firm would first attempt to contact all households from the previous month, and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households.
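
    The contact order described above can be sketched as follows. This is only an illustration under stated assumptions, not the survey firm's actual procedure: the function name `build_contact_queue`, the integer month index, and the household IDs are all hypothetical.

```python
def build_contact_queue(history, current_month):
    """Return household IDs in the order they would be called.

    history maps a month index to the set of household IDs interviewed
    in that month (a hypothetical structure chosen for this sketch).
    """
    # First priority: every household interviewed in the previous month.
    previous = history.get(current_month - 1, set())

    # Second priority: households seen in earlier months that were
    # not interviewed in the previous month (the dropouts).
    earlier = set()
    for month, ids in history.items():
        if month < current_month - 1:
            earlier |= ids
    dropouts = earlier - previous

    # Sorting only makes the order reproducible; it has no survey meaning.
    return sorted(previous) + sorted(dropouts)


history = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
queue = build_contact_queue(history, current_month=3)
print(queue)
```

    Replacement households drawn by RDD would only be added once this queue is exhausted; that step is left out because the text does not describe how the replacements are drawn in detail.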

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire, which can be found in the External Resources of this documentation, is in English with a Pidgin translation.

    The survey instrument for Q1 2025 consists of the following modules:

    1. Basic Household information
    2. Household Roster
    3. Labor
    4a. Food security
    4b. Food prices
    5. Household income
    6. Agriculture
    8. Access to services
    9. Assets
    10. Wellbeing and shocks
    10a. WASH

    Cleaning operations

    The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
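
    A minimal sketch of matching individual records to their household record through `hhid` is shown below. Only `hhid` and `id_member` come from the documentation; the other column names and all values are invented for illustration. The sketch assumes one row per household, so with the combined monthly files the lookup key would also need the month variable.

```python
# Hypothetical rows; only hhid and id_member are named in the documentation.
households = [
    {"hhid": "H001", "urban": 1},
    {"hhid": "H002", "urban": 0},
]
individuals = [
    {"hhid": "H001", "id_member": 1, "age": 34},
    {"hhid": "H001", "id_member": 2, "age": 17},
    {"hhid": "H002", "id_member": 1, "age": 52},
]


def attach_household(individual_rows, household_rows):
    """Add each person's household columns to the person's own row."""
    # Index the household rows by hhid for constant-time lookup.
    by_hhid = {row["hhid"]: row for row in household_rows}
    merged = []
    for person in individual_rows:
        household = by_hhid.get(person["hhid"])
        if household is None:
            # Skip individuals with no matching household row.
            continue
        merged.append({**household, **person})
    return merged


merged = attach_household(individuals, households)
print(merged[0])
```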

  19. Afrobarometer Survey 2022 - Guinea

    • datacatalog.ihsn.org
    • catalog.ihsn.org
    Updated Jun 10, 2025
    Cite
    University of Cape Town (UCT, South Africa) (2025). Afrobarometer Survey 2022 - Guinea [Dataset]. https://datacatalog.ihsn.org/catalog/study/GIN_2022_AFB-R9_v01_M
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Institute for Justice and Reconciliation (http://www.ijr.org.za/)
    Michigan State University (MSU)
    Institute for Empirical Research in Political Economy (IREEP)
    Ghana Centre for Democratic Development (CDD)
    Institute for Development Studies (IDS)
    University of Cape Town (UCT, South Africa)
    Time period covered
    2022
    Area covered
    Guinea
    Description

    Abstract

    The Afrobarometer is a comparative series of public attitude surveys that assess African citizens' attitudes to democracy and governance, markets, and civil society, among other topics. The surveys have been undertaken at periodic intervals since 1999. The Afrobarometer's coverage has increased over time. Round 1 (1999-2001) initially covered 7 countries and was later extended to 12 countries. Round 2 (2002-2004) surveyed citizens in 16 countries, Round 3 (2005-2006) in 18 countries, Round 4 (2008) in 20 countries, Round 5 (2011-2013) in 34 countries, Round 6 (2014-2015) in 36 countries, and Round 7 (2016-2018) in 34 countries, followed by Round 8 (2019-2021). The survey covered 39 countries in Round 9 (2021-2023).

    Geographic coverage

    National coverage

    Analysis unit

    Individual

    Universe

    Citizens of Guinea who are 18 years and older

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Afrobarometer uses national probability samples designed to meet the following criteria. Each sample is designed to be a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. This is achieved by:

    • using random selection methods at every stage of sampling;
    • sampling at all stages with probability proportionate to population size wherever possible to ensure that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.

    The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.

    Sample size and design

    Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.
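
    The quoted margins can be reproduced with the standard formula for a proportion, z * sqrt(p(1-p)/n). The sketch below assumes simple random sampling, the worst-case proportion p = 0.5, and z = 1.96 for 95 percent confidence. These assumptions are ours and are not stated in the source, and the formula ignores any design effect from clustering.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest interval
    and z=1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (1200, 2400):
    # Express the result in percentage points, rounded to one decimal.
    print(n, round(100 * margin_of_error(n), 1))
```

    Doubling the sample size shrinks the margin by a factor of sqrt(2) rather than 2, which is why the gain from 1,200 to 2,400 cases is under one percentage point.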

    The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.

    Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversample is noted in the TIR.

    Sample stages

    Samples are drawn in either four or five stages:

    Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country.
    Stage 2: We randomly select primary sampling units (PSU).
    Stage 3: We then randomly select sampling start points.
    Stage 4: Interviewers then randomly select households.
    Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.

    Guinea

    - Sample size: 1,200
    - Sample design: Nationally representative, random, clustered, stratified, multi-stage area probability sample
    - Stratification: Region and urban-rural location
    - Stages: PSUs (from strata), start points, households, respondents
    - PSU selection: Probability Proportionate to Population Size (PPPS)
    - Cluster size: 8 households per PSU
    - Household selection: Randomly selected start points, followed by walk pattern using 5/10 interval
    - Respondent selection: Gender quota filled by alternating interviews between men and women; respondents of appropriate gender listed, after which computer randomly selects individual
    - Weighting: Weighted to account for individual selection probabilities
    - Sampling frame: Base de sondages de 2014 mise à jour du Recensement Général de la Population et de l’Habitat (RGPH)
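
    To illustrate selection with probability proportionate to population size, the sketch below uses systematic PPS sampling over a cumulative population list. This is one common way to implement PPS and is not confirmed as the method Afrobarometer used in Guinea. The unit names and population figures are invented.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size.

    sizes maps a unit name to its population. A unit larger than the
    sampling step can be selected more than once.
    """
    total = sum(sizes.values())
    step = total / n
    # Random start in [0, step), then equally spaced selection points.
    start = rng.random() * step
    targets = [start + i * step for i in range(n)]

    chosen = []
    cumulative = 0.0
    index = 0
    for unit, size in sizes.items():
        cumulative += size
        # Take every selection point that falls inside this unit's range.
        while index < n and targets[index] < cumulative:
            chosen.append(unit)
            index += 1
    return chosen


# With 1,200 interviews and 8 households per PSU, 150 PSUs are needed.
psus_needed = 1200 // 8

# Hypothetical enumeration areas and populations, for illustration only.
areas = {"EA-1": 900, "EA-2": 300, "EA-3": 1500, "EA-4": 300}
print(psus_needed, pps_systematic(areas, n=2, rng=random.Random(0)))
```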

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The Round 9 questionnaire has been developed by the Questionnaire Committee after reviewing the findings and feedback obtained in previous Rounds, and securing input on preferred new topics from a host of donors, analysts, and users of the data.

    The questionnaire consists of three parts:

    1. Part 1 captures the steps for selecting households and respondents, and includes the introduction to the respondent (pp. 1-4). This section should be filled in by the Fieldworker.
    2. Part 2 covers the core attitudinal and demographic questions that are asked by the Fieldworker and answered by the Respondent (Q1 – Q100).
    3. Part 3 includes contextual questions about the setting and atmosphere of the interview, and collects information on the Fieldworker. This section is completed by the Fieldworker (Q101 – Q123).

    Response rate

    Response rate was 97%.

    Sampling error estimates

    The sample size yields country-level results with a margin of error of +/-3 percentage points at a 95% confidence level.

  20. High Frequency Phone Survey, Continuous Data Collection 2023 - Solomon...

    • microdata.pacificdata.org
    Updated Mar 19, 2025
    Cite
    Darian Naidoo and William Seitz (2025). High Frequency Phone Survey, Continuous Data Collection 2023 - Solomon Islands [Dataset]. https://microdata.pacificdata.org/index.php/catalog/875
    Dataset updated
    Mar 19, 2025
    Dataset authored and provided by
    Darian Naidoo and William Seitz
    Time period covered
    2023 - 2024
    Area covered
    Solomon Islands
    Description

    Abstract

    Access to up-to-date socio-economic data is a widespread challenge in Solomon Islands and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.

    For Solomon Islands, after five rounds of data collection from 2020-2020, monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024), covering topics including employment, income, food security, health, food prices, assets and well-being. Fieldwork took place in two non-consecutive weeks of each month. Data for April 2023-December 2023 were a repeated cross section, while January 2024 established the first month of a panel, which was continued to September 2024. Each month has approximately 550 households in the sample and is representative of urban and rural areas, but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Solomon Islands. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household, which can be matched to the household file using the household ID. The individual-level file also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.

    Geographic coverage

    Urban and rural areas of Solomon Islands.

    Analysis unit

    Household, individual.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The initial sample was drawn from information provided by a major phone service provider in Solomon Islands, covering all the provinces in the country. It had a probability-based weighted design, with a proportionate stratification to achieve geographical representation. The geographical distribution compared to the 2019 Census is listed below for the first month of the HFPS monthly survey:

    Choiseul: Census 4.3%, HFPS 5.2%
    Western: Census 14.4%, HFPS 13.7%
    Isabel: Census 4.8%, HFPS 4.7%
    Central: Census 3.6%, HFPS 5.2%
    Ren Bell: Census 0.6%, HFPS 1.4%
    Guadalcanal: Census 19.8%, HFPS 21.1%
    Malaita: Census 23.1%, HFPS 18.7%
    Makira: Census 5.6%, HFPS 5.6%
    Temotu: Census 3.0%, HFPS 3%
    Honiara: Census 20.7%, HFPS 21.3%

    Source: Census of Population and Housing 2019

    Note: The values in the HFPS column represent the proportion of survey participants residing in each province, based on the raw HFPS data from April.

    In April 2023, the geographic distribution of World Bank HFPS participants was generally similar to that of the census data at the province level, though within provinces, areas with less mobile phone connectivity are likely to be underrepresented. One indication of this is that urban areas constituted 38.2 percent of the survey sample, which is a slight overrepresentation, compared to 32.5 percent in the Census 2019.
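
    The provincial comparison can be checked with a short calculation. The sketch below uses the shares listed above and computes, for each province, the gap in percentage points and the ratio of census share to sample share. That ratio is a simple post-stratification style adjustment shown only for illustration; it is not the weighting actually applied to this dataset.

```python
# Province shares in percent, as listed in the sampling procedure.
census = {
    "Choiseul": 4.3, "Western": 14.4, "Isabel": 4.8, "Central": 3.6,
    "Ren Bell": 0.6, "Guadalcanal": 19.8, "Malaita": 23.1,
    "Makira": 5.6, "Temotu": 3.0, "Honiara": 20.7,
}
hfps = {
    "Choiseul": 5.2, "Western": 13.7, "Isabel": 4.7, "Central": 5.2,
    "Ren Bell": 1.4, "Guadalcanal": 21.1, "Malaita": 18.7,
    "Makira": 5.6, "Temotu": 3.0, "Honiara": 21.3,
}

# Positive gap: province is overrepresented in the April HFPS sample.
gap = {p: round(hfps[p] - census[p], 1) for p in census}

# Illustrative adjustment factor: values above 1 upweight a province.
adjustment = {p: round(census[p] / hfps[p], 2) for p in census}

largest = max(gap, key=lambda p: abs(gap[p]))
print(largest, gap[largest], adjustment[largest])
```

    On these figures Malaita shows the largest gap, at 4.4 percentage points below its census share.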

    A monthly panel was established in January 2024 and is ongoing as of March 2025. In each subsequent month after January 2024, the survey firm would first attempt to contact all households from the previous month and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households. Across all months of the survey, a total of 9,926 interviews were completed.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire, which can be found in the External Resources of this documentation, is available in English with a Solomons Pijin translation. There were few changes to the questionnaire across the survey months, but some sections were only introduced in 2024, namely energy access questions and questions to inform the baseline data of the Solomon Islands Government Integrated Economic Development and Climate Resilience (IEDCR) project.

    Cleaning operations

    The raw data were cleaned by the World Bank team using STATA. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 9,926 in the household dataset and 62,054 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
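
    Because `id_member` is unique only within a household, tracking a person across panel months needs the pair (`hhid`, `id_member`) as the key. The sketch below lists the months in which each person appears. The `month` column and all values are hypothetical, since the documentation does not name the month variable.

```python
from collections import defaultdict


def months_observed(individual_rows):
    """Map each (hhid, id_member) pair to the sorted months it appears in."""
    seen = defaultdict(set)
    for row in individual_rows:
        seen[(row["hhid"], row["id_member"])].add(row["month"])
    return {key: sorted(months) for key, months in seen.items()}


# Invented example rows; "month" is an assumed column name.
rows = [
    {"hhid": "H001", "id_member": 1, "month": "2024-01"},
    {"hhid": "H001", "id_member": 1, "month": "2024-02"},
    {"hhid": "H001", "id_member": 2, "month": "2024-01"},
    {"hhid": "H002", "id_member": 1, "month": "2024-02"},
]

panel = months_observed(rows)
print(panel[("H001", 1)])
```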
