100+ datasets found
  1. Dataset for: Sample size estimation for case-crossover studies

    • wiley.figshare.com
    docx
    Updated May 31, 2023
    Sai Dharmarajan; Joo Yeon Lee; Rima Izem (2023). Dataset for: Sample size estimation for case-crossover studies [Dataset]. http://doi.org/10.6084/m9.figshare.7228559.v1
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Sai Dharmarajan; Joo Yeon Lee; Rima Izem
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Case-crossover study designs are observational studies used to assess the post-market safety of medical products (e.g. vaccines or drugs). Because a case-crossover study is self-controlled, it offers better control for confounding, since the design controls for any time-invariant measured and unmeasured confounding, and potentially greater feasibility, as only data from those experiencing an event (the cases) are required. However, self-matching also introduces correlation between case and control periods within a subject or matched unit. To estimate sample size in a case-crossover study, investigators currently use Dupont’s formula (Biometrics 1988; 44:1157-1168), which was originally developed for a matched case-control study. This formula is relevant because it takes into account the correlation in exposure between controls and cases, which is expected to be high in self-controlled studies. However, in our study, we show that Dupont’s formula and other currently used methods to determine sample size for case-crossover studies may be inadequate. Specifically, these formulae tend to underestimate the true required sample size, determined through simulations, for a range of values in the parameter space. We present mathematical derivations to explain where some currently used methods fail and propose two new sample size estimation methods that provide a more accurate estimate of the true required sample size.
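The description above refers to a "true required sample size, determined through simulations". The following is a minimal sketch of what such a simulation could look like for the simplest 1:1 design (one case period and one control period per subject). It is not Dupont's formula and not the authors' proposed methods. It relies on the standard result that, in 1:1 matched data, only discordant pairs are informative and the case period is the exposed one with probability OR / (1 + OR). The function names and the `p_discordant` parameter (the assumed chance that a subject's two periods differ in exposure) are hypothetical.

```python
import math
import random


def exact_two_sided_p(k, n):
    """Exact two-sided binomial p-value for H0: p = 0.5 (symmetric null)."""
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence either way
    tail = min(k, n - k)
    lower = sum(math.comb(n, i) for i in range(tail + 1))
    # Double the smaller tail; integer arithmetic avoids float overflow.
    return min(1.0, 2 * lower / (1 << n))


def simulated_power(n_cases, odds_ratio, p_discordant,
                    alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power of the exact test on discordant pairs.

    p_discordant is an assumed (hypothetical) input, not a quantity
    taken from the dataset description.
    """
    rng = random.Random(seed)
    p_case_exposed = odds_ratio / (1.0 + odds_ratio)
    rejections = 0
    for _sim in range(n_sims):
        discordant = 0
        case_exposed = 0
        for _subject in range(n_cases):
            if rng.random() < p_discordant:
                discordant += 1
                if rng.random() < p_case_exposed:
                    case_exposed += 1
        if exact_two_sided_p(case_exposed, discordant) < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Scan candidate sample sizes for an assumed OR of 3 and 30% discordance.
    for n in (50, 100, 200):
        print(n, simulated_power(n, odds_ratio=3.0, p_discordant=0.3))
```

Under these assumptions, the required sample size would be the smallest `n_cases` whose simulated power reaches the target (for example 0.80), which can then be compared with what a closed-form formula suggests.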

  2. Calculating Sample Size for the NYTD Follow-up Population

    • catalog.data.gov
    • data.virginia.gov
    Updated Sep 8, 2025
    Administration for Children and Families (2025). Calculating Sample Size for the NYTD Follow-up Population [Dataset]. https://catalog.data.gov/dataset/calculating-sample-size-for-the-nytd-follow-up-population
    Dataset updated
    Sep 8, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This brief provides more information about how a State may, for planning purposes, calculate a sample size for the NYTD follow-up population. Metadata-only record linking to the original dataset. Open original dataset below.

  3. Data from: Sample size requirements for case-control study designs

    • catalog.data.gov
    • healthdata.gov
    • +1 more
    Updated Sep 6, 2025
    National Institutes of Health (2025). Sample size requirements for case-control study designs [Dataset]. https://catalog.data.gov/dataset/sample-size-requirements-for-case-control-study-designs
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    National Institutes of Health
    Description

    Background: Published formulas for case-control designs provide sample sizes required to determine that a given disease-exposure odds ratio is significantly different from one, adjusting for a potential confounder and possible interaction. Results: The formulas are extended from one control per case to F controls per case and adjusted for a potential multi-category confounder in unmatched or matched designs. Interactive FORTRAN programs are described which compute the formulas. The effect of potential disease-exposure-confounder interaction may be explored. Conclusions: Software is now available for computing adjusted sample sizes for case-control designs.

  4. Machine learning algorithm validation with a limited sample size

    • plos.figshare.com
    text/x-python
    Updated May 30, 2023
    Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson (2023). Machine learning algorithm validation with a limited sample size [Dataset]. http://doi.org/10.1371/journal.pone.0224365
    Available download formats: text/x-python
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andrius Vabalas; Emma Gowen; Ellen Poliakoff; Alexander J. Casson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Advances in neuroimaging, genomic, motion tracking, eye-tracking and many other technology-based data collection methods have led to a torrent of high dimensional datasets, which commonly have a small number of samples because of the intrinsic high cost of data collection involving human participants. High dimensional data with a small number of samples is of critical importance for identifying biomarkers and conducting feasibility and pilot work; however, it can lead to biased machine learning (ML) performance estimates. Our review of studies which have applied ML to predict autistic from non-autistic individuals showed that small sample size is associated with higher reported classification accuracy. Thus, we have investigated whether this bias could be caused by the use of validation methods which do not sufficiently control overfitting. Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with a sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size. We also show that feature selection, if performed on pooled training and testing data, contributes to bias considerably more than parameter tuning. In addition, the contribution to bias by data dimensionality, hyper-parameter space and number of CV folds was explored, and validation methods were compared with discriminable data. The results suggest how to design robust testing methodologies when working with small datasets and how to interpret the results of other studies based on what validation method was used.
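The point about feature selection on pooled training and testing data can be illustrated with a small simulation. The sketch below is not the authors' code: it uses pure-noise features with arbitrary labels, a nearest-centroid classifier as a stand-in model, and hypothetical function names. Since there is no real signal, an unbiased accuracy estimate should sit near 0.5, and any excess in the pooled-selection variant is leakage.

```python
import random
import statistics


def make_noise_data(n_samples, n_features, rng):
    """Gaussian noise features with alternating labels (no real signal)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def select_features(X, y, rows, k):
    """Top-k features by absolute class-mean difference, using only `rows`."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(X[i][j] for i in rows if y[i] == 0)
        m1 = statistics.fmean(X[i][j] for i in rows if y[i] == 1)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def centroid_predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one held-out row."""
    dist = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [statistics.fmean(X[i][j] for i in members) for j in feats]
        dist[label] = sum((X[test_row][j] - c) ** 2
                          for j, c in zip(feats, centroid))
    return min(dist, key=dist.get)


def cv_accuracy(X, y, n_folds, k, pooled_selection):
    """K-fold accuracy with feature selection done on all rows (leaky)
    or inside each training fold only (honest)."""
    rows = list(range(len(X)))
    pooled_feats = select_features(X, y, rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in rows if i % n_folds == fold]
        train = [i for i in rows if i % n_folds != fold]
        feats = pooled_feats if pooled_selection else select_features(X, y, train, k)
        correct += sum(centroid_predict(X, y, train, i, feats) == y[i]
                       for i in test)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_noise_data(40, 1000, random.Random(1))
    print("pooled selection:", cv_accuracy(X, y, 5, 10, True))
    print("per-fold selection:", cv_accuracy(X, y, 5, 10, False))
```

The only difference between the two runs is where feature selection happens, which is the design choice the description singles out as the larger source of bias.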

  5. Data from: Replication Dataset for "Sample size requirements for riverbank...

    • data.4tu.nl
    • figshare.com
    zip
    Updated Oct 31, 2022
    Sjoukje de Lange; Yvette Mellink; Paul Vriend; Paolo Tasseron; Tim van Emmerik (2022). Replication Dataset for "Sample size requirements for riverbank macrolitter characterization" [Dataset]. http://doi.org/10.4121/19188131.v1
    Available download formats: zip
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Sjoukje de Lange; Yvette Mellink; Paul Vriend; Paolo Tasseron; Tim van Emmerik
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    2021
    Area covered
    Netherlands
    Description

    This is the replication dataset corresponding to the publication "Sample size requirements for riverbank macrolitter characterization". We refer to this publication for the data description. Additionally, the document contents_data_publication.pdf provides an overview of the contents of this database.

  6. Sample Size and Population Estimates Tables (Standard Errors and P Values) -...

    • catalog.data.gov
    • healthdata.gov
    • +1 more
    Updated Sep 6, 2025
    Substance Abuse and Mental Health Services Administration (2025). Sample Size and Population Estimates Tables (Standard Errors and P Values) - 8.1 to 8.13 [Dataset]. https://catalog.data.gov/dataset/sample-size-and-population-estimates-tables-standard-errors-and-p-values-8-1-to-8-13
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration
    Description

    These detailed tables show standard errors for sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH). Standard errors for sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, geographic area, pregnancy status, college enrollment status, and probation/parole status.

  7. Sample Size and Population Estimates Tables (Prevalence Estimates) - 3.1 to...

    • data.virginia.gov
    • healthdata.gov
    • +1 more
    html
    Updated Sep 6, 2025
    Substance Abuse and Mental Health Services Administration (2025). Sample Size and Population Estimates Tables (Prevalence Estimates) - 3.1 to 3.8 [Dataset]. https://data.virginia.gov/dataset/sample-size-and-population-estimates-tables-prevalence-estimates-3-1-to-3-81
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    These detailed tables show sample sizes and population estimates pertaining to mental health from the 2010 National Survey on Drug Use and Health (NSDUH). Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, poverty level, geographic area, and insurance status.

  8. Data collection techniques, study participants and sample size.

    • datasetcatalog.nlm.nih.gov
    Updated Apr 25, 2016
    Desmond, Nicola; Choko, Augustine; Corbett, Elizabeth L.; Chipungu, Geoffrey A.; Nyirenda, Deborah; Hart, Graham; Chikovore, Jeremiah; Shand, Tim; Kumwenda, Moses (2016). Data collection techniques, study participants and sample size. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001546119
    Dataset updated
    Apr 25, 2016
    Authors
    Desmond, Nicola; Choko, Augustine; Corbett, Elizabeth L.; Chipungu, Geoffrey A.; Nyirenda, Deborah; Hart, Graham; Chikovore, Jeremiah; Shand, Tim; Kumwenda, Moses
    Description

    Data collection techniques, study participants and sample size.

  9. Ruler Sample Size

    • kaggle.com
    zip
    Updated Jun 22, 2023
    Tatag Suryo Pambudi (2023). Ruler Sample Size [Dataset]. https://www.kaggle.com/datasets/tatagsuryopambudi/ruler-sample-size
    Available download formats: zip (1158 bytes)
    Dataset updated
    Jun 22, 2023
    Authors
    Tatag Suryo Pambudi
    Description

    Dataset

    This dataset was created by Tatag Suryo Pambudi

  10. Sample Size and Population Estimates Tables (Standard Errors and P Values) -...

    • healthdata.gov
    csv, xlsx, xml
    Updated Sep 16, 2025
    (2025). Sample Size and Population Estimates Tables (Standard Errors and P Values) - 8.1 to 8.13 - b8d9-g93r - Archive Repository [Dataset]. https://healthdata.gov/dataset/Sample-Size-and-Population-Estimates-Tables-Standa/h5u7-3c5e
    Available download formats: csv, xlsx, xml
    Dataset updated
    Sep 16, 2025
    Description

    This dataset tracks the updates made on the dataset "Sample Size and Population Estimates Tables (Standard Errors and P Values) - 8.1 to 8.13" as a repository for previous versions of the data and metadata.

  11. Activity, sample size, study-site contributing data, and age ranges for each...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Jun 23, 2015
    Puyau, Maurice R.; McMurray, Robert G.; Butte, Nancy F.; Pfeiffer, Karin A.; Fulton, Janet E.; Bassett, David R.; Watson, Kathleen B.; Crouter, Scott E.; Trost, Stewart G.; Berrigan, David (2015). Activity, sample size, study-site contributing data, and age ranges for each of the activities examined in this study [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001931311
    Dataset updated
    Jun 23, 2015
    Authors
    Puyau, Maurice R.; McMurray, Robert G.; Butte, Nancy F.; Pfeiffer, Karin A.; Fulton, Janet E.; Bassett, David R.; Watson, Kathleen B.; Crouter, Scott E.; Trost, Stewart G.; Berrigan, David
    Description

    *B = Baylor, M = Massachusetts-Boston, MS = Michigan State, NC = North Carolina, O = Oregon State. Activity, sample size, study-site contributing data, and age ranges for each of the activities examined in this study.

  12. Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to...

    • healthdata.gov
    csv, xlsx, xml
    Updated Sep 18, 2025
    (2025). Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to 8.13 - hcuy-kgx2 - Archive Repository [Dataset]. https://healthdata.gov/dataset/Sample-Size-and-Population-Estimates-Tables-Preval/kubt-gbwq
    Available download formats: xlsx, xml, csv
    Dataset updated
    Sep 18, 2025
    Description

    This dataset tracks the updates made on the dataset "Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to 8.13" as a repository for previous versions of the data and metadata.

  13. DHS datasets and sample size.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 12, 2022
    Kachoria, Aparna G.; Mubarak, Mohammad Yousuf; Shah, Saleh; Somers, Rachael; Singh, Awnish K.; Wagner, Abram L. (2022). DHS datasets and sample size. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000362956
    Dataset updated
    Jul 12, 2022
    Authors
    Kachoria, Aparna G.; Mubarak, Mohammad Yousuf; Shah, Saleh; Somers, Rachael; Singh, Awnish K.; Wagner, Abram L.
    Description

    DHS datasets and sample size.

  14. Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to...

    • catalog.data.gov
    • data.virginia.gov
    • +1 more
    Updated Sep 6, 2025
    Substance Abuse and Mental Health Services Administration (2025). Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to 8.13 [Dataset]. https://catalog.data.gov/dataset/sample-size-and-population-estimates-tables-prevalence-estimates-8-1-to-8-13-895aa
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    These detailed tables show sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH). Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, geographic area, pregnancy status, college enrollment status, and probation/parole status.

  15. Data Sheet 1_GSD-SSR: an integrated framework for power analysis in IRB...

    • frontiersin.figshare.com
    docx
    Updated Jul 14, 2025
    Yuxi Zhu; Shi Du; Peihe Xia (2025). Data Sheet 1_GSD-SSR: an integrated framework for power analysis in IRB proposals using group sequential design and sample size re-estimation.docx [Dataset]. http://doi.org/10.3389/fams.2025.1611205.s001
    Available download formats: docx
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Frontiers
    Authors
    Yuxi Zhu; Shi Du; Peihe Xia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Accurate sample size estimation is a cornerstone of successful Institutional Review Board (IRB) proposals, as it establishes the feasibility of clinical studies and ensures they are sufficiently powered to detect meaningful effects. Underestimating sample size poses the risk of insufficient statistical power, compromising the ability to identify significant outcomes. Conversely, overestimating sample size can lead to prolonged data collection, wasting valuable time and resources. One of the primary challenges in sample size estimation lies in the uncertainty surrounding variance and effect size before the study begins. Group Sequential Design with Sample Size Re-estimation (GSD-SSR) effectively addresses this issue by utilizing interim data at predefined stages to refine these estimates. GSD-SSR enables dynamic adjustments to sample size during the study, optimizing resource allocation and improving overall efficiency. We offer a comprehensive introduction to the theoretical background of GSD-SSR and provide step-by-step guidance for its practical application in clinical research. To further facilitate adoption, we have developed a user-friendly online platform that streamlines the GSD-SSR process and integrates it seamlessly into IRB proposals. By incorporating GSD-SSR into the power analysis of IRB proposals, researchers can significantly increase the likelihood of successful clinical studies while enhancing budget efficiency and optimizing timelines.
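To make the re-estimation idea concrete, here is a minimal sketch for a two-arm comparison of means. It uses the textbook normal-approximation formula n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 per group, then recomputes n once an interim estimate of the standard deviation is available. It is not the authors' platform or method: it ignores the group-sequential stopping boundaries and the type I error control that a real GSD-SSR design must handle, and the function names, the "never shrink below the plan" rule, and the `n_max` cap are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample z-test."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    return math.ceil(2 * (z_total * sigma / delta) ** 2)


def reestimate(delta, interim_sigma, n_planned, n_max,
               alpha=0.05, power=0.80):
    """Recompute n with the interim SD.

    Assumed rules (hypothetical): never go below the original plan,
    and cap the result at n_max.
    """
    n_new = n_per_group(delta, interim_sigma, alpha, power)
    return min(max(n_new, n_planned), n_max)


if __name__ == "__main__":
    planned = n_per_group(delta=0.5, sigma=1.0)
    print("planned per group:", planned)
    # Suppose the interim data suggest a larger SD than assumed.
    print("re-estimated:", reestimate(0.5, 1.2, planned, n_max=200))
```

The sketch shows why the planning-stage variance matters: the required n scales with sigma squared, so a modest underestimate of the standard deviation translates into a noticeably underpowered study unless the sample size is revised.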

  16. Sample size calculation and random review generator

    • figshare.com
    xlsx
    Updated Feb 18, 2016
    Kieran Shah (2016). Sample size calculation and random review generator [Dataset]. http://doi.org/10.6084/m9.figshare.2324971.v1
    Available download formats: xlsx
    Dataset updated
    Feb 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kieran Shah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample size calculation per Cochrane review group; random review # generator (used to help pick reviews at random)

  17. Summary of compiled dataset sample sizes, location of lakes, and collection...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 15, 2018
    Santucci Jr. , Victor J.; Jones, Thomas S.; Ahrenstorff, Tyler D.; Lawson, Zach J.; McInerny, Michael C.; Gaeta, Jereme W.; Vander Zanden, M. Jake; Fetzer, William W.; Diana, James S. (2018). Summary of compiled dataset sample sizes, location of lakes, and collection years. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000700106
    Dataset updated
    Mar 15, 2018
    Authors
    Santucci Jr. , Victor J.; Jones, Thomas S.; Ahrenstorff, Tyler D.; Lawson, Zach J.; McInerny, Michael C.; Gaeta, Jereme W.; Vander Zanden, M. Jake; Fetzer, William W.; Diana, James S.
    Description

    Summary of compiled dataset sample sizes, location of lakes, and collection years.

  18. Sample Size and Population Estimate Tables (Prevalence Estimates) - 3.1 to...

    • data.virginia.gov
    • healthdata.gov
    • +1 more
    html
    Updated Sep 6, 2025
    Substance Abuse and Mental Health Services Administration (2025). Sample Size and Population Estimate Tables (Prevalence Estimates) - 3.1 to 3.8 [Dataset]. https://data.virginia.gov/dataset/sample-size-and-population-estimate-tables-prevalence-estimates-3-1-to-3-8
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Description

    These detailed tables show sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH) Mental Health Detailed Tables. Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, county type, poverty level, insurance status, overall health, and geographic area.

  19. Synthetic Data for an Imaginary Country, Sample, 2023 - World

    • microdata.worldbank.org
    • nada-demo.ihsn.org
    Updated Jul 7, 2023
    Development Data Group, Data Analytics Unit (2023). Synthetic Data for an Imaginary Country, Sample, 2023 - World [Dataset]. https://microdata.worldbank.org/index.php/catalog/5906
    Dataset updated
    Jul 7, 2023
    Dataset authored and provided by
    Development Data Group, Data Analytics Unit
    Time period covered
    2023
    Area covered
    World
    Description

    Abstract

    The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.

    The full-population dataset (with about 10 million individuals) is also distributed as open data.

    Geographic coverage

    The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.

    Analysis unit

    Household, Individual

    Universe

    The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.

    Kind of data

    ssd

    Sampling procedure

    The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
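With 8,000 households and 25 households per enumeration area, the design implies 8,000 / 25 = 320 enumeration areas in total. The sketch below shows one way the two stages described above could be coded. It is written in Python rather than R and is not the script distributed with the dataset. The function names, the largest-remainder rounding, the frame layout, and the use of simple random selection of enumeration areas within each stratum are all assumptions, since the description does not state how areas were chosen within a stratum.

```python
import random


def allocate_eas(stratum_sizes, n_eas):
    """Allocate n_eas across strata in proportion to stratum size,
    using largest-remainder rounding so the total is exact."""
    grand_total = sum(stratum_sizes.values())
    quotas = {s: n_eas * size / grand_total
              for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = n_eas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s],
                          reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, n_eas, per_ea=25, seed=0):
    """Two-stage draw from a hypothetical frame.

    frame: {stratum: {ea_id: [household ids]}}
    Stage 1 (assumed): simple random sample of enumeration areas
    within each stratum. Stage 2: per_ea households per selected area.
    """
    rng = random.Random(seed)
    sizes = {s: sum(len(h) for h in eas.values())
             for s, eas in frame.items()}
    alloc = allocate_eas(sizes, n_eas)
    sample = []
    for stratum, eas in frame.items():
        for ea_id in rng.sample(sorted(eas), alloc[stratum]):
            sample.extend(rng.sample(eas[ea_id], per_ea))
    return sample


if __name__ == "__main__":
    # Toy frame: two strata with 40 households per enumeration area.
    frame = {
        "urban": {ea: [("urban", ea, h) for h in range(40)] for ea in range(30)},
        "rural": {ea: [("rural", ea, h) for h in range(40)] for ea in range(10)},
    }
    households = draw_sample(frame, n_eas=8)
    print(len(households))
```

The frame must contain at least as many enumeration areas per stratum as the allocation requests, and each area needs at least `per_ea` households, otherwise `random.sample` raises a `ValueError`.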

    Mode of data collection

    other

    Research instrument

    The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.

    Cleaning operations

    The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.

    Response rate

    This is a synthetic dataset; the "response rate" is 100%.

  20. Synthetic dataset of Luxembourg citizens

    • kaggle.com
    zip
    Updated Jan 21, 2025
    Olaf Yunus Laitinen Imanov (2025). Synthetic dataset of Luxembourg citizens [Dataset]. https://www.kaggle.com/datasets/olaflundstrom/synthetic-dataset-of-luxembourg-citizens
    Available download formats: zip (3016850 bytes)
    Dataset updated
    Jan 21, 2025
    Authors
    Olaf Yunus Laitinen Imanov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Luxembourg
    Description

    The dataset has been created by using the open-source code released by LNDS (Luxembourg National Data Service). It is meant to be an example of the dataset structure anyone can generate and personalize in terms of some fixed parameters, including the sample size. The file format is .csv, and the data are organized with individual profiles on the rows and their personal features on the columns. The information in the dataset has been generated based on statistical information about the age-structure distribution, the population of each municipality, the number of different nationalities present in Luxembourg, and salary statistics per municipality. The STATEC platform, the statistics portal of Luxembourg, is the public source we used to gather the real information that we ingested into our synthetic generation model. Other features like Date of birth, Social matricule, First name, Surname, Ethnicity, and physical attributes have been obtained by a logical relationship between variables without exploiting any additional real information. In compliance with the law, the risk of identifying a real person purely by chance is kept close to zero.
