81 datasets found
  1. Confidence Interval Examples

    • figshare.com
    application/cdfv2
    Updated Jun 28, 2016
    Cite
    Emily Rollinson (2016). Confidence Interval Examples [Dataset]. http://doi.org/10.6084/m9.figshare.3466364.v2
    Explore at:
Available download formats: application/cdfv2
    Dataset updated
    Jun 28, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Emily Rollinson
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
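
    As a quick illustration of both effects, here is a minimal Python sketch (not part of the dataset) computing normal-theory t-intervals for simulated data at each confidence level and sample size:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    for n in (2, 10, 20):
        x = rng.normal(loc=50, scale=10, size=n)       # simulated sample
        for conf in (0.90, 0.95, 0.99):
            t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
            half = t * x.std(ddof=1) / np.sqrt(n)      # CI half-width
            print(f"n={n:2d}, {conf:.0%} CI: {x.mean():.1f} +/- {half:.1f}")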

  2. Data from: A Statistical Inference Course Based on p-Values

    • figshare.com
    • tandf.figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ryan Martin (2023). A Statistical Inference Course Based on p-Values [Dataset]. http://doi.org/10.6084/m9.figshare.3494549.v2
    Explore at:
Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Ryan Martin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
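
    The core idea, a confidence interval as the set of parameter values not rejected by a p-value, can be sketched in a few lines of Python (my illustration, not the article's supplementary code):

    import numpy as np
    from scipy import stats

    x = np.array([4.1, 5.2, 3.8, 4.9, 5.5, 4.4])
    alpha = 0.05
    # Collect the hypothesized means mu0 whose two-sided p-value is >= alpha.
    grid = np.linspace(x.mean() - 3, x.mean() + 3, 2001)
    pvals = np.array([stats.ttest_1samp(x, mu0).pvalue for mu0 in grid])
    inside = grid[pvals >= alpha]
    print(f"95% CI by p-value inversion: [{inside.min():.3f}, {inside.max():.3f}]")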

  3. Interval Data Validation and Estimation Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 3, 2025
    Cite
    Growth Market Reports (2025). Interval Data Validation and Estimation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/interval-data-validation-and-estimation-tools-market
    Explore at:
Available download formats: csv, pdf, pptx
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Interval Data Validation and Estimation Tools Market Outlook




    According to our latest research, the global Interval Data Validation and Estimation Tools market size reached USD 1.46 billion in 2024. With a robust compound annual growth rate (CAGR) of 11.2% projected over the forecast period, the market is expected to reach USD 3.73 billion by 2033. This growth is primarily driven by the rising demand for advanced data quality assurance and analytics solutions across sectors such as BFSI, healthcare, manufacturing, and IT & telecommunications. As organizations increasingly rely on accurate interval data for operational efficiency and regulatory compliance, the adoption of validation and estimation tools continues to surge.
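
    The projection follows the standard compound-growth formula; a quick sanity check (mine, with the report's rounded inputs, so the result only approximately matches the stated USD 3.73 billion):

    base_2024 = 1.46   # USD billion, reported 2024 market size
    cagr = 0.112       # reported compound annual growth rate
    years = 2033 - 2024
    projected_2033 = base_2024 * (1 + cagr) ** years
    print(f"Projected 2033 size: USD {projected_2033:.2f} billion")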




    A key factor propelling the growth of the Interval Data Validation and Estimation Tools market is the exponential rise in data generation from connected devices, IoT sensors, and digital platforms. Businesses today are inundated with massive volumes of interval data, which, if not validated and accurately estimated, can lead to significant operational inefficiencies and decision-making errors. These tools play a crucial role in ensuring the integrity, accuracy, and completeness of interval data, thereby enabling organizations to derive actionable insights and maintain competitive advantage. Furthermore, the growing emphasis on automation and digital transformation initiatives is pushing enterprises to invest in sophisticated data validation and estimation solutions, further accelerating market growth.




    Another major growth driver is the increasing stringency of regulatory requirements across industries, particularly in sectors such as BFSI, healthcare, and utilities. Regulations related to data governance, privacy, and reporting demand organizations to maintain high standards of data quality and compliance. Interval Data Validation and Estimation Tools help organizations adhere to these regulatory mandates by providing automated checks, anomaly detection, and robust audit trails. The integration of artificial intelligence and machine learning into these tools is further enhancing their capabilities, enabling real-time data validation and predictive estimation, which is critical in fast-paced business environments.




    Additionally, the surge in cloud adoption and the proliferation of cloud-based data management platforms are significantly contributing to the market’s expansion. Cloud-based deployment models offer scalability, flexibility, and cost-efficiency, making advanced validation and estimation tools accessible to small and medium-sized enterprises as well as large organizations. The ability to seamlessly integrate with existing data architectures and third-party applications is also a key factor driving the adoption of both on-premises and cloud-based solutions. As data ecosystems become increasingly complex and distributed, the demand for interval data validation and estimation tools is expected to witness sustained growth through 2033.




    From a regional perspective, North America currently holds the largest share of the Interval Data Validation and Estimation Tools market, driven by early technology adoption, a strong focus on data-driven decision-making, and a mature regulatory landscape. However, Asia Pacific is anticipated to register the fastest CAGR of 13.5% during the forecast period, fueled by rapid digitalization, expanding industrialization, and increasing investments in smart infrastructure. Europe and Latin America are also witnessing steady growth, supported by government initiatives and the rising importance of data quality management in emerging economies. The Middle East & Africa region, though comparatively nascent, is expected to demonstrate significant potential as digital transformation initiatives gain momentum.





    Component Analysis




The Interval Data Validation and Estimation Tools market by component is broadly segmented into Software and Services…

  4. Interval Data Analytics Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 6, 2025
    Cite
    Growth Market Reports (2025). Interval Data Analytics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/interval-data-analytics-market
    Explore at:
Available download formats: pdf, csv, pptx
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Interval Data Analytics Market Outlook



    According to our latest research, the global Interval Data Analytics market size reached USD 3.42 billion in 2024, demonstrating robust growth across key verticals. The market is expected to advance at a CAGR of 13.8% from 2025 to 2033, leading to a projected value of USD 10.13 billion by 2033. This impressive expansion is primarily driven by rising demand for advanced analytics solutions capable of processing time-stamped and interval-based data, especially as organizations seek to optimize operations, enhance predictive capabilities, and comply with evolving regulatory requirements.



    One of the most significant growth factors propelling the Interval Data Analytics market is the exponential increase in data generation from IoT devices, smart meters, and connected infrastructure across industries. Organizations in sectors such as utilities, manufacturing, and healthcare are increasingly reliant on interval data for resource optimization, real-time monitoring, and predictive maintenance. The ability of interval data analytics to handle vast amounts of granular, time-series data enables businesses to uncover actionable insights, reduce operational costs, and improve asset utilization. Additionally, the growing adoption of smart grids and intelligent energy management systems further amplifies the need for sophisticated interval data analytics solutions that support real-time decision-making and regulatory compliance.



    Another pivotal driver for the Interval Data Analytics market is the rapid digital transformation and integration of artificial intelligence (AI) and machine learning (ML) technologies into analytics platforms. These advancements allow for more accurate forecasting, anomaly detection, and automated response mechanisms, which are critical in sectors like finance, healthcare, and telecommunications. As organizations continue to prioritize data-driven strategies, the demand for interval data analytics tools that can seamlessly integrate with existing IT ecosystems and provide scalable, cloud-based solutions is accelerating. Furthermore, the shift towards cloud computing and the proliferation of big data platforms are making it easier for enterprises of all sizes to deploy and scale interval data analytics capabilities, thus broadening the market's reach and potential.



    Regulatory pressures and the increasing need for transparency and accountability in data handling are also fueling the growth of the Interval Data Analytics market. Industries such as banking and financial services, healthcare, and energy are subject to stringent compliance requirements that necessitate precise monitoring and reporting of interval data. The ability of interval data analytics platforms to provide auditable, time-stamped records and support regulatory reporting is becoming a critical differentiator for vendors in this space. Moreover, as data privacy laws evolve and enforcement intensifies, organizations are investing in analytics solutions that offer robust security features, data lineage tracking, and comprehensive audit trails, further boosting market adoption.



    From a regional perspective, North America continues to lead the Interval Data Analytics market, driven by early technology adoption, a strong presence of leading analytics vendors, and substantial investments in digital infrastructure. However, the Asia Pacific region is rapidly emerging as a key growth engine, fueled by large-scale digitalization initiatives, expanding industrial automation, and increasing penetration of IoT devices. Europe also represents a significant market, underpinned by regulatory mandates and a mature industrial base. Latin America and the Middle East & Africa, while currently smaller in market share, are witnessing accelerated adoption as organizations in these regions recognize the value of interval data analytics in enhancing operational efficiency and competitiveness.





    Component Analysis



The Interval Data Analytics market is segmented by component into software and services, each playing a distinct role in…

  5. Winkler Interval score metric

    • kaggle.com
    Updated Dec 7, 2023
    Cite
    Carl McBride Ellis (2023). Winkler Interval score metric [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/winkler-interval-score-metric
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 7, 2023
    Dataset provided by
Kaggle (http://kaggle.com/)
    Authors
    Carl McBride Ellis
    License

    Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Model performance evaluation: The Mean Winkler Interval score (MWIS)

    We can assess the overall performance of a regression model that produces prediction intervals by using the mean Winkler Interval score [1,2,3] which, for an individual interval, is given by:

    \[
    W_\alpha(l, u, y) =
    \begin{cases}
    (u - l) + \frac{2}{\alpha}(l - y), & y < l \\
    (u - l), & l \le y \le u \\
    (u - l) + \frac{2}{\alpha}(y - u), & y > u
    \end{cases}
    \]

    where \(y\) is the true value, \(u\) is the upper bound of the prediction interval, \(l\) is the lower bound of the prediction interval, and \(\alpha\) is (1 - coverage). For example, for 90% coverage, \(\alpha = 0.1\). Note that the Winkler Interval score constitutes a proper scoring rule [2,3].

    Python code: Usage example

    Attach this dataset to a notebook, then:

    import sys
    sys.path.append('/kaggle/input/winkler-interval-score-metric/')
    import MWIS_metric
    help(MWIS_metric.score)

    # `predictions` holds the true values and the lower/upper prediction
    # interval bounds; alpha = 0.1 corresponds to 90% nominal coverage.
    alpha = 0.1
    MWIS, coverage = MWIS_metric.score(
        predictions["y_true"], predictions["lower"], predictions["upper"], alpha
    )
    print("Local MWI score:      ", round(MWIS, 3))
    print("Predictions coverage: ", round(coverage * 100, 1), "%")
    
  6. Melodic Intervals Size Statistics for the most commonly occurring intervals....

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Shui' er Han; Janani Sundararajan; Daniel Liu Bowling; Jessica Lake; Dale Purves (2023). Melodic Intervals Size Statistics for the most commonly occurring intervals. (Independent – samples t-tests). [Dataset]. http://doi.org/10.1371/journal.pone.0020160.t001
    Explore at:
Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
PLOS (http://plos.org/)
    Authors
    Shui' er Han; Janani Sundararajan; Daniel Liu Bowling; Jessica Lake; Dale Purves
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics for the comparisons of the most commonly occurring melodic interval sizes in tone and non-tone language music databases; n1 and n2 refer to the sample sizes of tone and non-tone language music databases. (All comparisons were made with the two-tailed independent samples t-test, α-level adjusted using the Bonferroni method).
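
    As a hedged sketch (not the paper's code) of the testing procedure described above, with illustrative data and an assumed m = 6 comparisons:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    tone = rng.normal(2.0, 1.0, size=40)       # illustrative interval sizes
    non_tone = rng.normal(2.4, 1.0, size=55)
    m = 6                                      # assumed number of comparisons
    alpha_adj = 0.05 / m                       # Bonferroni-adjusted alpha level
    t, p = stats.ttest_ind(tone, non_tone)     # two-tailed by default
    print(f"t = {t:.2f}, p = {p:.4f}, significant at {alpha_adj:.4f}: {p < alpha_adj}")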

  7. Wind Generation Time Interval Exploration Data

    • data.ca.gov
    • data.cnra.ca.gov
    • +3more
    Updated Jan 19, 2024
    Cite
    California Energy Commission (2024). Wind Generation Time Interval Exploration Data [Dataset]. https://data.ca.gov/dataset/wind-generation-time-interval-exploration-data
    Explore at:
Available download formats: zip, gpkg, gdb, arcgis geoservices rest api, kml, geojson, csv, html, xlsx, txt
    Dataset updated
    Jan 19, 2024
    Dataset authored and provided by
California Energy Commission (http://www.energy.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set behind the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.



    The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. Capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.
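
    The capacity-factor definition above reduces to a one-line calculation; a minimal sketch, assuming hourly energy readings in MWh:

    def capacity_factor(energy_mwh, nameplate_mw, hours):
        """Ratio of energy produced to the ideal output at rated nameplate capacity."""
        return sum(energy_mwh) / (nameplate_mw * hours)

    # e.g. a hypothetical 100 MW wind area producing 30,000 MWh over 30 days:
    print(f"CF = {capacity_factor([30_000], 100, 30 * 24):.2%}")  # -> 41.67%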

    Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.

    By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.



  8. Family Health Survey 1991 - Belize

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Jul 8, 2014
    + more versions
    Cite
    Central Statistical Office (CSO) (2014). Family Health Survey 1991 - Belize [Dataset]. https://microdata.worldbank.org/index.php/catalog/978
    Explore at:
    Dataset updated
    Jul 8, 2014
    Dataset authored and provided by
    Central Statistical Office (CSO)
    Time period covered
    1991 - 1999
    Area covered
    Belize
    Description

    Abstract

    Belize is one of the countries in Latin America that was not included in the World Fertility Survey, the Contraceptive Prevalence Survey project, or the Demographic and Health Survey program during the 1970's and 1980's. As a result, data on contraceptive prevalence and the use of maternal and child health services in Belize has been limited. The 1991 Family Health Survey was designed to provide health professionals and international donors with data to assess infant and child mortality, fertility, and the use of family planning and health services in Belize.

    The objectives of the 1991 Family Health Survey were to: - obtain national fertility estimates; - estimate levels of infant and child mortality; - estimate the percentage of mothers who breastfed their last child and duration of breastfeeding; - determine levels of knowledge and current use of contraceptives for a variety of social and demographic background variables and to determine the source where users obtain the methods they use; - determine reasons for nonuse of contraception and estimate the percentage of women who are at risk of an unplanned pregnancy and, thus, in need of family planning services; and - examine the use of maternal and child health services and immunization levels for children less than 5 years of age and to examine the prevalence and treatment of diarrhea and acute respiratory infections among these children.

    Geographic coverage

    National

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The 1991 Belize Family Health Survey was an area probability survey with two stages of selection. The sampling frame for the survey was the quick count of all households in the country conducted in 1990 by the Central Statistical Office in preparation for the 1991 census. Two strata, or domains, were sampled independently: urban areas and rural areas. In the first stage of selection for the urban domain, a systematic sample with a random start was used to select enumeration districts in the domain with probability of selection proportional to the number of households in each district. In the second stage of selection, households were chosen systematically using a constant sampling interval (4.2350) across all of the selected enumeration districts. The enumeration districts selected for the rural domain were the same as those that had been selected earlier for the 1990 Belize Household Expenditure Survey. The second stage selection of rural households was conducted the same way it was for the urban domain but used a constant sampling interval of 2.1363. In order to have a self-weighting geographic sample, 3,106 urban households and 1,871 rural households were selected for a total of 4,977 households.

    Only one woman aged 15-44 per household was selected for interview. Each respondent's probability of selection was inversely proportional to the number of eligible women in the household. Thus, weighting factors were applied to compensate for this unequal probability of selection. In the tables presented in this report, proportions and means are based on the weighted number of cases, but the unweighted numbers are shown.

    Mode of data collection

    Face-to-face [f2f]

    Response rate

    Of the 4,977 households selected, 4,566 households were visited. Overall, 8 percent of households could not be located, and 7 percent of the households were found to be vacant. Less than 3 percent of the households refused to be interviewed. Fifty-five percent of sample households included at least one woman aged 15-44. Complete interviews were obtained in 94 percent of the households that had an eligible respondent, for a total of 2,656 interviews. Interview completion rates did not vary by residence.

    Sampling error estimates

    The estimates for a sample survey are affected by two types of errors: (1) sampling error and (2) non-sampling error. Non-sampling error is the result of mistakes made in carrying out data collection and data processing, including the failure to locate and interview the right household, errors in the way questions are asked or understood, and data entry errors. Although quality control efforts were made during the implementation of the Family Health Survey to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling error is defined as the difference between the true value for any variable measured in a survey and the value estimated by the survey. Sampling error is a measure of the variability between all possible samples that could have been selected from the same population using the same sample design and size. For the entire population and for large subgroups, the Family Health Survey is large enough that the sampling error for most estimates is small. However, for small subgroups, sampling errors are larger and may affect the reliability of the estimates. Sampling error is usually measured in terms of the standard error for a particular statistic (mean, proportion, or ratio), which is the square root of the variance. The standard error can be used to calculate confidence intervals for estimated statistics. For example, the 95 percent confidence interval for a statistic is the estimated value plus or minus 1.96 times the standard error for the estimate.

    The standard errors of statistics estimated using a multistage cluster sample design, such as that used in the Family Health Survey, are more complex than are standard errors based on simple random samples, and they tend to be somewhat larger than the standard errors produced by a simple random sample. The increase in standard error due to using a multi-stage cluster design is referred to as the design effect, which is defined as the ratio between the variance for the estimate using the sample design that was used and the variance for the estimate that would result if a simple random sample had been used. Based on experience with similar surveys, the design effect generally falls in a range from 1.2 to 2.0 for most variables.

    Table E.1 of the Final Report presents examples of what the 95 percent confidence interval on an estimated proportion would be, under a variety of sample sizes, assuming a design effect of 1.6. It presents half-widths of the 95 percent confidence intervals corresponding to sample sizes, ranging from 25 to 3200 cases, and corresponding to estimated proportions ranging from .05/.95 to .50/.50. The formula used for calculating the half-width of the 95 percent confidence interval is:

    \[ \text{(half of 95\% C.I.)} = 1.96 \sqrt{\frac{(1.6)\, p\, (1-p)}{n}}, \]

    where p is the estimated proportion, n is the number of cases used in calculating the proportion, and 1.6 is the design effect. It can be seen, for example, that for an estimated proportion of 0.30, and a sample of size of 200, half the width of the confidence interval is 0.08, so that the 95 percent confidence interval for the estimated proportion would be from 0.22 to 0.38. If the sample size had been 3200, instead of 200, the 95 percent confidence interval would be from 0.28 to 0.32.
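
    The worked example can be reproduced directly from the formula (a sketch of mine, not code from the survey):

    from math import sqrt

    def half_width(p, n, deff=1.6, z=1.96):
        """Half-width of the 95% CI for a proportion under a design effect."""
        return z * sqrt(deff * p * (1 - p) / n)

    for n in (200, 3200):
        h = half_width(0.30, n)
        print(f"n={n}: 0.30 +/- {h:.2f} -> ({0.30 - h:.2f}, {0.30 + h:.2f})")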

    The actual design effect for individual variables will vary, depending on how values of that variable are distributed among the clusters of the sample. These can be calculated using advanced statistical software for survey analysis.

  9. Data from: Transformation of measurement uncertainties into low-dimensional...

    • resodate.org
    • data.niaid.nih.gov
    • +1more
    Updated Jan 1, 2021
    Cite
    Antonios Alexiadis; Scott Ferson; Eann A. Patterson (2021). Data from: Transformation of measurement uncertainties into low-dimensional feature vector space [Dataset]. http://doi.org/10.5061/DRYAD.6HDR7SQX2
    Explore at:
    Dataset updated
    Jan 1, 2021
    Dataset provided by
    Dryad
    Authors
    Antonios Alexiadis; Scott Ferson; Eann A. Patterson
    Description

    Advances in technology allow the acquisition of data with high spatial and temporal resolution. These datasets are usually accompanied by estimates of the measurement uncertainty, which may be spatially or temporally varying and should be taken into consideration when making decisions based on the data. At the same time, various transformations are commonly implemented to reduce the dimensionality of the datasets for post-processing, or to extract significant features. However, the corresponding uncertainty is not usually represented in the low-dimensional or feature vector space. A method is proposed that maps the measurement uncertainty into the equivalent low-dimensional space with the aid of approximate Bayesian computation, resulting in a distribution that can be used to make statistical inferences. The method involves no assumptions about the probability distribution of the measurement error and is independent of the feature extraction process as demonstrated in three examples. In the first two examples Chebyshev polynomials were used to analyse structural displacements and soil moisture measurements; while in the third, principal component analysis was used to decompose global ocean temperature data. The uses of the method range from supporting decision making in model validation or confirmation, model updating or calibration and tracking changes in condition, such as the characterisation of the El Niño Southern Oscillation.
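
    The underlying idea can be illustrated with a plain Monte Carlo sketch (the paper itself uses approximate Bayesian computation; everything below, including the signal and uncertainty values, is my assumption): perturb the measurements within their stated uncertainty and observe the induced spread of the low-dimensional Chebyshev coefficients.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(-1, 1, 200)
    signal = np.sin(2 * np.pi * x)    # "measured" field
    sigma = 0.05                      # stated measurement uncertainty

    # Refit the Chebyshev expansion to many perturbed copies of the signal.
    coeffs = np.array([
        np.polynomial.chebyshev.chebfit(x, signal + rng.normal(0, sigma, x.size), deg=8)
        for _ in range(1000)
    ])
    print("Std. dev. of the first three coefficients:", coeffs.std(axis=0)[:3])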

  10. The banksia plot: a method for visually comparing point estimates and...

    • researchdata.edu.au
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated Apr 16, 2024
    Cite
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.V2
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot:

    Background:

    In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods:

    The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
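
    The companion code is written in Stata and R; here is a minimal Python sketch of the centring and scaling step described above (function and variable names are mine):

    def centre_and_scale(ref_est, ref_lo, ref_hi, cmp_est, cmp_lo, cmp_hi):
        """Centre the reference estimate at zero, scale its CI to unit width,
        and apply the identical shift and scale to the comparator."""
        shift = ref_est
        scale = ref_hi - ref_lo
        adj = lambda v: (v - shift) / scale
        return (adj(ref_est), adj(ref_lo), adj(ref_hi),
                adj(cmp_est), adj(cmp_lo), adj(cmp_hi))

    print(centre_and_scale(2.0, 1.0, 3.0, 2.5, 1.2, 3.8))
    # reference -> (0.0, -0.5, 0.5); comparator keeps its relative offset and width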

    Results:

    In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions:

    The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1

  11. Simulated data for Confidence Interval Width Contours: Sample Size Planning...

    • scidb.cn
    Updated Jun 17, 2022
    Cite
    Liu Yue; Xu Lei; Liu Hongyun (2022). Simulated data for Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models [Dataset]. http://doi.org/10.57760/sciencedb.01813
    Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 17, 2022
    Dataset provided by
    Science Data Bank
    Authors
    Liu Yue; Xu Lei; Liu Hongyun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains the simulated data for sample size planning based on power analysis and accuracy in parameter estimation for linear mixed-effects models.

  12. 2014 American Community Survey: S1702 (ACS 5-Year Estimates Subject Tables)

    • data.census.gov
    + more versions
    Cite
    United States Census Bureau, 2014 American Community Survey: S1702 (ACS 5-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST5Y2014.S1702
    Explore at:
    Dataset provided by
United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:

    - A "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    - A "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    - A "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    - A "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    - A "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    - A "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    - An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    - An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2010-2014 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates
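
    A 90 percent margin of error converts to a standard error, and from there to other confidence levels, in the usual way; a short sketch (standard Census Bureau guidance, not code shipped with this table):

    def moe90_to_se(moe):
        """ACS margins of error are published at the 90% level (z = 1.645)."""
        return moe / 1.645

    def ci(estimate, moe, z=1.96):
        """95% confidence interval from an ACS estimate and its 90% MOE."""
        se = moe90_to_se(moe)
        return estimate - z * se, estimate + z * se

    print(ci(12_345, moe=410))   # hypothetical estimate and MOE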

  13. 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates...

    • data.census.gov
    Cite
    ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX
    Explore at:
    Dataset provided by
United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2021
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100. The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100. The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:

    - "-" The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    - "N" The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    - "(X)" The estimate or margin of error is not applicable or not available.
    - "median-" The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
    - "median+" The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    - "**" The margin of error could not be computed because there were an insufficient number of sample observations.
    - "***" The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    - "*****" A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
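
    The dependency-ratio definitions quoted above amount to simple arithmetic; a worked sketch with hypothetical population counts:

    def dependency_ratios(under_18, age_18_64, age_65_plus):
        child = under_18 / age_18_64 * 100
        old_age = age_65_plus / age_18_64 * 100
        return {"age": child + old_age, "old_age": old_age, "child": child}

    # hypothetical area: 22,000 children, 60,000 working-age, 15,000 seniors
    print(dependency_ratios(under_18=22_000, age_18_64=60_000, age_65_plus=15_000))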

  14. Condition Data with Random Recording Time

    • kaggle.com
    zip
    Updated Jun 10, 2022
    Cite
    Prognostics @ HSE (2022). Condition Data with Random Recording Time [Dataset]. https://www.kaggle.com/datasets/prognosticshse/condition-data-with-random-recording-time/data
    Explore at:
Available download formats: zip (1,167,682 bytes)
    Dataset updated
    Jun 10, 2022
    Authors
    Prognostics @ HSE
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context: This data set originates from a practice-relevant degradation process that is representative of Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The employed filter media is made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations which do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. In order to avoid carryover between two data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter media.

    Detailed specification: For more information about the general operation and the components used, see the provided description file Random Recording Condition Data Data Set.pdf

    Given data situation: In order to implement a predictive maintenance policy, knowledge about the time of failure, or respectively about the remaining useful life (RUL), of the technical system is necessary. The time of failure or the RUL can be predicted on the basis of condition data that indicate the damage progression of a technical system over time. However, the collection of condition data in typical industrial PHM applications is often only possible in an incomplete manner. One example is the collection of data during defined test cycles with specific loads, carried out at intervals. For instance, this approach is often used with machining centers, where test cycles are only carried out between finished machining jobs or work shifts. Because the work pieces differ, the machining time varies and the test cycle with the recording of condition data is not performed equidistantly. This results in a data characteristic that is comparable to a random sample of continuously recorded condition data. Another example that may result in such a data characteristic comes from the effort to reduce data volumes when recording condition data: attempts can be made to keep the amount of data collected while the damage state is unchanged as small as possible. One possible measure is not to transmit and store the continuous sensor readings, but rather sections of them, which also leads to gaps in the data available for prognosis.

    In the present data set, the life cycle of filters, or rather their condition data represented by the differential pressure, is considered. Failure of the filter occurs when the differential pressure across the filter exceeds 600 Pa. The time until a filter failure occurs depends especially on the amount of dust supplied per unit time, which is constant within a run-to-failure cycle. The previously explained data characteristics are addressed by means of corresponding training and test data.

    The training data is structured as follows: a run-to-failure cycle contains n batches of data. The number n varies between the cycles and depends on the duration of the batches and the time interval between the individual batches. The duration and time interval of the batches are random variables. A data batch includes the sensor readings of differential pressure and flow rate for the filter, the start and end time of the batch, and RUL information related to the end time of the batch. The sensor readings of the differential pressure and flow rate are recorded at a constant sampling rate. Figure 6 shows an illustrative run-to-failure cycle with multiple batches.

    The test data are randomly right-censored. They are also made of batches with a random duration and time interval between the batches. For each batch contained, the start and end time are given, as well as the sensor readings within the batch. The RUL is not given for each batch but only for the last data point of the right-censored run-to-failure cycle.

    Task: The aim is to predict the RUL of the censored filter test cycles given in the test data. In order to predict the RUL, training and test data are given, each consisting of 60 and 40 run-to-failure cycles. The test data contains random right-censored run-to-failure cycles and the respective RUL for the prediction task. The main challenge is to make the best use of the incompletely recorded training and test data to provide the most accurate prediction possible. Due to the detailed description of the setup and the various physical filter models described in literature, it is possible to support the actual data-driven models by integrating physical knowledge respectively models in the sense of theory-guided data science or informed machi...
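
    A hedged sketch of how a training batch and its RUL label relate under this structure (field names and values are my assumptions, not the dataset's schema):

    from dataclasses import dataclass

    @dataclass
    class Batch:
        t_start: float     # batch start time, seconds
        t_end: float       # batch end time, seconds
        dp: list           # differential-pressure readings, Pa

    def rul_at_batch_end(batch, failure_time):
        """RUL label referenced to the end time of the batch."""
        return failure_time - batch.t_end

    cycle_failure = 5_400.0   # first time differential pressure exceeds 600 Pa
    b = Batch(t_start=1_000.0, t_end=1_300.0, dp=[312.0, 318.5, 324.1])
    print(f"RUL at batch end: {rul_at_batch_end(b, cycle_failure):.0f} s")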

  15. Data from: New Source Rock Data for the Niobrara and Sage Breaks intervals...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 25, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). New Source Rock Data for the Niobrara and Sage Breaks intervals of the lower Cody Shale in the Wyoming part of the Bighorn Basin [Dataset]. https://catalog.data.gov/dataset/new-source-rock-data-for-the-niobrara-and-sage-breaks-intervals-of-the-lower-cody-shale-in
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Wyoming, Bighorn Basin
    Description

    In 2019 the U.S. Geological Survey (USGS) quantitatively assessed the potential for undiscovered, technically recoverable continuous (unconventional) oil and gas resources in the Niobrara interval of the Cody Shale in the Bighorn Basin Province (Finn and others, 2019). Leading up to the assessment, in 2017, the USGS collected samples from the Niobrara and underlying Sage Breaks intervals (Finn, 2019) to better characterize the source rock potential of the Niobrara interval. Eighty-two samples from 31 wells were collected from the well cuttings collection stored at the USGS Core Research Center in Lakewood, Colorado. The selected wells are located near the outcrop belt along the shallow margins of the basin to obtain samples that were not subjected to the effects of deep burial and subsequent organic carbon loss due to thermal maturation as described by Daly and Edman (1987) (fig. 1). Sixty samples are from the Niobrara interval, and 22 from the Sage Breaks interval (fig. 2).

  16. Traffic Signal Change and Clearance Interval Pooled Fund Study Utah BSM...

    • data.virginia.gov
    • catalog.data.gov
    csv, json, xml, xsl
    Updated Nov 3, 2025
    Cite
    U.S Department of Transportation (2025). Traffic Signal Change and Clearance Interval Pooled Fund Study Utah BSM Trajectories Sample [Dataset]. https://data.virginia.gov/dataset/traffic-signal-change-and-clearance-interval-pooled-fund-study-utah-bsm-trajectories-sample
    Explore at:
Available download formats: csv, xsl, xml, json
    Dataset updated
    Nov 3, 2025
    Dataset provided by
Federal Highway Administration (https://highways.dot.gov/)
    Authors
    U.S Department of Transportation
    Area covered
    Utah
    Description

    This dataset contains timestamped Basic Safety Messages (BSMs) collected from connected vehicles operating in Utah, supplied by the vendor PANASONIC, as part of the ITS JPO's Traffic Signal Change and Clearance Interval Pooled Fund Study. The data includes GPS location, speed, heading, accelerations, and brake status at 10 Hz frequency. These BSMs were transmitted from vehicles equipped with aftermarket onboard units (OBUs) and have been anonymized. The dataset supports research related to vehicle kinematics during signal change intervals and interactions with traffic signal states. To request the full dataset please email data.itsjpo@dot.gov.

  17. Wind Generation Time Interval Exploration Tool

    • catalog.data.gov
    • data.cnra.ca.gov
    • +2more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Energy Commission (2025). Wind Generation Time Interval Exploration Tool [Dataset]. https://catalog.data.gov/dataset/wind-generation-time-interval-exploration-tool-28d5a
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
California Energy Commission (http://www.energy.ca.gov/)
    Description

    This is the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.

    The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. Capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.

    Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.

    By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.

  18. Data from: (Table 1) Number of counted pollen samples, intervals and time...

    • doi.pangaea.de
    • search.dataone.org
    html, tsv
    Updated 2003
    Cite
    Jun Tian; Xiangjun Sun; Yunli Luo; Fei Huang; Pinxian Wang (2003). (Table 1) Number of counted pollen samples, intervals and time resolution between samples of ODP Site 184-1144 [Dataset]. http://doi.org/10.1594/PANGAEA.738717
    Explore at:
Available download formats: tsv, html
    Dataset updated
    2003
    Dataset provided by
    PANGAEA
    Authors
    Jun Tian; Xiangjun Sun; Yunli Luo; Fei Huang; Pinxian Wang
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Mar 13, 1999 - Mar 18, 1999
    Area covered
    Variables measured
    Sample amount, Depth, top/min, Sample spacing, Time resolution, Age, maximum/old, Depth, bottom/max, Age, minimum/young, DEPTH, sediment/rock
    Description

    This dataset is about: (Table 1) Number of counted pollen samples, intervals and time resolution between samples of ODP Site 184-1144. Please consult the parent dataset at https://doi.org/10.1594/PANGAEA.738719 for more information.

  19. Simulation Data & R scripts for: "Introducing recurrent events analyses to...

    • data.niaid.nih.gov
    • doi.org
    • +1more
    Updated Apr 29, 2024
    Cite
    Ferry, Nicolas (2024). Simulation Data & R scripts for: "Introducing recurrent events analyses to assess species interactions based on camera trap data: a comparison with time-to-first-event approaches" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11085005
    Explore at:
    Dataset updated
    Apr 29, 2024
    Dataset provided by
    Department of National Park Monitoring and Animal Management, Bavarian Forest National Park
    Authors
    Ferry, Nicolas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Files descriptions:

    All csv files contain results from the different models (PAMM, AARs, linear models, MRPPs) for each iteration of the simulation; one row corresponds to one iteration.

    - "results_perfect_detection.csv" refers to the results from the first simulation part with all the observations.
    - "results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.

    - ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    - PAMM30: p-value of the PAMM run on the 30-day survey.
    - PAMM7: p-value of the PAMM run on the 7-day survey.
    - AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
    - AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
    - Harmsen_P: p-value from the linear model with the interaction Species1*Species2 from Harmsen et al. (2009).
    - Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
    - Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
    - MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    - MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    - Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
    - MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
    - MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

    "results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

    Scripts files description:

    - 1_Functions: R script containing the functions: the MRPP from Karanth et al. (2017), adapted here for time efficiency; the MRPP from Murphy et al. (2021), adapted here for time efficiency; a version of the ct_to_recurrent() function from the recurrent package adapted to run parallelized on the simulation datasets; and the simulation() function used to simulate observations of two species with reciprocal effects on each other.
    - 2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
    - 3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
    - 3_1_Real data comparison: R script containing the fit of the different models tested on the real-data example from Murphy et al. (2021).
    - 4_Graphs: R script containing the code for plotting the results from the simulation part and the appendices.
    - 5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the original code lines from Karanth et al. (2017) and Murphy et al. (2021), together with the versions adapted for time efficiency, and a comparison verifying that the results are the same.
    - 5_2_Appendix - Multi-response procedure permutation difference: R script testing whether the MRPP approaches differ according to the species on which permutations are done.

    (A hedged sketch of the permutation idea behind these tests follows this list.)
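    For readers unfamiliar with the permutation tests referenced above, the following is a minimal sketch of the general idea behind the Karanth et al. (2017) and Murphy et al. (2021) approaches: compare the observed median A-to-B interval against medians from randomized data. The data-frame layout (site, species, time columns) and the within-site label shuffle are illustrative assumptions, not the actual implementation in 1_Functions.

        # Hedged sketch of an MRPP-style permutation test on A->B intervals.
        # Assumes a data frame of detections with columns: site, species, time.

        ab_intervals <- function(events) {
          # Time from each detection of species "A" to the next detection
          # of species "B" at the same site.
          unlist(lapply(split(events, events$site), function(e) {
            e <- e[order(e$time), ]
            sapply(which(e$species == "A"), function(i) {
              later_b <- e$time[e$species == "B" & e$time > e$time[i]]
              if (length(later_b) > 0) min(later_b) - e$time[i] else NA
            })
          }), use.names = FALSE)
        }

        perm_rank <- function(events, n_perm = 999) {
          obs <- median(ab_intervals(events), na.rm = TRUE)
          perm_medians <- replicate(n_perm, {
            e <- events
            # One plausible randomization: shuffle species labels within sites.
            for (s in unique(e$site)) {
              idx <- e$site == s
              e$species[idx] <- sample(e$species[idx])
            }
            median(ab_intervals(e), na.rm = TRUE)
          })
          # Rank of the observed median within the randomized distribution:
          # values near 0 suggest unusually short A->B intervals (attraction),
          # values near 1 unusually long ones (avoidance).
          mean(perm_medians <= obs)
        }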

  20. Data from: Melded Confidence Intervals Do Not Provide Guaranteed Coverage

    • tandf.figshare.com
    txt
    Updated Jul 23, 2024
    Cite
    Jesse Frey; Yimin Zhang (2024). Melded Confidence Intervals Do Not Provide Guaranteed Coverage [Dataset]. http://doi.org/10.6084/m9.figshare.24112584.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Taylor & Francishttps://taylorandfrancis.com/
    Authors
    Jesse Frey; Yimin Zhang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Melded confidence intervals were proposed as a way to combine two independent one-sample confidence intervals to obtain a two-sample confidence interval for a quantity like a difference or a ratio. Simulation-based work has suggested that melded confidence intervals always provide at least the nominal coverage. However, we show here that for the case of melded confidence intervals for a difference in population quantiles, the confidence intervals do not guarantee the nominal coverage. We derive a lower bound on the coverage for a one-sided confidence interval, and we show that there are pairs of distributions that make the coverage arbitrarily close to this lower bound. One specific example of our results is that the 95% melded upper bound on the difference between two population medians offers a guaranteed coverage of only 88.3% when both samples are of size 20.
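    To make the order-statistic machinery behind this result concrete, here is a minimal sketch of the exact (conservative) confidence interval for a single population median, the one-sample building block that melding combines across two samples. The function name and the symmetric-rank search are illustrative choices, not the authors' code.

        # Minimal sketch: exact order-statistic CI for one population median.
        # For sorted data x_(1) <= ... <= x_(n), the interval [x_(j), x_(k)]
        # covers the median with probability P(j <= B <= k - 1),
        # where B ~ Binomial(n, 1/2).
        median_ci <- function(x, conf = 0.95) {
          n <- length(x)
          x <- sort(x)
          # Widen symmetric rank pairs (j, n + 1 - j) until the guaranteed
          # coverage reaches the nominal level.
          for (j in floor(n / 2):1) {
            k <- n + 1 - j
            coverage <- pbinom(k - 1, n, 0.5) - pbinom(j - 1, n, 0.5)
            if (coverage >= conf) {
              return(c(lower = x[j], upper = x[k], coverage = coverage))
            }
          }
          # Widest symmetric interval if the nominal level is unattainable
          c(lower = x[1], upper = x[n], coverage = 1 - 2 * 0.5^n)
        }

        # Example: with n = 20 the nominal 95% interval uses ranks 6 and 15,
        # with exact one-sample coverage of about 95.9%. The paper's point is
        # that melding two such intervals does not inherit this guarantee.
        set.seed(1)
        print(median_ci(rnorm(20)))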
