Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
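As a hedged illustration of the effect described above (not part of the original course materials), the following Python sketch computes t-based confidence intervals for a simulated sample at the three confidence levels and the three sample sizes mentioned; the simulated data, mean, and spread are arbitrary.

```python
# Illustrative sketch: how the width of a t-based confidence interval for a
# mean changes with confidence level and sample size. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

for n in (20, 10, 2):
    sample = rng.normal(loc=50, scale=10, size=n)   # hypothetical measurements
    mean, se = sample.mean(), stats.sem(sample)
    for level in (0.90, 0.95, 0.99):
        lo, hi = stats.t.interval(level, df=n - 1, loc=mean, scale=se)
        print(f"n={n:2d}, {level:.0%} CI: ({lo:7.2f}, {hi:7.2f}), width={hi - lo:7.2f}")
```

Wider intervals appear as the confidence level rises and as the sample size shrinks, which is the pattern the examples are meant to demonstrate.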
According to our latest research, the global Interval Data Analytics market size reached USD 3.42 billion in 2024, demonstrating robust growth across key verticals. The market is expected to advance at a CAGR of 13.8% from 2025 to 2033, leading to a projected value of USD 10.13 billion by 2033. This impressive expansion is primarily driven by rising demand for advanced analytics solutions capable of processing time-stamped and interval-based data, especially as organizations seek to optimize operations, enhance predictive capabilities, and comply with evolving regulatory requirements.
One of the most significant growth factors propelling the Interval Data Analytics market is the exponential increase in data generation from IoT devices, smart meters, and connected infrastructure across industries. Organizations in sectors such as utilities, manufacturing, and healthcare are increasingly reliant on interval data for resource optimization, real-time monitoring, and predictive maintenance. The ability of interval data analytics to handle vast amounts of granular, time-series data enables businesses to uncover actionable insights, reduce operational costs, and improve asset utilization. Additionally, the growing adoption of smart grids and intelligent energy management systems further amplifies the need for sophisticated interval data analytics solutions that support real-time decision-making and regulatory compliance.
Another pivotal driver for the Interval Data Analytics market is the rapid digital transformation and integration of artificial intelligence (AI) and machine learning (ML) technologies into analytics platforms. These advancements allow for more accurate forecasting, anomaly detection, and automated response mechanisms, which are critical in sectors like finance, healthcare, and telecommunications. As organizations continue to prioritize data-driven strategies, the demand for interval data analytics tools that can seamlessly integrate with existing IT ecosystems and provide scalable, cloud-based solutions is accelerating. Furthermore, the shift towards cloud computing and the proliferation of big data platforms are making it easier for enterprises of all sizes to deploy and scale interval data analytics capabilities, thus broadening the market's reach and potential.
Regulatory pressures and the increasing need for transparency and accountability in data handling are also fueling the growth of the Interval Data Analytics market. Industries such as banking and financial services, healthcare, and energy are subject to stringent compliance requirements that necessitate precise monitoring and reporting of interval data. The ability of interval data analytics platforms to provide auditable, time-stamped records and support regulatory reporting is becoming a critical differentiator for vendors in this space. Moreover, as data privacy laws evolve and enforcement intensifies, organizations are investing in analytics solutions that offer robust security features, data lineage tracking, and comprehensive audit trails, further boosting market adoption.
From a regional perspective, North America continues to lead the Interval Data Analytics market, driven by early technology adoption, a strong presence of leading analytics vendors, and substantial investments in digital infrastructure. However, the Asia Pacific region is rapidly emerging as a key growth engine, fueled by large-scale digitalization initiatives, expanding industrial automation, and increasing penetration of IoT devices. Europe also represents a significant market, underpinned by regulatory mandates and a mature industrial base. Latin America and the Middle East & Africa, while currently smaller in market share, are witnessing accelerated adoption as organizations in these regions recognize the value of interval data analytics in enhancing operational efficiency and competitiveness.
The Interval Data Analytics market is segmented by component into software and services, each playing a distinct role in
According to our latest research, the global Interval Data Validation and Estimation Tools market size reached USD 1.46 billion in 2024. With a robust compound annual growth rate (CAGR) of 11.2% projected over the forecast period, the market is expected to reach USD 3.73 billion by 2033. This growth is primarily driven by the rising demand for advanced data quality assurance and analytics solutions across sectors such as BFSI, healthcare, manufacturing, and IT & telecommunications. As organizations increasingly rely on accurate interval data for operational efficiency and regulatory compliance, the adoption of validation and estimation tools continues to surge.
A key factor propelling the growth of the Interval Data Validation and Estimation Tools market is the exponential rise in data generation from connected devices, IoT sensors, and digital platforms. Businesses today are inundated with massive volumes of interval data, which, if not validated and accurately estimated, can lead to significant operational inefficiencies and decision-making errors. These tools play a crucial role in ensuring the integrity, accuracy, and completeness of interval data, thereby enabling organizations to derive actionable insights and maintain competitive advantage. Furthermore, the growing emphasis on automation and digital transformation initiatives is pushing enterprises to invest in sophisticated data validation and estimation solutions, further accelerating market growth.
Another major growth driver is the increasing stringency of regulatory requirements across industries, particularly in sectors such as BFSI, healthcare, and utilities. Regulations related to data governance, privacy, and reporting demand organizations to maintain high standards of data quality and compliance. Interval Data Validation and Estimation Tools help organizations adhere to these regulatory mandates by providing automated checks, anomaly detection, and robust audit trails. The integration of artificial intelligence and machine learning into these tools is further enhancing their capabilities, enabling real-time data validation and predictive estimation, which is critical in fast-paced business environments.
Additionally, the surge in cloud adoption and the proliferation of cloud-based data management platforms are significantly contributing to the market’s expansion. Cloud-based deployment models offer scalability, flexibility, and cost-efficiency, making advanced validation and estimation tools accessible to small and medium-sized enterprises as well as large organizations. The ability to seamlessly integrate with existing data architectures and third-party applications is also a key factor driving the adoption of both on-premises and cloud-based solutions. As data ecosystems become increasingly complex and distributed, the demand for interval data validation and estimation tools is expected to witness sustained growth through 2033.
From a regional perspective, North America currently holds the largest share of the Interval Data Validation and Estimation Tools market, driven by early technology adoption, a strong focus on data-driven decision-making, and a mature regulatory landscape. However, Asia Pacific is anticipated to register the fastest CAGR of 13.5% during the forecast period, fueled by rapid digitalization, expanding industrialization, and increasing investments in smart infrastructure. Europe and Latin America are also witnessing steady growth, supported by government initiatives and the rising importance of data quality management in emerging economies. The Middle East & Africa region, though comparatively nascent, is expected to demonstrate significant potential as digital transformation initiatives gain momentum.
The Interval Data Validation and Estimation Tools market by component is broadly segmented into Software and Services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
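The article's own construction is not reproduced here; as a rough, hedged illustration of inference that remains valid at any sample size, the sketch below inverts an exact binomial test, taking as the confidence set all parameter values whose exact two-sided p-value is not small. Function and variable names are illustrative.

```python
# Hedged illustration (not the article's method): inverting an exact binomial
# test p-value yields an interval that is valid for any sample size n.
import numpy as np
from scipy import stats

def exact_binom_ci(x, n, level=0.95):
    """Collect all theta whose two-sided exact p-value exceeds 1 - level."""
    grid = np.linspace(0.0, 1.0, 2001)
    keep = [th for th in grid
            if stats.binomtest(x, n, th).pvalue > 1 - level]
    return min(keep), max(keep)

print(exact_binom_ci(3, 10))   # e.g. 3 successes out of n = 10 trials
```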
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data set behind the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.
Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.
By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation capacity factors (CFs), along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interval data are widely used in many fields, notably in economics, industry, and health areas. Analogous to the scatterplot for single-value data, the rectangle plot and cross plot are the conventional visualization methods for the relationship between two variables in interval forms. These methods do not provide much information to assess complicated relationships, however. In this article, we propose two visualization methods: Segment and Dandelion plots. They offer much more information than the existing visualization methods and allow us to have a much better understanding of the relationship between two variables in interval forms. A general guide for reading these plots is provided. Relevant theoretical support is developed. Both empirical and real data examples are provided to demonstrate the advantages of the proposed visualization methods. Supplementary materials for this article are available online.
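For readers unfamiliar with the conventional display mentioned above, the following sketch draws a rectangle plot for a handful of made-up interval-valued observations; it does not implement the proposed Segment or Dandelion plots.

```python
# Sketch of the conventional rectangle plot for interval data: each observation
# is a pair of intervals ([x_lo, x_hi], [y_lo, y_hi]) drawn as a rectangle.
# The data below are invented for illustration.
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

intervals = [((1.0, 2.5), (3.0, 4.0)),
             ((2.0, 3.5), (2.5, 5.0)),
             ((4.0, 5.0), (1.0, 2.0))]

fig, ax = plt.subplots()
for (x_lo, x_hi), (y_lo, y_hi) in intervals:
    ax.add_patch(Rectangle((x_lo, y_lo), x_hi - x_lo, y_hi - y_lo,
                           fill=False, edgecolor="tab:blue"))
ax.set_xlim(0, 6); ax.set_ylim(0, 6)
ax.set_xlabel("variable X (interval)")
ax.set_ylabel("variable Y (interval)")
plt.show()
```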
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set contains simulation data for sample size planning based on power analysis and on accuracy in parameter estimation for linear mixed-effects models.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
IBM Intraday Stock Price Data (5-Minute Intervals)

This dataset provides comprehensive intraday trading data for IBM stock at 5-minute intervals, capturing essential price and volume metrics for each trading session. It is ideal for short-term trading analysis, pattern recognition, and intraday trend forecasting.

Dataset Overview
Each row in the dataset represents IBM's stock information for a specific 5-minute interval, including:
- Timestamp: The exact time (Eastern Time) for each data entry.
- Open: The stock price at the beginning of the interval.
- High: The highest price within the interval.
- Low: The lowest price within the interval.
- Close: The stock price at the end of the interval.
- Volume: The number of shares traded within the interval.

Potential Uses
This dataset is well-suited for various financial and quantitative analysis projects, such as:
- Volume and Price Movement Analysis: Identify periods with unusually high trading volume and investigate whether they correspond with significant price changes or market events.
- Intraday Trend Analysis: Observe trends by plotting the closing prices over time to spot patterns in stock performance during a single trading day or across multiple days.
- Volatility Detection: Track intervals with a large difference between the high and low prices to detect periods of increased price volatility.
- Time-Series Forecasting: Use machine learning models to predict price movements based on historical intraday data and patterns.

Example Analysis Ideas
- Visualize Price Movements: Plot open, high, low, and close prices over time to get a clear view of price trends and fluctuations.
- Analyze Volume Spikes: Find and investigate timestamps with high trading volume, which might indicate significant market activity.
- Apply Machine Learning: Use techniques such as LSTM, ARIMA, or other time-series forecasting models to predict short-term price movements.

This dataset is especially valuable for traders, quantitative analysts, and developers building financial models or applications that require real-time market insights.
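A minimal sketch of the analyses listed above, assuming the data have been exported to a CSV file (the filename below is hypothetical) with the columns described in the overview:

```python
# Hedged sketch: intraday range, volume spikes, and a simple trend view.
# "ibm_5min.csv" is an assumed filename; columns follow the dataset overview.
import pandas as pd

df = pd.read_csv("ibm_5min.csv", parse_dates=["Timestamp"]).set_index("Timestamp")

# Volatility proxy: high-low range per 5-minute interval.
df["range"] = df["High"] - df["Low"]
print(df["range"].nlargest(5))                      # most volatile intervals

# Volume spikes: intervals whose volume exceeds 3x the session mean.
spikes = df[df["Volume"] > 3 * df["Volume"].mean()]
print(spikes[["Close", "Volume"]])

# Simple trend view: closing price over the day.
df["Close"].plot(title="IBM close, 5-minute intervals")
```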
About the Data
This dataset was obtained via the Alpha Vantage API, using their TIME_SERIES_INTRADAY function. The data here represent IBM's intraday stock price movements on November 11, 2024, at 5-minute intervals.
This dataset contains timestamped Basic Safety Messages (BSMs) collected from connected vehicles operating in Utah from vendor PANASONIC as part of the ITS JPO's Traffic Signal Change and Clearance Interval Pooled Fund Study. The data includes GPS location, speed, heading, accelerations, and brake status at 10 Hz frequency. These BSMs were transmitted from vehicles equipped with aftermarket onboard units (OBUs) and have been anonymized. The dataset supports research related to vehicle kinematics during signal change intervals and interactions with traffic signal states. To request the full dataset please email data.itsjpo@dot.gov.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context: This data set originates from a practice-relevant degradation process that is representative of Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The filter media employed are made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations which do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. To avoid carryover between two data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter medium.
Detailed specification: For more information about the general operation and the components used, see the provided description file Random Recording Condition Data Data Set.pdf
Given data situation: To implement a predictive maintenance policy, knowledge about the time of failure, or equivalently the remaining useful life (RUL), of the technical system is necessary. The time of failure or the RUL can be predicted on the basis of condition data that indicate the damage progression of a technical system over time. However, the collection of condition data in typical industrial PHM applications is often only possible in an incomplete manner. An example is the collection of data during defined test cycles with specific loads, carried out at intervals. For instance, this approach is often used with machining centers, where test cycles are only carried out between finished machining jobs or work shifts. Because the work pieces differ, the machining time varies and the test cycle with the recording of condition data is not performed equidistantly. This results in a data characteristic that is comparable to a random sample of continuously recorded condition data. Another example that may result in such a data characteristic comes from the effort to reduce data volumes when recording condition data. Attempts can be made to keep the amount of data as small as possible while leaving the damage information unchanged. One possible measure is not to transmit and store the continuous sensor readings, but rather sections of them, which also leads to gaps in the data available for prognosis.

In the present data set, the life cycle of filters, or rather their condition data represented by the differential pressure, is considered. Failure of the filter occurs when the differential pressure across the filter exceeds 600 Pa. The time until a filter failure occurs depends especially on the amount of dust supplied per unit time, which is constant within a run-to-failure cycle. The previously explained data characteristics are addressed by means of corresponding training and test data.

The training data are structured as follows: a run-to-failure cycle contains n batches of data. The number n varies between the cycles and depends on the duration of the batches and the time interval between the individual batches. The duration and time interval of the batches are random variables. A data batch includes the sensor readings of differential pressure and flow rate for the filter, the start and end time of the batch, and RUL information related to the end time of the batch. The sensor readings of the differential pressure and flow rate are recorded at a constant sampling rate. Figure 6 shows an illustrative run-to-failure cycle with multiple batches.

The test data are randomly right-censored. They also consist of batches with a random duration and time interval between the batches. For each batch contained, the start and end time are given, as well as the sensor readings within the batch. The RUL is not given for each batch but only for the last data point of the right-censored run-to-failure cycle.
Task: The aim is to predict the RUL of the censored filter test cycles given in the test data. To this end, training and test data are given, consisting of 60 and 40 run-to-failure cycles, respectively. The test data contain randomly right-censored run-to-failure cycles and the respective RUL for the prediction task. The main challenge is to make the best use of the incompletely recorded training and test data to provide the most accurate prediction possible. Owing to the detailed description of the setup and the various physical filter models described in the literature, it is possible to support the actual data-driven models by integrating physical knowledge or models in the sense of theory-guided data science or informed machi...
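As a rough, hedged baseline (not the study's reference approach), one could extrapolate the observed differential-pressure trend to the 600 Pa failure threshold; the sketch below illustrates this with made-up batch readings, and the variable layout is an assumption.

```python
# Naive baseline sketch: fit a linear trend to the observed differential
# pressure and extrapolate to the 600 Pa failure threshold to estimate RUL.
import numpy as np

def estimate_rul(times, delta_p, threshold=600.0):
    """times: batch end times; delta_p: differential pressure readings (Pa)."""
    slope, intercept = np.polyfit(times, delta_p, deg=1)
    if slope <= 0:
        return np.inf                      # no degradation trend observed yet
    t_fail = (threshold - intercept) / slope
    return max(t_fail - times[-1], 0.0)    # time remaining after last reading

times = np.array([0.0, 10.0, 20.0, 35.0])        # hypothetical batch end times
delta_p = np.array([120.0, 180.0, 260.0, 390.0]) # hypothetical readings (Pa)
print(f"estimated RUL: {estimate_rul(times, delta_p):.1f} time units")
```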
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
Advances in technology allow the acquisition of data with high spatial and temporal resolution. These datasets are usually accompanied by estimates of the measurement uncertainty, which may be spatially or temporally varying and should be taken into consideration when making decisions based on the data. At the same time, various transformations are commonly implemented to reduce the dimensionality of the datasets for post-processing, or to extract significant features. However, the corresponding uncertainty is not usually represented in the low-dimensional or feature vector space. A method is proposed that maps the measurement uncertainty into the equivalent low-dimensional space with the aid of approximate Bayesian computation, resulting in a distribution that can be used to make statistical inferences. The method involves no assumptions about the probability distribution of the measurement error and is independent of the feature extraction process, as demonstrated in three examples. In the first two examples, Chebyshev polynomials were used to analyse structural displacements and soil moisture measurements, while in the third, principal component analysis was used to decompose global ocean temperature data. The uses of the method range from supporting decision making in model validation or confirmation and in model updating or calibration, to tracking changes in condition, such as the characterisation of the El Niño Southern Oscillation.
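The paper's approximate Bayesian computation procedure is not reproduced here; as a simplified, hedged illustration of the underlying idea of carrying measurement uncertainty into a reduced space, the sketch below propagates assumed Gaussian measurement noise through a PCA projection by Monte Carlo. All data and noise levels are synthetic.

```python
# Simplified illustration (not the paper's ABC method): Monte Carlo propagation
# of measurement noise through a PCA projection to obtain a distribution of an
# observation's scores in the low-dimensional space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 50))      # synthetic high-dimensional measurements
sigma = 0.1                            # assumed measurement uncertainty

pca = PCA(n_components=3).fit(data)
observation = data[0]

# Distribution of the first observation's scores under measurement error.
samples = observation + rng.normal(scale=sigma, size=(1000, data.shape[1]))
scores = pca.transform(samples)
print(scores.mean(axis=0), scores.std(axis=0))   # spread in the reduced space
```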
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File descriptions:
All csv files contain results from the different models (PAMM, AARs, linear models, MRPPs) on each iteration of the simulation, with one row per iteration. "results_perfect_detection.csv" contains the results from the first simulation part with all the observations. "results_imperfect_detection.csv" contains the results from the first simulation part with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: ratio value for the Avoidance-Attraction Ratio calculated as AB/BA.
AAR2: ratio value for the Avoidance-Attraction Ratio calculated as BAB/BB.
Harmsen_P: p-value from the linear model with the interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval-duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval-duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval-duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval-duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval-duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval-duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
Script files description:
1_Functions: R script containing the functions:
- MRPP from Karanth et al. (2017), adapted here for time efficiency.
- MRPP from Murphy et al. (2021), adapted here for time efficiency.
- A version of the ct_to_recurrent() function from the recurrent package, adapted to process the simulation datasets in parallel.
- The simulation() function used to simulate observations of two species with a reciprocal effect on each other.
2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization, and the random thinning mimicking imperfect detection.
3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. (2021).
4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the Karanth et al. (2017) and Murphy et al. (2021) code lines, the version adapted for time efficiency, and a comparison to verify similarity of results.
5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which permutations are done.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

Explanation of Symbols:
- An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
- An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
- An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.
- An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.
- An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
- An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
- An 'N' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
- An '(X)' means that the estimate is not applicable or not available.

Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

While the 2013 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

Public assistance includes receipt of Supplemental Security Income (SSI), cash public assistance income, or Food Stamps.

The Census Bureau introduced a new set of disability questions in the 2008 ACS questionnaire. Accordingly, comparisons of disability data from 2008 or later with data from prior years are not recommended. For more information on these questions and their evaluation in the 2006 ACS Content Test, see the Evaluation Report Covering Disability.

Excludes householders, spouses, and unmarried partners. Foreign born excludes people born outside the United States to a parent who is a U.S. citizen.

In data year 2013, there were a series of changes to data collection operations that could have affected some estimates.
These changes include the addition of Internet as a mode of data collection, the end of the content portion of Failed Edit Follow-Up interviewing, and the loss of one monthly panel due to the Federal Government shutdown in October 2013. For more information, see: User Notes.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Source: U.S. Census Bureau, 2013 American Community Survey
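As a worked example of the margin-of-error convention described above (the estimate and MOE below are made-up numbers): because the published MOE corresponds to the 90 percent level, the standard error is obtained by dividing by 1.645, and a 95 percent interval then uses 1.96 standard errors.

```python
# Converting a published 90 percent margin of error (MOE) to a standard error
# and a 95 percent interval. The estimate and MOE are illustrative values.
estimate, moe90 = 12_345, 678

se = moe90 / 1.645          # ACS MOEs are published at the 90 percent level
moe95 = 1.96 * se
print(f"90% interval: {estimate - moe90:.0f} to {estimate + moe90:.0f}")
print(f"95% interval: {estimate - moe95:.0f} to {estimate + moe95:.0f}")
```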
In 2019 the U.S. Geological Survey (USGS) quantitatively assessed the potential for undiscovered, technically recoverable continuous (unconventional) oil and gas resources in the Niobrara interval of the Cody Shale in the Bighorn Basin Province (Finn and others, 2019). Leading up to the assessment, in 2017, the USGS collected samples from the Niobrara and underlying Sage Breaks intervals (Finn, 2019) to better characterize the source rock potential of the Niobrara interval. Eighty-two samples from 31 wells were collected from the well cuttings collection stored at the USGS Core Research Center in Lakewood, Colorado. The selected wells are located near the outcrop belt along the shallow margins of the basin to obtain samples that were not subjected to the effects of deep burial and subsequent organic carbon loss due to thermal maturation as described by Daly and Edman (1987) (fig. 1). Sixty samples are from the Niobrara interval, and 22 from the Sage Breaks interval (fig. 2).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100. The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100. The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
- '-' The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
- 'N' The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
- '(X)' The estimate or margin of error is not applicable or not available.
- 'median-' The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
- 'median+' The median falls in the highest interval of an open-ended distribution (for example "250,000+").
- '**' The margin of error could not be computed because there were an insufficient number of sample observations.
- '***' The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
- '*****' A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
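A minimal sketch of these two pre-processing steps with pandas, assuming a weekly fixed-interval dataset; PartOfCumulativeCountSeries is named in the description above, while the file name and the PeriodStartDate and CountValue column names are assumptions:

```python
# Hedged sketch of the recommended pre-processing: split cumulative series from
# fixed-interval series, then make missing weeks explicit.
import pandas as pd

df = pd.read_csv("tycho_dataset.csv", parse_dates=["PeriodStartDate"])

# 1. Separate cumulative from fixed-interval count series.
fixed = df[df["PartOfCumulativeCountSeries"] == 0]
cumulative = df[df["PartOfCumulativeCountSeries"] == 1]

# 2. Reindex to a complete weekly range so that intervals with no reported
#    count appear as NaN rather than being silently absent.
weekly = (fixed.set_index("PeriodStartDate")["CountValue"]
               .resample("W").sum(min_count=1))
print(weekly.isna().sum(), "weeks with no reported count")
```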
This is the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period. The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. Capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.

Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.

By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
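As a small illustration of the capacity-factor definition above (all numbers are made up):

```python
# Capacity factor = energy produced / (nameplate capacity x hours in period).
energy_mwh = 5_000.0        # energy produced in the period (MWh)
nameplate_mw = 10.0         # reported nameplate capacity (MW)
hours = 24 * 30             # length of the period, e.g. one month

capacity_factor = energy_mwh / (nameplate_mw * hours)
print(f"capacity factor = {capacity_factor:.2f}")   # 5000 / 7200 is about 0.69
```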
Belize is one of the countries in Latin America that was not included in the World Fertility Survey, the Contraceptive Prevalence Survey project, or the Demographic and Health Survey program during the 1970s and 1980s. As a result, data on contraceptive prevalence and the use of maternal and child health services in Belize have been limited. The 1991 Family Health Survey was designed to provide health professionals and international donors with data to assess infant and child mortality, fertility, and the use of family planning and health services in Belize.
The objectives of the 1991 Family Health Survey were to:
- obtain national fertility estimates;
- estimate levels of infant and child mortality;
- estimate the percentage of mothers who breastfed their last child and duration of breastfeeding;
- determine levels of knowledge and current use of contraceptives for a variety of social and demographic background variables and to determine the source where users obtain the methods they use;
- determine reasons for nonuse of contraception and estimate the percentage of women who are at risk of an unplanned pregnancy and, thus, in need of family planning services; and
- examine the use of maternal and child health services and immunization levels for children less than 5 years of age and to examine the prevalence and treatment of diarrhea and acute respiratory infections among these children.
National
Sample survey data [ssd]
The 1991 Belize Family Health Survey was an area probability survey with two stages of selection. The sampling frame for the survey was the quick count of all households in the country conducted in 1990 by the Central Statistical Office in preparation for the 1991 census. Two strata, or domains, were sampled independently: urban areas and rural areas. In the first stage of selection for the urban domain, a systematic sample with a random start was used to select enumeration districts in the domain with probability of selection proportional to the number of households in each district. In the second stage of selection, households were chosen systematically using a constant sampling interval (4.2350) across all of the selected enumeration districts. The enumeration districts selected for the rural domain were the same as those that had been selected earlier for the 1990 Belize Household Expenditure Survey. The second stage selection of rural households was conducted the same way it was for the urban domain but used a constant sampling interval of 2.1363. In order to have a self-weighting geographic sample, 3,106 urban households and 1,871 rural households were selected for a total of 4,977 households.
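As a hedged illustration of systematic selection with the fractional sampling intervals quoted above (the frame size and random start below are arbitrary):

```python
# Systematic sampling with a fractional interval: step through the household
# list in constant fractional increments and take the household at each step.
import numpy as np

def systematic_sample(n_units, interval, seed=0):
    rng = np.random.default_rng(seed)
    start = rng.uniform(0, interval)             # random start within one step
    picks = np.arange(start, n_units, interval)  # fractional positions
    return np.floor(picks).astype(int)           # selected household indices

print(systematic_sample(100, 4.2350)[:10])       # urban-domain interval
print(systematic_sample(100, 2.1363)[:10])       # rural-domain interval
```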
Only one woman aged 15-44 per household was selected for interview. Each respondent's probability of selection was inversely proportional to the number of eligible women in the household. Thus, weighting factors were applied to compensate for this unequal probability of selection. In the tables presented in this report, proportions and means are based on the weighted number of cases, but the unweighted numbers are shown.
Face-to-face [f2f]
Of the 4,977 households selected, 4,566 households were visited. Overall, 8 percent of households could not be located, and 7 percent of the households were found to be vacant. Less than 3 percent of the households refused to be interviewed. Fifty-five percent of sample households included at least one woman aged 15-44. Complete interviews were obtained in 94 percent of the households that had an eligible respondent, for a total of 2,656 interviews. Interview completion rates did not vary by residence.
The estimates for a sample survey are affected by two types of errors: (1) sampling error and (2) non-sampling error. Non-sampling error is the result of mistakes made in carrying out data collection and data processing, including the failure to locate and interview the right household, errors in the way questions are asked or understood, and data entry errors. Although quality control efforts were made during the implementation of the Family Health Survey to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling error is defined as the difference between the true value for any variable measured in a survey and the value estimated by the survey. Sampling error is a measure of the variability between all possible samples that could have been selected from the same population using the same sample design and size. For the entire population and for large subgroups, the Family Health Survey is large enough that the sampling error for most estimates is small. However, for small subgroups, sampling errors are larger and may affect the reliability of the estimates. Sampling error is usually measured in terms of the standard error for a particular statistic (mean, proportion, or ratio), which is the square root of the variance. The standard error can be used to calculate confidence intervals for estimated statistics. For example, the 95 percent confidence interval for a statistic is the estimated value plus or minus 1.96 times the standard error for the estimate.
The standard errors of statistics estimated using a multistage cluster sample design, such as that used in the Family Health Survey, are more complex than are standard errors based on simple random samples, and they tend to be somewhat larger than the standard errors produced by a simple random sample. The increase in standard error due to using a multi-stage cluster design is referred to as the design effect, which is defined as the ratio between the variance for the estimate using the sample design that was used and the variance for the estimate that would result if a simple random sample had been used. Based on experience with similar surveys, the design effect generally falls in a range from 1.2 to 2.0 for most variables.
Table E.1 of the Final Report presents examples of what the 95 percent confidence interval on an estimated proportion would be, under a variety of sample sizes, assuming a design effect of 1.6. It presents half-widths of the 95 percent confidence intervals corresponding to sample sizes, ranging from 25 to 3200 cases, and corresponding to estimated proportions ranging from .05/.95 to .50/.50. The formula used for calculating the half-width of the 95 percent confidence interval is:
(half of 95% C.I.) = (1.96) SQRT {(1.6)(p)(1-p) / n},
where p is the estimated proportion, n is the number of cases used in calculating the proportion, and 1.6 is the design effect. It can be seen, for example, that for an estimated proportion of 0.30, and a sample of size of 200, half the width of the confidence interval is 0.08, so that the 95 percent confidence interval for the estimated proportion would be from 0.22 to 0.38. If the sample size had been 3200, instead of 200, the 95 percent confidence interval would be from 0.28 to 0.32.
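The worked numbers above can be reproduced directly from the half-width formula:

```python
# Half-width of the 95 percent CI with a design effect of 1.6, for p = 0.30.
from math import sqrt

def half_width(p, n, deff=1.6, z=1.96):
    return z * sqrt(deff * p * (1 - p) / n)

print(round(half_width(0.30, 200), 2))    # ~0.08 -> CI from 0.22 to 0.38
print(round(half_width(0.30, 3200), 2))   # ~0.02 -> CI from 0.28 to 0.32
```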
The actual design effect for individual variables will vary, depending on how values of that variable are distributed among the clusters of the sample. These can be calculated using advanced statistical software for survey analysis.