License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
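To illustrate the idea concretely, here is a minimal sketch (not the original course materials) of how the width of a t-based confidence interval for a mean grows as the confidence level rises and shrinks as the sample size increases; the population parameters and seed are arbitrary.

```python
# Minimal sketch: t-based confidence intervals for a mean at several
# confidence levels and sample sizes. Parameters are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
population_mean, population_sd = 50.0, 10.0

for n in (2, 10, 20):
    sample = rng.normal(population_mean, population_sd, size=n)
    mean, sem = sample.mean(), stats.sem(sample)
    for level in (0.90, 0.95, 0.99):
        lo, hi = stats.t.interval(level, n - 1, loc=mean, scale=sem)
        print(f"n={n:2d}, {level:.0%} CI: ({lo:7.2f}, {hi:7.2f}), width={hi - lo:7.2f}")
```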
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Interval data are widely used in many fields, notably in economics, industry, and health areas. Analogous to the scatterplot for single-value data, the rectangle plot and cross plot are the conventional visualization methods for the relationship between two variables in interval forms. These methods do not provide much information to assess complicated relationships, however. In this article, we propose two visualization methods: Segment and Dandelion plots. They offer much more information than the existing visualization methods and allow us to have a much better understanding of the relationship between two variables in interval forms. A general guide for reading these plots is provided. Relevant theoretical support is developed. Both empirical and real data examples are provided to demonstrate the advantages of the proposed visualization methods. Supplementary materials for this article are available online.
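As a point of reference for the conventional methods mentioned above, the following sketch draws a rectangle plot for two interval-valued variables with matplotlib; it is an illustration with made-up data, not the proposed segment or dandelion plots.

```python
# Sketch of the conventional rectangle plot for interval data: each observation
# of two interval-valued variables is drawn as the rectangle
# [x_lo, x_hi] x [y_lo, y_hi]. The data below are made up.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

rng = np.random.default_rng(1)
x_lo = rng.uniform(0, 8, 20)
x_hi = x_lo + rng.uniform(0.5, 2.0, 20)
y_lo = 0.8 * x_lo + rng.normal(0, 1, 20)
y_hi = y_lo + rng.uniform(0.5, 2.0, 20)

fig, ax = plt.subplots()
for xl, xh, yl, yh in zip(x_lo, x_hi, y_lo, y_hi):
    ax.add_patch(Rectangle((xl, yl), xh - xl, yh - yl, fill=False, alpha=0.6))
ax.set_xlim(x_lo.min() - 1, x_hi.max() + 1)
ax.set_ylim(y_lo.min() - 1, y_hi.max() + 1)
ax.set_xlabel("Variable 1 (interval)")
ax.set_ylabel("Variable 2 (interval)")
plt.show()
```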
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
This is the data set behind the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.
Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.
By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
License: CC0 1.0, https://spdx.org/licenses/CC0-1.0.html
Advances in technology allow the acquisition of data with high spatial and temporal resolution. These datasets are usually accompanied by estimates of the measurement uncertainty, which may be spatially or temporally varying and should be taken into consideration when making decisions based on the data. At the same time, various transformations are commonly implemented to reduce the dimensionality of the datasets for post-processing, or to extract significant features. However, the corresponding uncertainty is not usually represented in the low-dimensional or feature vector space. A method is proposed that maps the measurement uncertainty into the equivalent low-dimensional space with the aid of approximate Bayesian computation, resulting in a distribution that can be used to make statistical inferences. The method involves no assumptions about the probability distribution of the measurement error and is independent of the feature extraction process as demonstrated in three examples. In the first two examples Chebyshev polynomials were used to analyse structural displacements and soil moisture measurements; while in the third, principal component analysis was used to decompose global ocean temperature data. The uses of the method range from supporting decision making in model validation or confirmation, model updating or calibration and tracking changes in condition, such as the characterisation of the El Niño Southern Oscillation.
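As a rough illustration of the general idea, the sketch below propagates per-point measurement uncertainty into a PCA score space by brute-force resampling; this is a simplified Monte Carlo stand-in with made-up data, not the approximate Bayesian computation method developed in the article.

```python
# Simplified Monte Carlo stand-in (NOT the article's approximate Bayesian
# computation): propagate per-channel measurement uncertainty of one
# observation into PCA score space by resampling noisy copies.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
data = rng.normal(size=(500, 30))       # placeholder high-dimensional measurements
sigma = 0.1 * np.ones(30)               # reported per-channel measurement sd (assumed)

pca = PCA(n_components=3).fit(data)
observation = data[0]

# Distribution of the observation's low-dimensional scores induced by noise.
noisy_copies = observation + rng.normal(0.0, sigma, size=(2000, 30))
scores = pca.transform(noisy_copies)
print("score means:", scores.mean(axis=0))
print("score sds:  ", scores.std(axis=0))
```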
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/ (license information derived automatically)
Appendix B of GEUS report 2024/10. Data included are raw HH-XRF measurements of cuttings and a conversion table to ICP-OES/MS data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Context: This data set originates from a practice-relevant degradation process that is representative of Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The filter media employed are made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations that do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. To avoid carryover between data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter medium.
Detailed specification: For more information about the general operation and the components used, see the provided description file Random Recording Condition Data Data Set.pdf
Given data situation: In order to implement a predictive maintenance policy, knowledge about the time of failure or, equivalently, the remaining useful life (RUL) of the technical system is necessary. The time of failure or the RUL can be predicted on the basis of condition data that indicate the damage progression of a technical system over time. However, the collection of condition data in typical industrial PHM applications is often only possible in an incomplete manner. One example is the collection of data during defined test cycles with specific loads, carried out at intervals. This approach is often used with machining centres, where test cycles are only carried out between finished machining jobs or work shifts. Because the work pieces differ, the machining time varies and the test cycle with the recording of condition data is not performed equidistantly. This results in a data characteristic that is comparable to a random sample of continuously recorded condition data. Another example that may result in such a data characteristic comes from the effort to reduce data volumes when recording condition data: attempts can be made to keep the amount of data, for unchanged damage information, as small as possible. One possible measure is not to transmit and store the continuous sensor readings, but only sections of them, which also leads to gaps in the data available for prognosis.

In the present data set, the life cycle of filters, or rather their condition data represented by the differential pressure, is considered. Failure of the filter occurs when the differential pressure across the filter exceeds 600 Pa. The time until a filter failure occurs depends especially on the amount of dust supplied per unit time, which is constant within a run-to-failure cycle. The data characteristics explained above are addressed by means of corresponding training and test data.

The training data are structured as follows: a run-to-failure cycle contains n batches of data. The number n varies between cycles and depends on the duration of the batches and the time interval between the individual batches; both are random variables. A data batch includes the sensor readings of differential pressure and flow rate for the filter, the start and end time of the batch, and RUL information related to the end time of the batch. The sensor readings of the differential pressure and flow rate are recorded at a constant sampling rate. Figure 6 shows an illustrative run-to-failure cycle with multiple batches.

The test data are randomly right-censored. They also consist of batches with a random duration and a random time interval between the batches. For each batch contained, the start and end time are given, as well as the sensor readings within the batch. The RUL is not given for each batch but only for the last data point of the right-censored run-to-failure cycle.
Task: The aim is to predict the RUL of the censored filter test cycles given in the test data. For this purpose, training and test data are given, consisting of 60 and 40 run-to-failure cycles, respectively. The test data contain randomly right-censored run-to-failure cycles and the respective RUL for the prediction task. The main challenge is to make the best use of the incompletely recorded training and test data to provide the most accurate prediction possible. Due to the detailed description of the setup and the various physical filter models described in the literature, it is possible to support the actual data-driven models by integrating physical knowledge or models in the sense of theory-guided data science or informed machine learning (various names are common).
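As a starting point for working with data structured this way, the sketch below implements a naive RUL baseline that fits a linear trend to the differential-pressure readings of the batches seen so far and extrapolates to the 600 Pa failure threshold stated above; the column names, file name, and the linear-trend assumption are illustrative, not part of the published data set.

```python
# Naive RUL baseline sketch (not a published benchmark): fit a linear trend to
# the differential-pressure readings seen so far and extrapolate to the 600 Pa
# failure threshold. Column names and the CSV layout are assumptions.
import numpy as np
import pandas as pd

FAILURE_DP = 600.0  # Pa, failure threshold stated in the dataset description

def predict_rul(batches: pd.DataFrame) -> float:
    """batches: one censored cycle with columns 'time' and 'differential_pressure'."""
    t = batches["time"].to_numpy()
    dp = batches["differential_pressure"].to_numpy()
    slope, intercept = np.polyfit(t, dp, deg=1)   # linear degradation trend
    if slope <= 0:
        return np.inf                             # no degradation trend yet
    t_fail = (FAILURE_DP - intercept) / slope     # time when 600 Pa would be reached
    return max(t_fail - t[-1], 0.0)               # remaining time after last reading

# Hypothetical usage (file name assumed):
# cycle = pd.read_csv("test_cycle_01.csv")
# print("predicted RUL:", predict_rul(cycle))
```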
License: MIT License, https://opensource.org/licenses/MIT (license information derived automatically)
IBM Intraday Stock Price Data (5-Minute Intervals)
This dataset provides comprehensive intraday trading data for IBM stock at 5-minute intervals, capturing essential price and volume metrics for each trading session. It is ideal for short-term trading analysis, pattern recognition, and intraday trend forecasting.

Dataset Overview
Each row in the dataset represents IBM's stock information for a specific 5-minute interval, including:
Timestamp: The exact time (Eastern Time) for each data entry.
Open: The stock price at the beginning of the interval.
High: The highest price within the interval.
Low: The lowest price within the interval.
Close: The stock price at the end of the interval.
Volume: The number of shares traded within the interval.

Potential Uses
This dataset is well-suited for various financial and quantitative analysis projects, such as:
Volume and Price Movement Analysis: Identify periods with unusually high trading volume and investigate whether they correspond with significant price changes or market events.
Intraday Trend Analysis: Observe trends by plotting the closing prices over time to spot patterns in stock performance during a single trading day or across multiple days.
Volatility Detection: Track intervals with a large difference between the high and low prices to detect periods of increased price volatility.
Time-Series Forecasting: Use machine learning models to predict price movements based on historical intraday data and patterns.

Example Analysis Ideas
Visualize Price Movements: Plot open, high, low, and close prices over time to get a clear view of price trends and fluctuations.
Analyze Volume Spikes: Find and investigate timestamps with high trading volume, which might indicate significant market activity.
Apply Machine Learning: Use techniques such as LSTM, ARIMA, or other time-series forecasting models to predict short-term price movements.
This dataset is especially valuable for traders, quantitative analysts, and developers building financial models or applications that require real-time market insights.

About the Data
This dataset was obtained via the Alpha Vantage API, using their TIME_SERIES_INTRADAY function. The data represent IBM's intraday stock price movements on November 11, 2024, at 5-minute intervals.
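A short pandas sketch of the analyses suggested above (volatility proxy, volume spikes, resampled trend); the file name and lowercase column labels are assumptions based on the field list in this description.

```python
# Sketch of the suggested analyses using pandas; file name and exact column
# labels are assumptions based on the field list above.
import pandas as pd

df = pd.read_csv("ibm_intraday_5min.csv", parse_dates=["timestamp"])
df = df.set_index("timestamp").sort_index()

# Per-interval volatility proxy: high-low range.
df["hl_range"] = df["high"] - df["low"]

# Flag volume spikes: intervals more than 3 standard deviations above mean volume.
vol_mean, vol_sd = df["volume"].mean(), df["volume"].std()
spikes = df[df["volume"] > vol_mean + 3 * vol_sd]
print(spikes[["close", "volume", "hl_range"]])

# Simple intraday trend view: closing price resampled to 30-minute means.
print(df["close"].resample("30min").mean())
```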
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.
The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
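A minimal numeric sketch of the centring and scaling step just described (illustrative only, not the authors' released Stata/R code):

```python
# Centre the reference estimate to zero, scale its CI to unit width, and apply
# the same shift and scale to the comparator analysis. Values are hypothetical.
def centre_and_scale(ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi):
    shift = ref_est
    scale = ref_hi - ref_lo                  # reference CI width becomes 1 after scaling
    transform = lambda x: (x - shift) / scale
    return (tuple(map(transform, (ref_est, ref_lo, ref_hi))),
            tuple(map(transform, (comp_est, comp_lo, comp_hi))))

# Hypothetical example: reference estimate 2.0 (CI 1.0 to 3.0),
# comparator estimate 2.5 (CI 1.0 to 4.0).
ref, comp = centre_and_scale(2.0, 1.0, 3.0, 2.5, 1.0, 4.0)
print(ref)   # (0.0, -0.5, 0.5)  -> centred at zero, width 1
print(comp)  # (0.25, -0.5, 1.0) -> comparator CI is 1.5 times the reference width
```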
In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.
The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.
This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
∙ example-1-JM.R: Code to fit M1
∙ longitudinal-data.csv: simulated TAC data for M1
∙ survival-data.csv: simulated dnDSA data for M1
∙ model-1-JM.txt: JAGS model, called by example-1-JM.R
∙ example-1-NLMIXED.SAS: Code to fit M1 with PROC NLMIXED, uses the same simulated data
∙ example-4-TVC.R: Code to fit M4
∙ longitudinal data tvc.csv: simulated TAC data for M4 (carried-forward values of TAC)
∙ model-4-TVC.txt: JAGS model, called by example-4-TVC.R
(ZIP 199 kb)
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
File descriptions:
All csv files contain results from the different models (PAMM, AARs, linear models, MRPPs) for each iteration of the simulation; one row corresponds to one iteration. "results_perfect_detection.csv" refers to the results from the first simulation part with all the observations. "results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.
ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: ratio value for the Avoidance-Attraction Ratio calculated as AB/BA.
AAR2: ratio value for the Avoidance-Attraction Ratio calculated as BAB/BB.
Harmsen_P: p-value from the linear model with the interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
Scripts files description:1_Functions: R script containing the functions: - MRPP from Karanth et al. (2017) adapted here for time efficiency. - MRPP from Murphy et al. (2021) adapted here for time efficiency. - Version of the ct_to_recurrent() function from the recurrent package adapted to process parallized on the simulation datasets. - The simulation() function used to simulate two species observations with reciprocal effect on each other.2_Simulations: R script containing the parameters definitions for all iterations (for the two parts of the simulations), the simulation paralellization and the random thinning mimicking imperfect detection.3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. 2021.4_Graphs: R script containing the code for plotting results from the simulation part and appendices.5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing Karanth et al. (2017) and Murphy et al. (2021) codes lines and the adapted version for time-efficiency matter and a comparison to verify similarity of results.5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for difference of the MRPPs approaches according to the species on which permutation are done.
This Wind Generation Interactive Query Tool was created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.

The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. The capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.

Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit the availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.

By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
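A small worked sketch of the capacity-factor definition given above; the numbers are illustrative and not taken from the CEC tool.

```python
# Capacity factor as defined above: energy produced divided by the energy the
# rated nameplate capacity could have produced over the same period.
def capacity_factor(energy_mwh: float, nameplate_mw: float, hours: float) -> float:
    """Ratio of energy produced to ideal production at nameplate capacity."""
    return energy_mwh / (nameplate_mw * hours)

# Illustrative example: a 100 MW wind generation area producing 30,000 MWh
# over a 30-day month (720 hours).
print(capacity_factor(30_000, 100, 720))   # ~0.42, i.e. a 42% capacity factor
```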
Aggregate means for six traits (milk, fat, and protein yields, somatic cell score, length of productive life, and daughter pregnancy rate).
Resources in this dataset:
Resource Title: Holstein Milk Yield. File Name: HO_M.csv. Resource Description: Aggregate means of Holstein predicted breeding values for milk yield and birth dates.
Resource Title: Holstein Fat Yield. File Name: HO_f.csv. Resource Description: Aggregate means of Holstein predicted breeding values for fat yield and birth dates.
Resource Title: Holstein Protein Yield. File Name: HO_p.csv. Resource Description: Aggregate means of Holstein predicted breeding values for protein yield and birth dates.
Resource Title: Holstein Somatic Cell Score. File Name: HO_scs.csv. Resource Description: Aggregate means of Holstein predicted breeding values for somatic cell score and birth dates.
Resource Title: Holstein Productive Life. File Name: HO_pl.csv. Resource Description: Aggregate means of Holstein predicted breeding values for productive life and birth dates.
Resource Title: Holstein Daughter Pregnancy Rate. File Name: HO_DPR.csv. Resource Description: Aggregate means of Holstein predicted breeding values for daughter pregnancy rate and birth dates.
Resource Title: Data Dictionary. File Name: data_dictionary.csv. Resource Description: Defines variables / sub-components with examples as used in column headers. Filenames: Holstein Productive Life: HO_pl.csv; Holstein Daughter Pregnancy Rate: HO_DPR.csv; Holstein Somatic Cell Score: HO_scs.csv; Holstein Protein Yield: HO_p.csv; Holstein Fat Yield: HO_f.csv; Holstein Milk Yield: HO_M.csv.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Context: This data set originates from a practice-relevant degradation process that is representative of Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The filter media employed are made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations that do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. To avoid carryover between data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter medium.
Detailed specification: For more information about the general operation and the components used, see the provided description file Preventive to Predictive Maintenance dataset.pdf
Given data situation: The data set Preventive to Predictive Maintenance is about the transition from a preventive maintenance strategy to a predictive maintenance strategy for a replaceable part, in this case a filter. To aid the realisation of predictive maintenance, life cycles have already been recorded from the application studied. However, the preventive maintenance in place so far causes the filters to be replaced after a fixed period of time, regardless of the condition of the degrading part. As a result, the end of life is not known for most records, and these records are thus right-censored. The training data so given are recorded runs of the filter up to a periodic replacement interval. When specifying the interval length for preventive maintenance, a trade-off has to be made between wasted life and the frequency of unplanned downtimes that occur when a part has a particularly short life. The interval here is chosen so that, on average, failure is observed for the shortest 10% of the filter lives in the training data; the other lives are censored. Filter failure occurs when the differential pressure across the filter exceeds 600 Pa. The maintenance interval length depends on the amount of dust fed in per unit time, which is constant within a test run. For example, at twice the dust feed, the maintenance interval is half as long. The same relationship therefore applies to the respective censoring time, which scales inversely with the particle feed. The variations between lifetimes are therefore primarily based on the type of dust, the flow rate, and manufacturing tolerances. The filter medium CC 600 G was used exclusively for the measurement samples included in this data set.
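The censoring rule described above can be illustrated with a short sketch that sets the preventive replacement interval at the 10th percentile of (made-up) filter lifetimes and marks the remaining runs as right-censored; the variable names and the Weibull lifetimes are assumptions for illustration only.

```python
# Sketch of the censoring rule described above: the preventive replacement
# interval is set so that roughly the shortest 10% of filter lives end in
# observed failure and the rest are right-censored. Lifetimes are made up.
import numpy as np

rng = np.random.default_rng(3)
lifetimes_h = rng.weibull(2.0, size=50) * 200            # hypothetical filter lives [h]

replacement_interval = np.quantile(lifetimes_h, 0.10)    # ~10% of runs fail before this
observed_time = np.minimum(lifetimes_h, replacement_interval)
event_observed = lifetimes_h <= replacement_interval     # True = failure, False = censored

print(f"replacement interval: {replacement_interval:.1f} h")
print(f"failures observed: {event_observed.sum()} of {len(lifetimes_h)} runs")
```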
Task: The objective is to precisely predict the remaining useful life (RUL) of the filter for the given test data, so that a transition to predictive maintenance is made possible. For this purpose, the dataset contains training and test data, each consisting of 50 life tests. The test data contain randomly right-censored run-to-failure measurements and the respective RUL as ground truth for the prediction task. The main challenge is to make the best use of the right-censored life data within the training data. Due to the detailed description of the setup and the various physical filter models described in the literature, it is possible to support the actual data-driven models by integrating physical knowledge or models in the sense of theory-guided data science or informed machine learning (various names are common).
Acknowledgement: Thanks go to Marc Hönig (Scientific Employee), Marcel Braig (Scientific Employee) and Christopher Rein (Research Assistant) for contributing to the recording of these life tests.
Data set Creator: Hochschule Esslingen - University of Applied Sciences, Research Department Reliability Engineering and Prognostics and Health Management, Robert-Bosch-Straße 1, 73037 Göppingen, Germany
Dataset Citation: Hagmeyer, S., Mauthe, F., & Zeiler, P. (2021). Creation of Publicly Available Data Sets for Prognostics and Diagnostics Addressing Data Scenarios Relevant to Industrial Applications. International Journal of Prognostics and Health Management, Volume 12, Issue 2, DOI: 10.36001/ijphm.2021.v12i2.3087
This paper deals with uncertainty, asymmetric information, and risk modelling in a complex power system. The uncertainty is managed by using probability and decision theory methods. Multiple-criteria decision making (MCDM) is a well-known and effective tool for investigating fuzzy information. However, the selection of houses cannot be done by utilizing symmetric information, because enterprises do not have complete information, so asymmetric information should be used when selecting enterprises. In this paper, the notions of soft set (SftS) and interval-valued T-spherical fuzzy set (IVT-SFS) are combined to produce a new and more effective notion called the interval-valued T-spherical fuzzy soft set (IVT−SFSftS). It is a more general concept and provides more space and options to decision makers (DMs) in the field of fuzzy set theory. Moreover, some average aggregation operators, such as the interval-valued T-spherical fuzzy soft weighted average (IVT−SFSftWA) operator, the interval-valued T-spherical fuzzy soft ordered weighted average (IVT−SFSftOWA) operator, and the interval-valued T-spherical fuzzy soft hybrid average (IVT−SFSftHA) operator, are explored. Furthermore, the properties of these operators are discussed in detail. An algorithm is developed and an application example is proposed to show the validity of the present work. This manuscript shows how to make a decision when there is asymmetric information about an enterprise. Further, in a comparative analysis, the established work is compared with another existing method to show the advantages of the present work.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/ (license information derived automatically)
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100. The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100. The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities. Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of symbols:
"-" The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
"N" The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
"(X)" The estimate or margin of error is not applicable or not available.
"median-" The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
"median+" The median falls in the highest interval of an open-ended distribution (for example "250,000+").
"**" The margin of error could not be computed because there were an insufficient number of sample observations.
"***" The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
"*****" A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
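A brief sketch of the conventions stated above: the published margins of error are at the 90 percent level (z = 1.645), and the dependency ratios divide the under-18 and 65-and-over populations by the 18-to-64 population; the numbers used are illustrative, not actual ACS estimates.

```python
# Sketch using the conventions stated above. The 1.645 factor is the standard
# z-value for the 90 percent margins of error published with ACS estimates.
Z90 = 1.645

def confidence_bounds(estimate: float, moe_90: float) -> tuple[float, float]:
    """Lower and upper 90 percent confidence bounds implied by a published MOE."""
    return estimate - moe_90, estimate + moe_90

def age_dependency_ratio(under_18: float, age_65_over: float, age_18_64: float) -> float:
    """Combined under-18 and 65-and-over populations per 100 persons aged 18-64."""
    return (under_18 + age_65_over) / age_18_64 * 100

print(confidence_bounds(12_345, 678))                         # (11667, 13023)
print(round(age_dependency_ratio(4_000, 3_000, 10_000), 1))   # 70.0
# The standard error can be recovered as moe_90 / Z90, e.g. to build a 95% interval:
print(round(678 / Z90 * 1.96, 1))                             # approximate 95% MOE
```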
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information derived automatically)
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
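A minimal pandas sketch of the two pre-processing steps recommended above; "PartOfCumulativeCountSeries" is named in the description, while the other column names and the file name are assumptions about the Project Tycho format.

```python
# Sketch of the two recommended pre-processing steps. 'PartOfCumulativeCountSeries'
# is named in the dataset description; other column names and the file name are
# assumptions for illustration.
import pandas as pd

df = pd.read_csv("tycho_dataset.csv", parse_dates=["PeriodStartDate", "PeriodEndDate"])

# 1. Separate cumulative from fixed-interval count series.
cumulative = df[df["PartOfCumulativeCountSeries"] == 1]
fixed = df[df["PartOfCumulativeCountSeries"] == 0]

# 2. For fixed weekly intervals, reinsert weeks with no reported count as missing.
weekly = (fixed.set_index("PeriodStartDate")["CountValue"]
               .resample("W")
               .sum(min_count=1))   # weeks with no report become NaN, reported zeros stay 0
print(weekly.head())
```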
Belize is one of the countries in Latin America that was not included in the World Fertility Survey, the Contraceptive Prevalence Survey project, or the Demographic and Health Survey program during the 1970s and 1980s. As a result, data on contraceptive prevalence and the use of maternal and child health services in Belize has been limited. The 1991 Family Health Survey was designed to provide health professionals and international donors with data to assess infant and child mortality, fertility, and the use of family planning and health services in Belize.
The objectives of the 1991 Family Health Survey were to:
- obtain national fertility estimates;
- estimate levels of infant and child mortality;
- estimate the percentage of mothers who breastfed their last child and the duration of breastfeeding;
- determine levels of knowledge and current use of contraceptives for a variety of social and demographic background variables and determine the source where users obtain the methods they use;
- determine reasons for nonuse of contraception and estimate the percentage of women who are at risk of an unplanned pregnancy and, thus, in need of family planning services; and
- examine the use of maternal and child health services and immunization levels for children less than 5 years of age and examine the prevalence and treatment of diarrhea and acute respiratory infections among these children.
Geographic coverage: National
Kind of data: Sample survey data [ssd]
The 1991 Belize Family Health Survey was an area probability survey with two stages of selection. The sampling frame for the survey was the quick count of all households in the country conducted in 1990 by the Central Statistical Office in preparation for the 1991 census. Two strata, or domains, were sampled independently: urban areas and rural areas. In the first stage of selection for the urban domain, a systematic sample with a random start was used to select enumeration districts in the domain with probability of selection proportional to the number of households in each district. In the second stage of selection, households were chosen systematically using a constant sampling interval (4.2350) across all of the selected enumeration districts. The enumeration districts selected for the rural domain were the same as those that had been selected earlier for the 1990 Belize Household Expenditure Survey. The second stage selection of rural households was conducted the same way it was for the urban domain but used a constant sampling interval of 2.1363. In order to have a self-weighting geographic sample, 3,106 urban households and 1,871 rural households were selected for a total of 4,977 households.
Only one woman aged 15-44 per household was selected for interview. Each respondent's probability of selection was inversely proportional to the number of eligible women in the household. Thus, weighting factors were applied to compensate for this unequal probability of selection. In the tables presented in this report, proportions and means are based on the weighted number of cases, but the unweighted numbers are shown.
Mode of data collection: Face-to-face [f2f]
Of the 4,977 households selected, 4,566 households were visited. Overall, 8 percent of households could not be located, and 7 percent of the households were found to be vacant. Less than 3 percent of the households refused to be interviewed. Fifty-five percent of sample households included at least one woman aged 15-44. Complete interviews were obtained in 94 percent of the households that had an eligible respondent, for a total of 2,656 interviews. Interview completion rates did not vary by residence.
The estimates for a sample survey are affected by two types of errors: (1) sampling error and (2) non-sampling error. Non-sampling error is the result of mistakes made in carrying out data collection and data processing, including the failure to locate and interview the right household, errors in the way questions are asked or understood, and data entry errors. Although quality control efforts were made during the implementation of the Family Health Survey to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling error is defined as the difference between the true value for any variable measured in a survey and the value estimated by the survey. Sampling error is a measure of the variability between all possible samples that could have been selected from the same population using the same sample design and size. For the entire population and for large subgroups, the Family Health Survey is large enough that the sampling error for most estimates is small. However, for small subgroups, sampling errors are larger and may affect the reliability of the estimates. Sampling error is usually measured in terms of the standard error for a particular statistic (mean, proportion, or ratio), which is the square root of the variance. The standard error can be used to calculate confidence intervals for estimated statistics. For example, the 95 percent confidence interval for a statistic is the estimated value plus or minus 1.96 times the standard error for the estimate.
The standard errors of statistics estimated using a multistage cluster sample design, such as that used in the Family Health Survey, are more complex than are standard errors based on simple random samples, and they tend to be somewhat larger than the standard errors produced by a simple random sample. The increase in standard error due to using a multi-stage cluster design is referred to as the design effect, which is defined as the ratio between the variance for the estimate using the sample design that was used and the variance for the estimate that would result if a simple random sample had been used. Based on experience with similar surveys, the design effect generally falls in a range from 1.2 to 2.0 for most variables.
Table E.1 of the Final Report presents examples of what the 95 percent confidence interval on an estimated proportion would be, under a variety of sample sizes, assuming a design effect of 1.6. It presents half-widths of the 95 percent confidence intervals corresponding to sample sizes, ranging from 25 to 3200 cases, and corresponding to estimated proportions ranging from .05/.95 to .50/.50. The formula used for calculating the half-width of the 95 percent confidence interval is:
(half of 95% C.I.) = (1.96) SQRT {(1.6)(p)(1-p) / n},
where p is the estimated proportion, n is the number of cases used in calculating the proportion, and 1.6 is the design effect. It can be seen, for example, that for an estimated proportion of 0.30 and a sample size of 200, half the width of the confidence interval is 0.08, so the 95 percent confidence interval for the estimated proportion would be from 0.22 to 0.38. If the sample size had been 3200 instead of 200, the 95 percent confidence interval would be from 0.28 to 0.32.
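The arithmetic can be checked directly; the short sketch below reproduces the two worked examples from the formula above.

```python
# Quick check of the half-width formula and the worked example above.
from math import sqrt

def half_width_95(p: float, n: int, deff: float = 1.6) -> float:
    """Half-width of the 95% CI for a proportion p with sample size n and design effect deff."""
    return 1.96 * sqrt(deff * p * (1 - p) / n)

print(round(half_width_95(0.30, 200), 2))    # 0.08 -> CI roughly 0.22 to 0.38
print(round(half_width_95(0.30, 3200), 2))   # 0.02 -> CI roughly 0.28 to 0.32
```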
The actual design effect for individual variables will vary, depending on how values of that variable are distributed among the clusters of the sample. These can be calculated using advanced statistical software for survey analysis.
License: Data licence Germany – Attribution – Version 2.0, https://www.govdata.de/dl-de/by-2-0 (license information derived automatically)
General information: The data set includes traffic data from all locations in Hamburg where cycling is recorded using infrared detectors 24 hours a day, on every day of the year.
The dataset provides, in real time, the traffic volumes of individual counting fields as well as of aggregated counting points composed of multiple counting fields. The schematic structure of data collection and data aggregation is described in a separate document, which can be found in the references.
The data of the counting fields are provided in 5- and 15-minute intervals as well as in daily intervals. The data of the counting points are aggregated at 15- and 60-minute intervals as well as in daily and weekly values. The data of the counting points are also visualised in the corresponding geoportals of the FHH, e.g. Geo-Online and Transport Portal.
In addition to real-time data, historical data are also available to the following extent: Counting fields: all data for the previous seven days in 5- and 15-minute intervals, all data for the previous month in hourly intervals, all data for the current and previous year in daily intervals. Counting points: all data for the previous seven days in 15-minute intervals, all data for the previous month in hourly intervals, all data for the current and previous year in daily intervals, all data since the beginning of collection in weekly intervals.
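A short sketch of aggregating 5-minute counting-field values to the coarser intervals described above; the column names and file name are assumptions, not the official schema of the Urban Data Platform.

```python
# Sketch of aggregating 5-minute counting-field values to 15-minute, hourly,
# and daily totals. Column names and the file name are assumptions.
import pandas as pd

counts = pd.read_csv("counting_field_5min.csv", parse_dates=["timestamp"])
counts = counts.set_index("timestamp").sort_index()

print(counts["bicycles"].resample("15min").sum().head())  # 15-minute totals
print(counts["bicycles"].resample("1h").sum().head())     # hourly totals
print(counts["bicycles"].resample("1D").sum().head())     # daily totals
```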
Further information on the technical implementation of data provision can be found in the metadata of the associated data services.
Technical information: The infrared detectors are usually installed on lighting poles, but also on other masts. The detectors record and count traffic via the heat radiation of the individual road users. Since only infrared images are evaluated, data protection is guaranteed at all times.
Information on data quality: The data are transmitted in real time to the Urban Data Platform of the FHH and are thus available to all users and interested parties in a timely manner. However, because of this real-time component, several framework conditions have to be taken into account: The data are not fully quality assured at the time of first publication. Unusual deviations from the expected data and data gaps are automatically detected by the system, but cannot currently be corrected in real time. Gaps that occur, e.g. through an interruption of the data transmission, can still be filled in retrospectively. Under certain circumstances, and in the case of prolonged failures, changes in historical data may still be made after a few days. The data published here are not officially verified data from the FHH.
As with any traffic count, whether automated or manual, there are certain tolerances in measurement accuracy. The system used here requires accuracy for the counting fields of ± 10 % for the measurement of cycling traffic on pavements, cycle paths and cycling lanes and ± 20 % for the recording of cycle traffic in mixed traffic with motor vehicles. Because counting points are formed from a combination of different counting fields, the deviation may be up to ± 20 %.
In principle, it should be noted that the system, including the recording technology, was still under development in 2020. The historical data from 2020 can thus be used to assess the fundamental development of traffic, but some of it shows greater inaccuracies and jumps in data quality.