74 datasets found

Confidence Interval Examples
figshare.com
application/cdfv2
Updated Jun 28, 2016
Cite
Emily Rollinson (2016). Confidence Interval Examples [Dataset]. http://doi.org/10.6084/m9.figshare.3466364.v2
Explore at:
Available download formats: application/cdfv2
Unique identifier
https://doi.org/10.6084/m9.figshare.3466364.v2
Dataset updated
Jun 28, 2016
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Emily Rollinson
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Examples demonstrating how confidence intervals change depending on the level of confidence (90% versus 95% versus 99%) and on the size of the sample (CI for n=20 versus n=10 versus n=2). Developed for BIO211 (Statistics and Data Analysis: A Conceptual Approach) at Stony Brook University in Fall 2015.
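The two effects named in this description can be sketched numerically. The snippet below is an illustration only, not part of the dataset: it uses a z-interval with a known standard deviation so that it runs on the Python standard library, and the mean (50) and standard deviation (10) are hypothetical. For samples as small as n=2 a t-interval would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Two-sided z-interval for a mean, assuming a known standard deviation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical inputs: sample mean 50, known standard deviation 10.
# Effect of the confidence level at a fixed sample size.
for confidence in (0.90, 0.95, 0.99):
    low, high = z_interval(50.0, 10.0, 20, confidence)
    print(f"n=20, {confidence:.0%}: ({low:.2f}, {high:.2f})")

# Effect of the sample size at a fixed confidence level.
for n in (20, 10, 2):
    low, high = z_interval(50.0, 10.0, n, 0.95)
    print(f"n={n}, 95%: ({low:.2f}, {high:.2f})")
```

The interval widens as the confidence level rises and as the sample size shrinks.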
Data from: A Statistical Inference Course Based on p-Values
figshare.com
tandf.figshare.com
txt
Updated May 30, 2023
Cite
Ryan Martin (2023). A Statistical Inference Course Based on p-Values [Dataset]. http://doi.org/10.6084/m9.figshare.3494549.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3494549.v2
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Ryan Martin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introductory statistical inference texts and courses treat the point estimation, hypothesis testing, and interval estimation problems separately, with primary emphasis on large-sample approximations. Here, I present an alternative approach to teaching this course, built around p-values, emphasizing provably valid inference for all sample sizes. Details about computation and marginalization are also provided, with several illustrative examples, along with a course outline. Supplementary materials for this article are available online.
Interval Data Validation and Estimation Tools Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Oct 3, 2025
Cite
Growth Market Reports (2025). Interval Data Validation and Estimation Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/interval-data-validation-and-estimation-tools-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Oct 3, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Interval Data Validation and Estimation Tools Market Outlook

According to our latest research, the global Interval Data Validation and Estimation Tools market size reached USD 1.46 billion in 2024. With a robust compound annual growth rate (CAGR) of 11.2% projected over the forecast period, the market is expected to reach USD 3.73 billion by 2033. This growth is primarily driven by the rising demand for advanced data quality assurance and analytics solutions across sectors such as BFSI, healthcare, manufacturing, and IT & telecommunications. As organizations increasingly rely on accurate interval data for operational efficiency and regulatory compliance, the adoption of validation and estimation tools continues to surge.
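The relationship between the quoted start value, end value, and growth rate can be checked with a short calculation. This is a sketch under stated assumptions: 2024 is taken as the base year and nine annual compounding periods are assumed up to 2033, neither of which the report states explicitly, so the implied rate need not match the quoted CAGR exactly.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` periods."""
    return start_value * (1.0 + rate) ** years


# Figures from the paragraph above (USD billion); the 9-year span is an assumption.
rate = implied_cagr(1.46, 3.73, 9)
print(f"implied CAGR: {rate:.2%}")
print(f"projection at the quoted 11.2%: {project(1.46, 0.112, 9):.2f}")
```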

A key factor propelling the growth of the Interval Data Validation and Estimation Tools market is the exponential rise in data generation from connected devices, IoT sensors, and digital platforms. Businesses today are inundated with massive volumes of interval data, which, if not validated and accurately estimated, can lead to significant operational inefficiencies and decision-making errors. These tools play a crucial role in ensuring the integrity, accuracy, and completeness of interval data, thereby enabling organizations to derive actionable insights and maintain competitive advantage. Furthermore, the growing emphasis on automation and digital transformation initiatives is pushing enterprises to invest in sophisticated data validation and estimation solutions, further accelerating market growth.

Another major growth driver is the increasing stringency of regulatory requirements across industries, particularly in sectors such as BFSI, healthcare, and utilities. Regulations related to data governance, privacy, and reporting require organizations to maintain high standards of data quality and compliance. Interval Data Validation and Estimation Tools help organizations adhere to these regulatory mandates by providing automated checks, anomaly detection, and robust audit trails. The integration of artificial intelligence and machine learning into these tools is further enhancing their capabilities, enabling real-time data validation and predictive estimation, which is critical in fast-paced business environments.

Additionally, the surge in cloud adoption and the proliferation of cloud-based data management platforms are significantly contributing to the market’s expansion. Cloud-based deployment models offer scalability, flexibility, and cost-efficiency, making advanced validation and estimation tools accessible to small and medium-sized enterprises as well as large organizations. The ability to seamlessly integrate with existing data architectures and third-party applications is also a key factor driving the adoption of both on-premises and cloud-based solutions. As data ecosystems become increasingly complex and distributed, the demand for interval data validation and estimation tools is expected to witness sustained growth through 2033.

From a regional perspective, North America currently holds the largest share of the Interval Data Validation and Estimation Tools market, driven by early technology adoption, a strong focus on data-driven decision-making, and a mature regulatory landscape. However, Asia Pacific is anticipated to register the fastest CAGR of 13.5% during the forecast period, fueled by rapid digitalization, expanding industrialization, and increasing investments in smart infrastructure. Europe and Latin America are also witnessing steady growth, supported by government initiatives and the rising importance of data quality management in emerging economies. The Middle East & Africa region, though comparatively nascent, is expected to demonstrate significant potential as digital transformation initiatives gain momentum.

Component Analysis

The Interval Data Validation and Estimation Tools market by component is broadly segmented into Software and Servic
Interval Data Analytics Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Oct 6, 2025
Cite
Growth Market Reports (2025). Interval Data Analytics Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/interval-data-analytics-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Oct 6, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
Interval Data Analytics Market Outlook

According to our latest research, the global Interval Data Analytics market size reached USD 3.42 billion in 2024, demonstrating robust growth across key verticals. The market is expected to advance at a CAGR of 13.8% from 2025 to 2033, leading to a projected value of USD 10.13 billion by 2033. This impressive expansion is primarily driven by rising demand for advanced analytics solutions capable of processing time-stamped and interval-based data, especially as organizations seek to optimize operations, enhance predictive capabilities, and comply with evolving regulatory requirements.

One of the most significant growth factors propelling the Interval Data Analytics market is the exponential increase in data generation from IoT devices, smart meters, and connected infrastructure across industries. Organizations in sectors such as utilities, manufacturing, and healthcare are increasingly reliant on interval data for resource optimization, real-time monitoring, and predictive maintenance. The ability of interval data analytics to handle vast amounts of granular, time-series data enables businesses to uncover actionable insights, reduce operational costs, and improve asset utilization. Additionally, the growing adoption of smart grids and intelligent energy management systems further amplifies the need for sophisticated interval data analytics solutions that support real-time decision-making and regulatory compliance.

Another pivotal driver for the Interval Data Analytics market is the rapid digital transformation and integration of artificial intelligence (AI) and machine learning (ML) technologies into analytics platforms. These advancements allow for more accurate forecasting, anomaly detection, and automated response mechanisms, which are critical in sectors like finance, healthcare, and telecommunications. As organizations continue to prioritize data-driven strategies, the demand for interval data analytics tools that can seamlessly integrate with existing IT ecosystems and provide scalable, cloud-based solutions is accelerating. Furthermore, the shift towards cloud computing and the proliferation of big data platforms are making it easier for enterprises of all sizes to deploy and scale interval data analytics capabilities, thus broadening the market's reach and potential.

Regulatory pressures and the increasing need for transparency and accountability in data handling are also fueling the growth of the Interval Data Analytics market. Industries such as banking and financial services, healthcare, and energy are subject to stringent compliance requirements that necessitate precise monitoring and reporting of interval data. The ability of interval data analytics platforms to provide auditable, time-stamped records and support regulatory reporting is becoming a critical differentiator for vendors in this space. Moreover, as data privacy laws evolve and enforcement intensifies, organizations are investing in analytics solutions that offer robust security features, data lineage tracking, and comprehensive audit trails, further boosting market adoption.

From a regional perspective, North America continues to lead the Interval Data Analytics market, driven by early technology adoption, a strong presence of leading analytics vendors, and substantial investments in digital infrastructure. However, the Asia Pacific region is rapidly emerging as a key growth engine, fueled by large-scale digitalization initiatives, expanding industrial automation, and increasing penetration of IoT devices. Europe also represents a significant market, underpinned by regulatory mandates and a mature industrial base. Latin America and the Middle East & Africa, while currently smaller in market share, are witnessing accelerated adoption as organizations in these regions recognize the value of interval data analytics in enhancing operational efficiency and competitiveness.

Component Analysis

The Interval Data Analytics market is segmented by component into software and services, each playing a distinct role in
Winkler Interval score metric
kaggle.com
Updated Dec 7, 2023
Cite
Carl McBride Ellis (2023). Winkler Interval score metric [Dataset]. https://www.kaggle.com/datasets/carlmcbrideellis/winkler-interval-score-metric
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 7, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Carl McBride Ellis
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Model performance evaluation: The Mean Winkler Interval score (MWIS)

We can assess the overall performance of a regression model that produces prediction intervals by using the mean Winkler Interval score [1,2,3] which, for an individual interval, is given by:

\[
W_\alpha(l, u; y) =
\begin{cases}
(u - l) + \frac{2}{\alpha}\,(l - y), & y < l, \\
(u - l), & l \le y \le u, \\
(u - l) + \frac{2}{\alpha}\,(y - u), & y > u,
\end{cases}
\]

where \(y\) is the true value, \(u\) is the upper bound of the prediction interval, \(l\) is the lower bound of the prediction interval, and \(\alpha\) is (1-coverage). For example, for 90% coverage, \(\alpha = 0.1\). Note that the Winkler Interval score constitutes a proper scoring rule [2,3].

[1] Robert L. Winkler "*A Decision-Theoretic Approach to Interval Estimation*", Journal of the American Statistical Association, volume 67, pp. 187-191 (1972)

[2] Tilmann Gneiting and Adrian E Raftery "*Strictly Proper Scoring Rules, Prediction, and Estimation*", Journal of the American Statistical Association, volume 102, pp. 359-378 (2007) (Section 6.2)

[3] Jonas R. Brehmer, and Tilmann Gneiting "*Scoring interval forecasts: Equal-tailed, shortest, and modal interval*", Bernoulli volume 27 pp. 1993-2010 (2021)
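The score described above can be sketched in a few lines of plain Python. This is an independent illustration of the standard Winkler score, not the code shipped in the dataset's `MWIS_metric` module; the function names `winkler_score` and `mean_winkler_score` are hypothetical.

```python
def winkler_score(y, lower, upper, alpha):
    """Winkler interval score for one observation and one prediction interval."""
    width = upper - lower
    if y < lower:
        # Observation falls below the interval: add a penalty scaled by 2/alpha.
        return width + (2.0 / alpha) * (lower - y)
    if y > upper:
        # Observation falls above the interval: same penalty on the other side.
        return width + (2.0 / alpha) * (y - upper)
    return width


def mean_winkler_score(y_true, lowers, uppers, alpha):
    """Return (mean Winkler score, empirical coverage) over paired sequences."""
    rows = list(zip(y_true, lowers, uppers))
    scores = [winkler_score(y, l, u, alpha) for y, l, u in rows]
    covered = [l <= y <= u for y, l, u in rows]
    return sum(scores) / len(scores), sum(covered) / len(covered)


# Hypothetical data: one covered observation and one miss, at 90% coverage.
print(mean_winkler_score([5.0, 3.0], [4.0, 4.0], [6.0, 6.0], alpha=0.1))
```

Lower scores are better: narrow intervals are rewarded, and misses are penalised more heavily as the nominal coverage increases.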

Python code: Usage example

Attach this dataset to a notebook, then:

import sys
sys.path.append('/kaggle/input/winkler-interval-score-metric/')
import MWIS_metric
help(MWIS_metric.score)

MWIS, coverage = MWIS_metric.score(predictions["y_true"], predictions["lower"], predictions["upper"], alpha)
print("Local MWI score ", round(MWIS, 3))
print("Predictions coverage ", round(coverage*100, 1), "%")
Data from: Transformation of measurement uncertainties into low-dimensional...
data.niaid.nih.gov
resodate.org
+1 more
zip
Updated Feb 1, 2021
Cite
Antonios Alexiadis; Scott Ferson; Eann A. Patterson (2021). Transformation of measurement uncertainties into low-dimensional feature vector space [Dataset]. http://doi.org/10.5061/dryad.6hdr7sqx2
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.6hdr7sqx2
Dataset updated
Feb 1, 2021
Dataset provided by
University of Liverpool
Authors
Antonios Alexiadis; Scott Ferson; Eann A. Patterson
License
https://spdx.org/licenses/CC0-1.0.html
Description
Advances in technology allow the acquisition of data with high spatial and temporal resolution. These datasets are usually accompanied by estimates of the measurement uncertainty, which may be spatially or temporally varying and should be taken into consideration when making decisions based on the data. At the same time, various transformations are commonly implemented to reduce the dimensionality of the datasets for post-processing, or to extract significant features. However, the corresponding uncertainty is not usually represented in the low-dimensional or feature vector space. A method is proposed that maps the measurement uncertainty into the equivalent low-dimensional space with the aid of approximate Bayesian computation, resulting in a distribution that can be used to make statistical inferences. The method involves no assumptions about the probability distribution of the measurement error and is independent of the feature extraction process as demonstrated in three examples. In the first two examples Chebyshev polynomials were used to analyse structural displacements and soil moisture measurements; while in the third, principal component analysis was used to decompose global ocean temperature data. The uses of the method range from supporting decision making in model validation or confirmation, model updating or calibration and tracking changes in condition, such as the characterisation of the El Niño Southern Oscillation.
Wind Generation Time Interval Exploration Data
data.ca.gov
data.cnra.ca.gov
+3 more
Updated Jan 19, 2024
Cite
California Energy Commission (2024). Wind Generation Time Interval Exploration Data [Dataset]. https://data.ca.gov/dataset/wind-generation-time-interval-exploration-data
Explore at:
Available download formats: zip, gpkg, gdb, arcgis geoservices rest api, kml, geojson, csv, html, xlsx, txt
Dataset updated
Jan 19, 2024
Dataset authored and provided by
California Energy Commission (http://www.energy.ca.gov/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the data set behind the Wind Generation Interactive Query Tool created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period.

The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. Capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.
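The capacity-factor definition above reduces to a single ratio. The sketch below is illustrative only: the function name and the plant figures (a 100 MW nameplate capacity producing 840 MWh over 24 hours) are hypothetical and are not taken from the CEC data.

```python
def capacity_factor(energy_mwh, nameplate_mw, hours):
    """Energy produced divided by the energy possible at rated nameplate capacity."""
    return energy_mwh / (nameplate_mw * hours)


# Hypothetical example: 100 MW of nameplate capacity, 840 MWh generated in 24 hours.
cf = capacity_factor(energy_mwh=840.0, nameplate_mw=100.0, hours=24.0)
print(f"capacity factor: {cf:.2f}")
```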

Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan.

By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
Melodic Intervals Size Statistics for the most commonly occurring intervals....
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Shui' er Han; Janani Sundararajan; Daniel Liu Bowling; Jessica Lake; Dale Purves (2023). Melodic Intervals Size Statistics for the most commonly occurring intervals. (Independent – samples t-tests). [Dataset]. http://doi.org/10.1371/journal.pone.0020160.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0020160.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Shui' er Han; Janani Sundararajan; Daniel Liu Bowling; Jessica Lake; Dale Purves
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Statistics for the comparisons of the most commonly occurring melodic interval sizes in tone and non-tone language music databases; n1 and n2 refer to the sample sizes of tone and non-tone language music databases. (All comparisons were made with the two-tailed independent samples t-test, α-level adjusted using the Bonferroni method).
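The Bonferroni adjustment mentioned above divides the overall α-level by the number of comparisons. The sketch below is a generic illustration; the number of comparisons and the p-values are hypothetical and are not taken from the table.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-comparison significance threshold under the Bonferroni method."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p-values for five interval-size comparisons.
print(significant([0.001, 0.02, 0.009, 0.04, 0.3]))
```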
Family Health Survey 1991 - Belize
microdata.worldbank.org
catalog.ihsn.org
+1 more
Updated Jul 8, 2014
+ more versions
Cite
Central Statistical Office (CS) (2014). Family Health Survey 1991 - Belize [Dataset]. https://microdata.worldbank.org/index.php/catalog/978
Explore at:
Dataset updated
Jul 8, 2014
Dataset authored and provided by
Central Statistical Office (CS)
Time period covered
1991 - 1999
Area covered
Belize
Description
Abstract

Belize is one of the countries in Latin America that was not included in the World Fertility Survey, the Contraceptive Prevalence Survey project, or the Demographic and Health Survey program during the 1970s and 1980s. As a result, data on contraceptive prevalence and the use of maternal and child health services in Belize have been limited. The 1991 Family Health Survey was designed to provide health professionals and international donors with data to assess infant and child mortality, fertility, and the use of family planning and health services in Belize.

The objectives of the 1991 Family Health Survey were to:
- obtain national fertility estimates;
- estimate levels of infant and child mortality;
- estimate the percentage of mothers who breastfed their last child and duration of breastfeeding;
- determine levels of knowledge and current use of contraceptives for a variety of social and demographic background variables and to determine the source where users obtain the methods they use;
- determine reasons for nonuse of contraception and estimate the percentage of women who are at risk of an unplanned pregnancy and, thus, in need of family planning services; and
- examine the use of maternal and child health services and immunization levels for children less than 5 years of age and to examine the prevalence and treatment of diarrhea and acute respiratory infections among these children.

Geographic coverage

National

Kind of data

Sample survey data [ssd]

Sampling procedure

The 1991 Belize Family Health Survey was an area probability survey with two stages of selection. The sampling frame for the survey was the quick count of all households in the country conducted in 1990 by the Central Statistical Office in preparation for the 1991 census. Two strata, or domains, were sampled independently: urban areas and rural areas. In the first stage of selection for the urban domain, a systematic sample with a random start was used to select enumeration districts in the domain with probability of selection proportional to the number of households in each district. In the second stage of selection, households were chosen systematically using a constant sampling interval (4.2350) across all of the selected enumeration districts. The enumeration districts selected for the rural domain were the same as those that had been selected earlier for the 1990 Belize Household Expenditure Survey. The second stage selection of rural households was conducted the same way it was for the urban domain but used a constant sampling interval of 2.1363. In order to have a self-weighting geographic sample, 3,106 urban households and 1,871 rural households were selected for a total of 4,977 households.
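Second-stage selection with a constant, non-integer sampling interval can be sketched as follows. This shows one common way to apply a fractional interval (a random start, then repeated addition of the interval, truncating each position to an index). The survey's exact field procedure is not described here, so the function below is an assumption-laden illustration with a hypothetical name and a hypothetical list of 1,000 households.

```python
import random


def systematic_sample(n_units, interval, seed=None):
    """Select 0-based unit indices using a random start and a constant interval."""
    rng = random.Random(seed)
    position = rng.random() * interval  # random start in [0, interval)
    picks = []
    while position < n_units:
        picks.append(int(position))  # truncate the fractional position to an index
        position += interval
    return picks


# Hypothetical listing of 1,000 urban households, sampled with the urban interval.
selected = systematic_sample(1000, 4.2350, seed=1)
print(len(selected), selected[:5])
```

Because the interval exceeds 1, consecutive positions always map to different households, and roughly one household in every 4.2350 is selected.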

Only one woman aged 15-44 per household was selected for interview. Each respondent's probability of selection was inversely proportional to the number of eligible women in the household. Thus, weighting factors were applied to compensate for this unequal probability of selection. In the tables presented in this report, proportions and means are based on the weighted number of cases, but the unweighted numbers are shown.
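The compensation described above amounts to weighting each respondent by the number of eligible women in her household, since her selection probability was the inverse of that number. A minimal sketch, with hypothetical records and a hypothetical function name:

```python
def weighted_proportion(records):
    """Weighted proportion from (eligible_women_in_household, has_attribute) pairs.

    Each respondent's weight is the number of eligible women in her household,
    the inverse of her within-household selection probability.
    """
    records = list(records)
    total_weight = sum(weight for weight, _ in records)
    weighted_hits = sum(weight for weight, hit in records if hit)
    return weighted_hits / total_weight


# Hypothetical respondents: (eligible women in household, uses contraception).
sample = [(1, True), (2, False), (3, True), (1, False)]
print(round(weighted_proportion(sample), 3))           # weighted estimate
print(sum(hit for _, hit in sample) / len(sample))     # unweighted, for comparison
```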

Mode of data collection

Face-to-face [f2f]

Response rate

Of the 4,977 households selected, 4,566 households were visited. Overall, 8 percent of households could not be located, and 7 percent of the households were found to be vacant. Less than 3 percent of the households refused to be interviewed. Fifty-five percent of sample households included at least one woman aged 15-44. Complete interviews were obtained in 94 percent of the households that had an eligible respondent, for a total of 2,656 interviews. Interview completion rates did not vary by residence.

Sampling error estimates

The estimates for a sample survey are affected by two types of errors: (1) sampling error and (2) non-sampling error. Non-sampling error is the result of mistakes made in carrying out data collection and data processing, including the failure to locate and interview the right household, errors in the way questions are asked or understood, and data entry errors. Although quality control efforts were made during the implementation of the Family Health Survey to minimize this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling error is defined as the difference between the true value for any variable measured in a survey and the value estimated by the survey. Sampling error is a measure of the variability between all possible samples that could have been selected from the same population using the same sample design and size. For the entire population and for large subgroups, the Family Health Survey is large enough that the sampling error for most estimates is small. However, for small subgroups, sampling errors are larger and may affect the reliability of the estimates. Sampling error is usually measured in terms of the standard error for a particular statistic (mean, proportion, or ratio), which is the square root of the variance. The standard error can be used to calculate confidence intervals for estimated statistics. For example, the 95 percent confidence interval for a statistic is the estimated value plus or minus 1.96 times the standard error for the estimate.

The standard errors of statistics estimated using a multistage cluster sample design, such as that used in the Family Health Survey, are more complex than are standard errors based on simple random samples, and they tend to be somewhat larger than the standard errors produced by a simple random sample. The increase in standard error due to using a multi-stage cluster design is referred to as the design effect, which is defined as the ratio between the variance for the estimate using the sample design that was used and the variance for the estimate that would result if a simple random sample had been used. Based on experience with similar surveys, the design effect generally falls in a range from 1.2 to 2.0 for most variables.
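Because the design effect is a ratio of variances, it inflates the standard error by its square root. The sketch below applies the quoted range of design effects to a hypothetical estimate (a proportion of 0.30 from 400 cases); the function names are illustrative.

```python
from math import sqrt


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling."""
    return sqrt(p * (1 - p) / n)


def design_adjusted_se(p, n, deff):
    """Scale the SRS standard error by sqrt(deff), since deff is a variance ratio."""
    return srs_standard_error(p, n) * sqrt(deff)


# Hypothetical estimate: p = 0.30 from n = 400 cases, across the quoted deff range.
for deff in (1.2, 1.6, 2.0):
    print(deff, round(design_adjusted_se(0.30, 400, deff), 4))
```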

Table E.1 of the Final Report presents examples of what the 95 percent confidence interval on an estimated proportion would be, under a variety of sample sizes, assuming a design effect of 1.6. It presents half-widths of the 95 percent confidence intervals corresponding to sample sizes, ranging from 25 to 3200 cases, and corresponding to estimated proportions ranging from .05/.95 to .50/.50. The formula used for calculating the half-width of the 95 percent confidence interval is:

(half of 95% C.I.) = (1.96) SQRT {(1.6)(p)(1-p) / n},

where p is the estimated proportion, n is the number of cases used in calculating the proportion, and 1.6 is the design effect. It can be seen, for example, that for an estimated proportion of 0.30 and a sample size of 200, half the width of the confidence interval is 0.08, so that the 95 percent confidence interval for the estimated proportion would be from 0.22 to 0.38. If the sample size had been 3200, instead of 200, the 95 percent confidence interval would be from 0.28 to 0.32.
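The half-width formula and the two worked cases above can be reproduced directly. The function name is hypothetical; the default design effect of 1.6 and the multiplier 1.96 are taken from the text.

```python
from math import sqrt


def ci_half_width(p, n, deff=1.6, z=1.96):
    """Half-width of the 95% confidence interval for a proportion, per the formula above."""
    return z * sqrt(deff * p * (1 - p) / n)


for n in (200, 3200):
    half = ci_half_width(0.30, n)
    print(f"n={n}: half-width {half:.2f}, interval ({0.30 - half:.2f}, {0.30 + half:.2f})")
# n=200  gives a half-width of 0.08, i.e. the interval (0.22, 0.38)
# n=3200 gives a half-width of 0.02, i.e. the interval (0.28, 0.32)
```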

The actual design effect for individual variables will vary, depending on how values of that variable are distributed among the clusters of the sample. These can be calculated using advanced statistical software for survey analysis.
Simulated data for Confidence Interval Width Contours: Sample Size Planning...
scidb.cn
Updated Jun 17, 2022
Cite
Liu Yue; Xu Lei; Liu Hongyun (2022). Simulated data for Confidence Interval Width Contours: Sample Size Planning for Linear Mixed-Effects Models [Dataset]. http://doi.org/10.57760/sciencedb.01813
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57760/sciencedb.01813
Dataset updated
Jun 17, 2022
Dataset provided by
Science Data Bank
Authors
Liu Yue; Xu Lei; Liu Hongyun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data set is the simulation data for sample size planning based on the power analysis and accuracy in parameter estimation for linear mixed-effects models.
Condition Data with Random Recording Time
kaggle.com
zip
Updated Jun 10, 2022
Cite
Prognostics @ HSE (2022). Condition Data with Random Recording Time [Dataset]. https://www.kaggle.com/datasets/prognosticshse/condition-data-with-random-recording-time/data
Explore at:
Available download formats: zip (1167682 bytes)
Dataset updated
Jun 10, 2022
Authors
Prognostics @ HSE
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context: This data set originates from a practice-relevant degradation process, which is representative of Prognostics and Health Management (PHM) applications. The observed degradation process is the clogging of filters when separating solid particles from gas. A test bench is used for this purpose, which performs automated life testing of filter media by loading them. For testing, dust complying with ISO standard 12103-1 and with a known particle size distribution is employed. The employed filter media is made of randomly oriented non-woven fibre material. Further data sets are generated for various practice-relevant data situations which do not correspond to the ideal conditions of full data coverage. These data sets are uploaded to Kaggle by the user "Prognostics @ HSE" in a continuous process. To avoid carryover between two data sets, a different configuration of the filter tests is used for each uploaded practice-relevant data situation, for example by selecting a different filter medium.

Detailed specification: For more information about the general operation and the components used, see the provided description file Random Recording Condition Data Data Set.pdf

Given data situation: In order to implement a predictive maintenance policy, knowledge of the time of failure or, equivalently, of the remaining useful life (RUL) of the technical system is necessary. The time of failure or the RUL can be predicted on the basis of condition data that indicate the damage progression of a technical system over time. However, the collection of condition data in typical industrial PHM applications is often only possible in an incomplete manner. An example is the collection of data during defined test cycles with specific loads, carried out at intervals. For instance, this approach is often used with machining centers, where test cycles are only carried out between finished machining jobs or work shifts. Due to different work pieces, the machining time varies and the test cycle with the recording of condition data is not performed equidistantly. This results in a data characteristic that is comparable to a random sample of continuously recorded condition data. Another example that may result in such a data characteristic comes from the effort to reduce data volumes when recording condition data. Attempts can be made to keep the amount of data with unchanged damage as small as possible. One possible measure is not to transmit and store the continuous sensor readings, but rather sections of them, which also leads to gaps in the data available for prognosis. In the present data set, the life cycle of filters, or rather their condition data, represented by the differential pressure, is considered. Failure of the filter occurs when the differential pressure across the filter exceeds 600 Pa. The time until a filter failure occurs depends mainly on the amount of dust supplied per unit time, which is constant within a run-to-failure cycle. The previously explained data characteristics are addressed by means of corresponding training and test data. The training data is structured as follows: A run-to-failure cycle contains n batches of data. 
The number n varies between the cycles and depends on the duration of the batches and the time interval between the individual batches. The duration and time interval of the batches are random variables. A data batch includes the sensor readings of differential pressure and flow rate for the filter, the start and end time of the batch, and RUL information related to the end time of the batch. The sensor readings of the differential pressure and flow rate are recorded at a constant sampling rate. Figure 6 shows an illustrative run-to-failure cycle with multiple batches. The test data are randomly right-censored. They are also made of batches with a random duration and time interval between the batches. For each batch contained, the start and end time are given, as well as the sensor readings within the batch. The RUL is not given for each batch but only for the last data point of the right-censored run-to-failure cycle.

Task: The aim is to predict the RUL of the censored filter test cycles given in the test data. For this purpose, training and test data are provided, consisting of 60 and 40 run-to-failure cycles, respectively. The test data contain randomly right-censored run-to-failure cycles and the respective RUL for the prediction task. The main challenge is to make the best use of the incompletely recorded training and test data to provide the most accurate prediction possible. Due to the detailed description of the setup and the various physical filter models described in the literature, it is possible to support the actual data-driven models by integrating physical knowledge or physical models in the sense of theory-guided data science or informed machi...
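As a rough illustration of the prediction task, the sketch below fits a straight line to the differential pressure readings of all observed batches and extrapolates to the 600 Pa failure threshold stated above. This is a naive baseline, not the method of the dataset authors: the representation of a batch as a (times, pressures) pair, the function names, and the assumption of linear degradation are all hypothetical, and the real pressure curve may well not be linear.

```python
FAILURE_THRESHOLD_PA = 600.0  # failure threshold from the dataset description


def fit_line(times, values):
    """Ordinary least-squares fit of values ~ slope * t + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def naive_rul(batches, threshold=FAILURE_THRESHOLD_PA):
    """Estimate the RUL at the last observed time point.

    batches: list of (times, pressures) pairs (hypothetical layout), one pair
    per recorded batch; gaps between batches are simply left unobserved.
    Returns None when the fitted trend is not increasing.
    """
    times = [t for ts, _ in batches for t in ts]
    pressures = [p for _, ps in batches for p in ps]
    slope, intercept = fit_line(times, pressures)
    if slope <= 0:
        return None  # no upward trend, so no crossing time can be extrapolated
    t_fail = (threshold - intercept) / slope
    return max(0.0, t_fail - max(times))


# Made-up example: two batches separated by a gap in the recording
example_batches = [([0, 1, 2], [100, 110, 120]), ([10, 11], [200, 210])]
estimated_rul = naive_rul(example_batches)
```

Because the fit pools all batches, the gaps between them need no special handling here; a model that uses the RUL labels of the training batches would be the natural next step.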
2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates...
data.census.gov
Cite
ACS, 2021 American Community Survey: S0101 | AGE AND SEX (ACS 1-Year Estimates Subject Tables) [Dataset]. https://data.census.gov/table/ACSST1Y2021.S0101?q=S0101:+AGE+AND+SEX
Explore at:
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2021
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2021 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The age dependency ratio is derived by dividing the combined under-18 and 65-and-over populations by the 18-to-64 population and multiplying by 100.

The old-age dependency ratio is derived by dividing the population 65 and over by the 18-to-64 population and multiplying by 100.

The child dependency ratio is derived by dividing the population under 18 by the 18-to-64 population and multiplying by 100.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

The 2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:
"-": The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
"N": The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
"(X)": The estimate or margin of error is not applicable or not available.
"median-": The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
"median+": The median falls in the highest interval of an open-ended distribution (for example "250,000+").
"**": The margin of error could not be computed because there were an insufficient number of sample observations.
"***": The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
"*****": A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
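The margin-of-error interpretation and the three dependency ratios described above can be sketched in a few lines. The function names and all input numbers are made up for illustration and are not taken from table S0101; the factor 1.645 is the z-score conventionally associated with a 90 percent margin of error, and rescaling to another confidence level assumes an approximately normal sampling distribution.

```python
Z_90 = 1.645  # z-score associated with the published 90 percent margin of error
Z_95 = 1.960  # z-score for a 95 percent level (normal approximation)


def confidence_bounds(estimate, moe):
    """Lower and upper confidence bounds: estimate minus/plus the margin of error."""
    return estimate - moe, estimate + moe


def rescale_moe(moe_90, z_target=Z_95):
    """Convert a 90 percent margin of error to another confidence level."""
    return moe_90 * z_target / Z_90


def dependency_ratios(under_18, age_18_to_64, age_65_plus):
    """Age, old-age and child dependency ratios as defined in the description."""
    return {
        "age": (under_18 + age_65_plus) / age_18_to_64 * 100,
        "old_age": age_65_plus / age_18_to_64 * 100,
        "child": under_18 / age_18_to_64 * 100,
    }


# Hypothetical inputs, not values from the table
lower, upper = confidence_bounds(1000, 50)
moe_95 = rescale_moe(164.5)
ratios = dependency_ratios(under_18=20, age_18_to_64=60, age_65_plus=10)
```

By construction, the age dependency ratio equals the sum of the child and old-age ratios, which is a quick consistency check when reading the table.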
undefined undefined: undefined | undefined (undefined)
data.census.gov
Cite
United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSST5Y2014.S1702
Explore at:
Dataset provided by
United States Census Bureau (http://census.gov/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

Explanation of Symbols:
An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.
An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.
An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
An 'N' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
An '(X)' means that the estimate is not applicable or not available.

Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

While the 2010-2014 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Source: U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates
Data from: Melded Confidence Intervals Do Not Provide Guaranteed Coverage
tandf.figshare.com
txt
Updated Jul 23, 2024
Cite
Jesse Frey; Yimin Zhang (2024). Melded Confidence Intervals Do Not Provide Guaranteed Coverage [Dataset]. http://doi.org/10.6084/m9.figshare.24112584.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24112584.v1
Dataset updated
Jul 23, 2024
Dataset provided by
Taylor & Francis (https://taylorandfrancis.com/)
Authors
Jesse Frey; Yimin Zhang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Melded confidence intervals were proposed as a way to combine two independent one-sample confidence intervals to obtain a two-sample confidence interval for a quantity like a difference or a ratio. Simulation-based work has suggested that melded confidence intervals always provide at least the nominal coverage. However, we show here that for the case of melded confidence intervals for a difference in population quantiles, the confidence intervals do not guarantee the nominal coverage. We derive a lower bound on the coverage for a one-sided confidence interval, and we show that there are pairs of distributions that make the coverage arbitrarily close to this lower bound. One specific example of our results is that the 95% melded upper bound on the difference between two population medians offers a guaranteed coverage of only 88.3% when both samples are of size 20.
Data from: HRV-ACC: a dataset with R-R intervals and accelerometer data for...
data.niaid.nih.gov
zenodo.org
Updated Aug 9, 2023
Cite
Kamil Książek; Wilhelm Masarczyk; Przemysław Głomb; Michał Romaszewski; Iga Stokłosa; Piotr Ścisło; Paweł Dębski; Robert Pudlo; Piotr Gorczyca; Magdalena Piegza (2023). HRV-ACC: a dataset with R-R intervals and accelerometer data for the diagnosis of psychotic disorders using a Polar H10 wearable sensor [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8171265
Explore at:
Dataset updated
Aug 9, 2023
Dataset provided by
Department of Psychiatry, Faculty of Medical Sciences in Zabrze, Medical University of Silesia
Psychiatric Department of the Multidisciplinary Hospital in Tarnowskie Góry
Institute of Psychology, Humanitas University in Sosnowiec
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences
Department of Psychoprophylaxis, Faculty of Medical Sciences in Zabrze, Medical University of Silesia
Authors
Kamil Książek; Wilhelm Masarczyk; Przemysław Głomb; Michał Romaszewski; Iga Stokłosa; Piotr Ścisło; Paweł Dębski; Robert Pudlo; Piotr Gorczyca; Magdalena Piegza
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
ABSTRACT

The issue of diagnosing psychotic diseases, including schizophrenia and bipolar disorder, in particular, the objectification of symptom severity assessment, is still a problem requiring the attention of researchers. Two measures that can be helpful in patient diagnosis are heart rate variability calculated based on electrocardiographic signal and accelerometer mobility data. The following dataset contains data from 30 psychiatric ward patients having schizophrenia or bipolar disorder and 30 healthy persons. The duration of the measurements for individuals was usually between 1.5 and 2 hours. R-R intervals necessary for heart rate variability calculation were collected simultaneously with accelerometer data using a wearable Polar H10 device. The Positive and Negative Syndrome Scale (PANSS) test was performed for each patient participating in the experiment, and its results were attached to the dataset. Furthermore, the code for loading and preprocessing data, as well as for statistical analysis, was included on the corresponding GitHub repository.

BACKGROUND

Heart rate variability (HRV), calculated based on electrocardiographic (ECG) recordings of R-R intervals stemming from the heart's electrical activity, may be used as a biomarker of mental illnesses, including schizophrenia and bipolar disorder (BD) [Benjamin et al]. The variations of R-R interval values correspond to the heart's autonomic regulation changes [Berntson et al, Stogios et al]. Moreover, the HRV measure reflects the activity of the sympathetic and parasympathetic parts of the autonomous nervous system (ANS) [Task Force of the European Society of Cardiology the North American Society of Pacing Electrophysiology, Matusik et al]. Patients with psychotic mental disorders show a tendency for a change in the centrally regulated ANS balance in the direction of less dynamic changes in the ANS activity in response to different environmental conditions [Stogios et al]. Larger sympathetic activity relative to the parasympathetic one leads to lower HRV, while, on the other hand, higher parasympathetic activity translates to higher HRV. This loss of dynamic response may be an indicator of mental health. Additional benefits may come from measuring the daily activity of patients using accelerometry. This may be used to register periods of physical activity and inactivity or withdrawal for further correlation with HRV values recorded at the same time.

EXPERIMENTS

In our experiment, the participants were 30 psychiatric ward patients with schizophrenia or BD and 30 healthy people. All measurements were performed using a Polar H10 wearable device. The sensor collects ECG recordings and accelerometer data and, additionally, detects R wave peaks. Participants had to wear the sensor for a given time, typically between 1.5 and 2 hours; the shortest recording was 70 minutes. During this time, the evaluated persons could perform any activity, starting a few minutes after the beginning of the measurement. Participants were encouraged to undertake physical activity and, more specifically, to take a walk. Because the patients were in the medical ward, they were instructed to take a walk in the corridors at the beginning of the experiment. They were to repeat the walk 30 minutes and 1 hour after the first walk. The subsequent walks were to be slightly longer (about 3, 5 and 7 minutes, respectively). We did not remind participants of this instruction or supervise compliance during the experiment, in either the treatment or the control group. Seven persons from the control group did not receive this instruction, and their measurements correspond to freely selected activities with rest periods, but at least three of them performed physical activities during this time. Nevertheless, at the start of the experiment, all participants were requested to rest in a sitting position for 5 minutes. Moreover, for each patient, the disease severity was assessed using the PANSS test and its scores are attached to the dataset.

The data from sensors were collected using Polar Sensor Logger application [Happonen]. Such extracted measurements were then preprocessed and analyzed using the code prepared by the authors of the experiment. It is publicly available on the GitHub repository [Książek et al].

Firstly, we performed a manual artifact detection to remove abnormal heartbeats due to non-sinus beats and technical issues of the device (e.g. temporary disconnections and inappropriate electrode readings). We also performed anomaly detection using Daubechies wavelet transform. Nevertheless, the dataset includes raw data, while a full code necessary to reproduce our anomaly detection approach is available in the repository. Optionally, it is also possible to perform cubic spline data interpolation. After that step, rolling windows of a particular size and time intervals between them are created. Then, a statistical analysis is prepared, e.g. mean HRV calculation using the RMSSD (Root Mean Square of Successive Differences) approach, measuring a relationship between mean HRV and PANSS scores, mobility coefficient calculation based on accelerometer data and verification of dependencies between HRV and mobility scores.
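The RMSSD step mentioned above can be sketched as follows. This is a minimal illustration, not the code from the authors' repository: the function names, the use of seconds for time stamps, and the reading of "time intervals between windows" as the step between window start times are assumptions.

```python
import math


def rmssd(rr_ms):
    """Root Mean Square of Successive Differences of R-R intervals (in ms)."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def windowed_rmssd(times_s, rr_ms, window_s, step_s):
    """RMSSD in rolling windows [start, start + window_s), advanced by step_s.

    times_s: beat times in seconds (assumed sorted); rr_ms: matching R-R values.
    Windows containing fewer than two beats are skipped.
    """
    results = []
    if not times_s:
        return results
    start = times_s[0]
    while start <= times_s[-1]:
        in_window = [rr for t, rr in zip(times_s, rr_ms)
                     if start <= t < start + window_s]
        if len(in_window) >= 2:
            results.append((start, rmssd(in_window)))
        start += step_s
    return results


# The five R-R values shown in the sample file excerpt of this description
sample_rmssd = rmssd([651, 632, 618, 621, 661])

# Made-up series to show the windowing
windows = windowed_rmssd([0, 1, 2, 3, 4, 5],
                         [800, 810, 800, 810, 800, 810],
                         window_s=3, step_s=3)
```

Note that after artifact removal, two neighbouring values in a window may no longer be truly successive beats; this sketch does not account for that.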

DATA DESCRIPTION

The structure of the dataset is as follows. One folder, called HRV_anonymized_data, contains values of R-R intervals together with timestamps for each experiment participant. The data was properly anonymized, i.e. the day of the measurement was removed to prevent person identification. Files concerning patients are named treatment_X.csv, where X is the number of the person, while files related to the healthy controls are named control_Y.csv, where Y is the identification number of the person. Furthermore, for visualization purposes, an image of the raw R-R intervals for each participant is provided. Its name is raw_RR_{control,treatment}_N.png, where N is the number of the person from the control/treatment group. The collected data are raw, i.e. before the anomaly removal. The code for reproducing the anomaly detection stage and removing suspicious heartbeats is publicly available in the repository [Książek et al]. The structure of the files containing R-R intervals is as follows:

Phone timestamp    RR-interval [ms]
12:43:26.538000    651
12:43:27.189000    632
12:43:27.821000    618
12:43:28.439000    621
12:43:29.060000    661
...                ...

The first column contains the timestamp for which the distance between two consecutive R peaks was registered. The corresponding R-R interval is presented in the second column of the file and is expressed in milliseconds.
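A file with this layout could be read along the following lines. The delimiter (a semicolon here), the presence of a single header row, and the function name are assumptions that should be checked against the actual CSV files; only the two-column structure and the timestamp format come from the description above.

```python
import csv
import io
from datetime import datetime


def read_rr(fileobj, delimiter=";"):
    """Read (time of day, R-R interval in ms) pairs from an open text file.

    The delimiter is an assumption; pass a different one if the files differ.
    """
    reader = csv.reader(fileobj, delimiter=delimiter)
    next(reader)  # skip the header row (assumed to be present)
    rows = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        timestamp = datetime.strptime(row[0].strip(), "%H:%M:%S.%f").time()
        rows.append((timestamp, int(row[1])))
    return rows


# In-memory stand-in for a treatment_X.csv file, using the values shown above
sample = io.StringIO(
    "Phone timestamp;RR-interval [ms]\n"
    "12:43:26.538000;651\n"
    "12:43:27.189000;632\n"
)
rr_rows = read_rr(sample)
```

Since the date was removed during anonymization, only the time of day is available, so recordings that cross midnight would need extra handling.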
The second folder, called accelerometer_anonymized_data contains values of accelerometer data collected at the same time as R-R intervals. The naming convention is similar to that of the R-R interval data: treatment_X.csv and control_X.csv represent the data coming from the persons from the treatment and control group, respectively, while X is the identification number of the selected participant. The numbers are exactly the same as for R-R intervals. The structure of the files with accelerometer recordings is as follows:

Phone timestamp    X [mg]    Y [mg]    Z [mg]
13:00:17.196000    -961      -23       182
13:00:17.205000    -965      -21       181
13:00:17.215000    -966      -22       187
13:00:17.225000    -967      -26       193
13:00:17.235000    -965      -27       191
...                ...       ...       ...

The first column contains a timestamp, while the next three columns correspond to the currently registered acceleration in three axes: X, Y and Z, in milli-g unit.

We also attached a file with the PANSS test scores (PANSS.csv) for all patients participating in the measurement. The structure of this file is as follows:

no_of_person    PANSS_P    PANSS_N    PANSS_G    PANSS_total
1               8          13         22         43
2               11         7          18         36
3               14         30         44         88
4               18         13         27         58
...             ...        ...        ...        ...

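The first column contains the identification number of the patient, while the three following columns refer to the PANSS scores related to positive, negative and general symptoms, respectively. The last column, PANSS_total, contains the total score, which in the rows shown equals the sum of the three preceding scores.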

USAGE NOTES

All the files necessary to run the HRV and/or accelerometer data analysis are available on the GitHub repository [Książek et al]. HRV data loading, preprocessing (i.e. anomaly detection and removal), as well as the calculation of mean HRV values in terms of the RMSSD, is performed in the main.py file. Also, Pearson's correlation coefficients between HRV values and PANSS scores and the statistical tests (Levene's and Mann-Whitney U tests) comparing the treatment and control groups are computed. By default, a sensitivity analysis is made, i.e. running the full pipeline for different settings of the window size for which the HRV is calculated and various time intervals between consecutive windows. Preparing the heatmaps of correlation coefficients and corresponding p-values can be done by running the utils_advanced_plots.py file after performing the sensitivity analysis. Furthermore, a detailed analysis for the one selected set of hyperparameters may be prepared (by setting sensitivity_analysis = False), i.e. for 15-minute window sizes, 1-minute time intervals between consecutive windows and without data interpolation method. Also, patients taking quetiapine may be excluded from further calculations by setting exclude_quetiapine = True because this medicine can have a strong impact on HRV [Hattori et al].

The accelerometer data processing may be performed using the utils_accelerometer.py file. In this case, accelerometer recordings are downsampled to ensure the same timestamps as for R-R intervals and, for each participant, the mobility coefficient is calculated. Then, a correlation
Data from: (Table 1) Number of counted pollen samples, intervals and time...
doi.pangaea.de
search.dataone.org
html, tsv
Updated 2003
Cite
Jun Tian; Xiangjun Sun; Yunli Luo; Fei Huang; Pinxian Wang (2003). (Table 1) Number of counted pollen samples, intervals and time resolution between samples of ODP SIte 184-1144 [Dataset]. http://doi.org/10.1594/PANGAEA.738717
Explore at:
Available download formats: tsv, html
Unique identifier
https://doi.org/10.1594/PANGAEA.738717
Dataset updated
2003
Dataset provided by
PANGAEA
Authors
Jun Tian; Xiangjun Sun; Yunli Luo; Fei Huang; Pinxian Wang
License
Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
License information was derived automatically
Time period covered
Mar 13, 1999 - Mar 18, 1999
Variables measured
Sample amount, Depth, top/min, Sample spacing, Time resolution, Age, maximum/old, Depth, bottom/max, Age, minimum/young, DEPTH, sediment/rock
Description
This dataset is about: (Table 1) Number of counted pollen samples, intervals and time resolution between samples of ODP SIte 184-1144. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.738719 for more information.
Simulation Data & R scripts for: "Introducing recurrent events analyses to...
data.niaid.nih.gov
doi.org
Updated Apr 29, 2024
Cite
Ferry, Nicolas (2024). Simulation Data & R scripts for: "Introducing recurrent events analyses to assess species interactions based on camera trap data: a comparison with time-to-first-event approaches" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11085005
Explore at:
Dataset updated
Apr 29, 2024
Dataset provided by
Department of National Park Monitoring and Animal Management, Bavarian Forest National Park
Authors
Ferry, Nicolas
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Files descriptions:

All csv files contain the results of the different models (PAMM, AARs, linear models, MRPPs) for each iteration of the simulation, with one row per iteration.
"results_perfect_detection.csv" refers to the results from the first simulation part with all the observations.
"results_imperfect_detection.csv" refers to the results from the first simulation part with randomly thinned observations to mimic imperfect detection.

ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
PAMM30: p-value of the PAMM run on the 30-day survey.
PAMM7: p-value of the PAMM run on the 7-day survey.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

"results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations.
"results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimic imperfect detection.

ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of A on B.
p_pamm7_AB: p-value of the PAMM run on the 7-day survey testing for the effect of B on A.
AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.
Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
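Because each row of the result files is one simulation iteration, a common way to summarize the p-value columns is the share of iterations in which a method rejects at a chosen significance level. The sketch below is written in Python for illustration (the dataset's own scripts are in R) and rests on assumptions: comma-separated files with a header row using the column names listed above, a threshold of 0.05, and made-up sample values.

```python
import csv
import io
import math


def rejection_rates(fileobj, p_columns, alpha=0.05):
    """Share of iterations (rows) with p < alpha, per p-value column."""
    rejected = {col: 0 for col in p_columns}
    total = {col: 0 for col in p_columns}
    for row in csv.DictReader(fileobj):
        for col in p_columns:
            try:
                p = float(row.get(col))
            except (TypeError, ValueError):
                continue  # missing or non-numeric entry such as "NA"
            if math.isnan(p):
                continue
            total[col] += 1
            if p < alpha:
                rejected[col] += 1
    return {col: rejected[col] / total[col] for col in p_columns if total[col]}


# Made-up stand-in for a results file; values are not from the dataset
sample = io.StringIO(
    "ID_run,PAMM30,Harmsen_P\n"
    "run1,0.01,0.20\n"
    "run2,0.04,NA\n"
    "run3,0.30,0.03\n"
)
rates = rejection_rates(sample, ["PAMM30", "Harmsen_P"])
```

Whether such a rate reads as statistical power or as a false-positive rate depends on the effect parameters of the iterations being summarized, so rows would normally be grouped by those parameters first.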

Script files description:
1_Functions: R script containing the functions:
 - MRPP from Karanth et al. (2017), adapted here for time efficiency.
 - MRPP from Murphy et al. (2021), adapted here for time efficiency.
 - Version of the ct_to_recurrent() function from the recurrent package, adapted to run in parallel on the simulation datasets.
 - The simulation() function used to simulate observations of two species with reciprocal effects on each other.
2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the simulation parallelization and the random thinning mimicking imperfect detection.
3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. 2021.
4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the code lines of Karanth et al. (2017) and Murphy et al. (2021), the version adapted for time efficiency, and a comparison to verify the similarity of results.
5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test for differences between the MRPP approaches according to the species on which the permutations are done.
undefined undefined: undefined | undefined (undefined)
data.census.gov
Cite
United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSST1Y2016.S0801
Explore at:
Dataset provided by
United States Census Bureau (http://census.gov/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Tell us what you think. Provide feedback to help make American Community Survey data more useful for you.

Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

Explanation of Symbols:
An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
An "(X)" means that the estimate is not applicable or not available.

Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

While the 2016 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

Workers include members of the Armed Forces and civilians who were at work last week.

The 12 selected states are Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, and Wisconsin.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

Source: U.S. Census Bureau, 2016 American Community Survey 1-Year Estimates
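The margin-of-error description above can be made concrete with a short Python sketch. It assumes the published margin of error is a 90 percent value built from the normal critical value 1.645, and that a 95 percent margin uses 1.960; the function names and the sample numbers are illustrative and do not come from any ACS table.

```python
# Sketch: turn an estimate and its 90 percent margin of error (MOE)
# into confidence bounds, a standard error, and a rescaled MOE.
# Assumption: the 90 percent MOE was computed as 1.645 * standard error.

Z_90 = 1.645  # normal critical value assumed for the published 90 percent MOE
Z_95 = 1.960  # normal critical value for a 95 percent MOE


def confidence_bounds(estimate, moe):
    """Return (lower, upper) = (estimate - moe, estimate + moe)."""
    return estimate - moe, estimate + moe


def standard_error(moe_90):
    """Recover the standard error implied by a 90 percent MOE."""
    return moe_90 / Z_90


def rescale_moe(moe_90, z_target=Z_95):
    """Convert a 90 percent MOE to another confidence level."""
    return moe_90 * z_target / Z_90


# Made-up illustrative values, not taken from any ACS table.
estimate, moe_90 = 50_000, 1_645
lower, upper = confidence_bounds(estimate, moe_90)
moe_95 = rescale_moe(moe_90)
```

The "**", "***", "*****", "N" and "(X)" entries described above are not numeric, so a real loader would need to filter them out before applying these functions.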
Wind Generation Time Interval Exploration Tool
catalog.data.gov
data.cnra.ca.gov
Updated Jul 24, 2025
Cite
California Energy Commission (2025). Wind Generation Time Interval Exploration Tool [Dataset]. https://catalog.data.gov/dataset/wind-generation-time-interval-exploration-tool-28d5a
Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Energy Commission (http://www.energy.ca.gov/)
Description
This Wind Generation Interactive Query Tool was created by the CEC. The visualization tool interactively displays wind generation over different time intervals in three-dimensional space. The viewer can look across the state to understand generation patterns of regions with concentrations of wind power plants. The tool aids in understanding high and low periods of generation. Operation of the electric grid requires that generation and demand are balanced in each period. The height and color of columns at wind generation areas are scaled and shaded to represent capacity factors (CFs) of the areas in a specific time interval. Capacity factor is the ratio of the energy produced to the amount of energy that could ideally have been produced in the same period using the rated nameplate capacity. Due to natural variations in wind speeds, higher factors tend to be seen over short time periods, with lower factors over longer periods. The capacity used is the reported nameplate capacity from the Quarterly Fuel and Energy Report, CEC-1304A. CFs are based on wind plants in service in the wind generation areas.

Renewable energy resources like wind facilities vary in size and geographic distribution within each state. Resource planning, land use constraints, climate zones, and weather patterns limit availability of these resources and where they can be developed. National, state, and local policies also set limits on energy generation and use. An example of resource planning in California is the Desert Renewable Energy Conservation Plan. By exploring the visualization, a viewer can gain a three-dimensional understanding of temporal variation in generation CFs, along with how the wind generation areas compare to one another. The viewer can observe that areas peak in generation in different periods. The large range in CFs is also visible.
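The capacity factor definition in this description reduces to a single ratio. The Python sketch below works it through under stated assumptions: energy is given in MWh, nameplate capacity in MW, and the period length in hours. The function name and the example plant are hypothetical and are not drawn from the CEC tool or its data.

```python
# Sketch: capacity factor = energy produced / energy that could ideally
# have been produced at rated nameplate capacity over the same period.
# Units (MWh, MW, hours) are an assumption made for this example.


def capacity_factor(energy_mwh, nameplate_mw, hours):
    """Return the capacity factor as a fraction between 0 and 1."""
    if nameplate_mw <= 0 or hours <= 0:
        raise ValueError("nameplate capacity and period length must be positive")
    ideal_energy_mwh = nameplate_mw * hours  # output if run at full rating
    return energy_mwh / ideal_energy_mwh


# Hypothetical plant: 100 MW nameplate, 720 MWh generated over 24 hours.
cf_day = capacity_factor(energy_mwh=720, nameplate_mw=100, hours=24)
```

Evaluating the same function over one hour, one day, and one month for a given area is one way to see the pattern the description mentions, with higher factors over short periods and lower factors over longer ones.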
intraday trading information for IBM stock
kaggle.com
zip
Updated Nov 12, 2024
Cite
Dhruvil Patel (2024). intraday trading information for IBM stock [Dataset]. https://www.kaggle.com/datasets/dhruvil633/intraday-trading-information-for-ibm-stock
Explore at:
Available download formats: zip (1207 bytes)
Dataset updated
Nov 12, 2024
Authors
Dhruvil Patel
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
IBM Intraday Stock Price Data (5-Minute Intervals)

This dataset provides comprehensive intraday trading data for IBM stock at 5-minute intervals, capturing essential price and volume metrics for each trading session. It is ideal for short-term trading analysis, pattern recognition, and intraday trend forecasting.

Dataset Overview

Each row in the dataset represents IBM's stock information for a specific 5-minute interval, including:

Timestamp: The exact time (Eastern Time) for each data entry.
Open: The stock price at the beginning of the interval.
High: The highest price within the interval.
Low: The lowest price within the interval.
Close: The stock price at the end of the interval.
Volume: The number of shares traded within the interval.

Potential Uses

This dataset is well-suited for various financial and quantitative analysis projects, such as:

Volume and Price Movement Analysis: Identify periods with unusually high trading volume and investigate if they correspond with significant price changes or market events.
Intraday Trend Analysis: Observe trends by plotting the closing prices over time to spot patterns in stock performance during a single trading day or across multiple days.
Volatility Detection: Track intervals with a high difference between the high and low prices to detect periods of increased price volatility.
Time-Series Forecasting: Use machine learning models to predict price movements based on historical intraday data and patterns.

Example Analysis Ideas

Visualize Price Movements: Plot open, high, low, and close prices over time to get a clear view of price trends and fluctuations.
Analyze Volume Spikes: Find and investigate timestamps with high trading volume, which might indicate significant market activity.
Apply Machine Learning: Use techniques such as LSTM, ARIMA, or other time-series forecasting models to predict short-term price movements.

This dataset is especially valuable for traders, quantitative analysts, and developers building financial models or applications that require real-time market insights.

About the Data

This dataset was obtained via the Alpha Vantage API, using their TIME_SERIES_INTRADAY function. The data here represents IBM's intraday stock price movements on November 11, 2024, at 5-minute intervals.
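Two of the ideas in this description, volatility detection and volume spike analysis, can be sketched with the Python standard library alone. The sketch assumes each 5-minute interval is a dictionary with lowercase keys `timestamp`, `high`, `low` and `volume`; the key names, the spike threshold, and every value in `rows` are made-up placeholders and are not taken from the dataset.

```python
# Sketch: flag the widest high-low range and unusually high volume
# in a list of 5-minute intervals. All rows below are synthetic.
from statistics import median

rows = [
    {"timestamp": "09:30", "high": 100.7, "low": 100.0, "volume": 1000},
    {"timestamp": "09:35", "high": 100.9, "low": 100.5, "volume": 1200},
    {"timestamp": "09:40", "high": 100.8, "low": 100.5, "volume": 900},
    {"timestamp": "09:45", "high": 102.0, "low": 100.0, "volume": 5000},
]


def price_range(row):
    """High minus low within one interval, a simple volatility proxy."""
    return row["high"] - row["low"]


def volume_spikes(intervals, factor=3.0):
    """Return intervals whose volume exceeds factor times the median volume."""
    typical = median(r["volume"] for r in intervals)
    return [r for r in intervals if r["volume"] > factor * typical]


widest = max(rows, key=price_range)  # interval with the largest range
spikes = volume_spikes(rows)         # intervals with unusually high volume
```

The median is used as the baseline here so that a single very large interval does not inflate the threshold; a mean or rolling window would be equally reasonable choices.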

Confidence Interval Examples
Emily Rollinson (2016). Confidence Interval Examples [Dataset]. http://doi.org/10.6084/m9.figshare.3466364.v2
62 scholarly articles cite this dataset.
