100+ datasets found

The banksia plot: a method for visually comparing point estimates and...
bridges.monash.edu
researchdata.edu.au
txt
Updated Oct 15, 2024
Cite
Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.26180/25286407.v2
Dataset updated
Oct 15, 2024
Dataset provided by
Monash University
Authors
Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Companion data for the creation of a banksia plot.

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
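The centring and scaling step described in the Methods can be sketched in a few lines. This is only an illustration in Python (the companion files themselves are for Stata and R); the `Interval` type, the function name `centre_and_scale` and the example numbers are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A point estimate with its confidence limits (hypothetical helper type)."""
    estimate: float
    lower: float
    upper: float


def centre_and_scale(reference: Interval, comparator: Interval) -> tuple[Interval, Interval]:
    """Shift both analyses so the reference estimate is 0, then divide by the
    reference CI width so the reference interval spans a range of one."""
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def transform(iv: Interval) -> Interval:
        # The same shift and scale are applied to every value of either analysis.
        return Interval(*((v - reference.estimate) / width for v in iv))

    return transform(reference), transform(comparator)


# Made-up numbers for one dataset: reference analysis vs. comparator analysis.
ref = Interval(estimate=10.0, lower=6.0, upper=14.0)
comp = Interval(estimate=12.0, lower=9.0, upper=15.0)
ref_t, comp_t = centre_and_scale(ref, comp)
```

After the transformation the reference interval is centred on zero with width one, so the comparator's offset and relative width can be read off directly, whatever the original outcome scale was.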
Estimating Confidence Intervals for 2020 Census Statistics Using Approximate...
registry.opendata.aws
Updated Aug 5, 2024
+ more versions
Cite
United States Census Bureau (2024). Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2010 Census Proof of Concept) [Dataset]. https://registry.opendata.aws/census-2010-amc-mdf-replicates/
Explore at:
Dataset updated
Aug 5, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2, ..., PPMF25) are a set of microdata files intended for use in estimating the magnitude of error(s) introduced by the 2020 Decennial Census Disclosure Avoidance System (DAS) into the Redistricting and DHC products. The PPMF0 was created by executing the 2020 DAS TopDown Algorithm (TDA) using the confidential 2010 Census Edited File (CEF) as the initial input; the replicates were then created by executing the 2020 DAS TDA repeatedly with the PPMF0 as its initial input. Inspired by analogy to the use of bootstrap methods in non-private contexts, U.S. Census Bureau (USCB) researchers explored whether simple calculations based on comparing each PPMFi to the PPMF0 could be used to reliably estimate the scale of errors introduced by the 2020 DAS, and generally found this approach worked well.

The PPMF0 and PPMFi files contained here are provided so that external researchers can estimate properties of DAS-introduced error without privileged access to internal USCB-curated data sets; further information on the estimation methodology can be found in Ashmead et al. (2024).
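As a rough illustration of the kind of "simple calculation" mentioned above, the sketch below compares a statistic computed on each replicate with the same statistic computed on the seed file, and summarises the differences as a root-mean-square error. The function names, the normal approximation and the default z value of 1.645 are assumptions made for this example only; the methodology actually used by the Census Bureau is the one described in Ashmead et al. and may differ.

```python
import math


def amc_rmse(seed_value: float, replicate_values: list[float]) -> float:
    """Root-mean-square difference between replicate statistics (PPMF1..PPMFk)
    and the seed statistic (PPMF0). Hypothetical helper, not a USCB routine."""
    if not replicate_values:
        raise ValueError("at least one replicate is required")
    squared = [(r - seed_value) ** 2 for r in replicate_values]
    return math.sqrt(sum(squared) / len(squared))


def approx_interval(published_value: float, rmse: float, z: float = 1.645) -> tuple[float, float]:
    """Normal-approximation interval around a published statistic.
    Both the normal approximation and z = 1.645 are assumptions of this sketch."""
    return published_value - z * rmse, published_value + z * rmse


# Made-up counts for one geography: seed-file value and four replicate values.
seed_count = 100.0
replicate_counts = [97.0, 103.0, 101.0, 99.0]
error_scale = amc_rmse(seed_count, replicate_counts)
low, high = approx_interval(published_value=100.0, rmse=error_scale)
```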

The 2010 DHC AMC seed PPMF0 and PPMF replicates have been cleared for public dissemination by the USCB Disclosure Review Board (CBDRB-FY24-DSEP-0002). The 2010 PPMF0 included in these files was produced using the same parameters and settings as were used to produce the 2010 Demonstration Data Product Suite (2023-04-03) PPMF, but represents an independent execution of the TopDown Algorithm. The PPMF0 and PPMF replicates contain all Person and Units attributes necessary to produce the Redistricting and DHC publications for both the United States and Puerto Rico, and include geographic detail down to the Census Block level. They do not include attributes specific to either the Detailed DHC-A or Detailed DHC-B products; in particular, data on Major Race (e.g., White Alone) is included, but data on Detailed Race (e.g., Cambodian) is not included in the PPMF0 and replicates.

The 2020 AMC replicate files for estimating confidence intervals for the official 2020 Census statistics are available.
Population with confidence in EU institutions by institution
data.europa.eu
gimi9.com
csv, html, tsv, xml
Updated Nov 6, 2017
+ more versions
Cite
Eurostat (2017). Population with confidence in EU institutions by institution [Dataset]. https://data.europa.eu/data/datasets/agvl4w4bhtllpvo3givrw?locale=en
Explore at:
Available download formats: csv(9631), html, tsv, xml(7101), xml(8809)
Dataset updated
Nov 6, 2017
Dataset authored and provided by
Eurostat (https://ec.europa.eu/eurostat)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
European Union
Description
The indicator measures confidence among EU citizens in a selection of EU institutions: the European Parliament, the European Commission, and the European Central Bank. It is expressed as the share of positive opinions (people who declare that they tend to trust) about the institutions. Citizens are asked to express their confidence levels by choosing the following alternatives: ‘tend to trust’, ‘tend not to trust’ and ‘don’t know’ or ‘no answer’. The indicator is based on the Eurobarometer, a survey which has been conducted twice a year since 1973 to monitor the evolution of public opinion in the Member States.
1.11 Feeling Safe in Work (summary)
catalog.data.gov
data-academy.tempe.gov
+8more
Updated Jul 5, 2025
+ more versions
Cite
City of Tempe (2025). 1.11 Feeling Safe in Work (summary) [Dataset]. https://catalog.data.gov/dataset/1-11-feeling-safe-in-work-summary-b5f31
Explore at:
Dataset updated
Jul 5, 2025
Dataset provided by
City of Tempe
Description
This dataset comes from the biennial City of Tempe Employee Survey question about feeling safe in the physical work environment (building). The Employee Survey question relating to this performance measure: “Please rate your level of agreement: My physical work environment (building) is safe, clean & maintained in good operating order.” Survey respondents are asked to rate their agreement level on a scale of 5 to 1, where 5 means “Strongly Agree” and 1 means “Strongly Disagree” (without “don’t know” responses included).

The survey was voluntary, and employees were allowed to complete the survey during work hours or at home. The survey allowed employees to respond anonymously and has a 95% confidence level.

This page provides data about the Feeling Safe in City Facilities performance measure. The performance measure dashboard is available at 1.11 Feeling Safe in City Facilities.

Additional Information
Source: Employee Survey
Contact: Wydale Holmes
Contact E-Mail: Wydale_Holmes@tempe.gov
Data Source Type: CSV
Preparation Method: Data received from vendor and entered in CSV
Publish Frequency: Biennial
Publish Method: Manual
Data Dictionary (update pending)
Confidence in institutions, by gender and province
www150.statcan.gc.ca
canwin-datahub.ad.umanitoba.ca
+2more
Updated Feb 19, 2025
Cite
Government of Canada, Statistics Canada (2025). Confidence in institutions, by gender and province [Dataset]. http://doi.org/10.25318/4510007301-eng
Explore at:
Unique identifier
https://doi.org/10.25318/4510007301-eng
Dataset updated
Feb 19, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Percentage of persons aged 15 years and over by level of confidence in selected types of institutions, by gender, for Canada, regions and provinces.
Confidence level of business or organization in its ability to make payments...
ouvert.canada.ca
www150.statcan.gc.ca
+1more
csv, html, xml
Updated Feb 28, 2025
+ more versions
Cite
Statistics Canada (2025). Confidence level of business or organization in its ability to make payments to suppliers and service providers in full and on time, first quarter of 2025 [Dataset]. https://ouvert.canada.ca/data/dataset/8be07a6b-2f19-48e5-a4ba-024e5e4933c5
Explore at:
Available download formats: csv, xml, html
Dataset updated
Feb 28, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
Confidence level of business or organization in its ability to make payments to suppliers and service providers in full and on time, by North American Industry Classification System (NAICS), business employment size, type of business, business activity and majority ownership, first quarter of 2025.
China Consumer Confidence
tradingeconomics.com
fa.tradingeconomics.com
+13more
csv, excel, json, xml
Cite
TRADING ECONOMICS, China Consumer Confidence [Dataset]. https://tradingeconomics.com/china/consumer-confidence
Explore at:
Available download formats: excel, xml, json, csv
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 31, 1991 - May 31, 2025
Area covered
China
Description
Consumer Confidence in China increased to 88 points in May from 87.80 points in April of 2025. This dataset provides - China Consumer Confidence - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Performance of ML models on test data.
plos.figshare.com
xls
Updated Oct 31, 2023
Cite
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Performance of ML models on test data. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgph.0002475.t005
Dataset updated
Oct 31, 2023
Dataset provided by
PLOS Global Public Health
Authors
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83,3.52). 
Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
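The description mentions bootstrapped confidence intervals for RMSE and MAE without saying how they were computed. The sketch below shows one common approach, a percentile bootstrap over test-set cases, using only the Python standard library. The function names, the number of resamples, the seed and the toy data are all assumptions made for illustration; they are not the study's code or data.

```python
import math
import random


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one resample of case indices
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy values standing in for observed and predicted vitamin D levels.
observed = [18.0, 22.5, 30.1, 15.2, 27.8, 21.0, 19.4, 25.3, 16.7, 29.0]
predicted = [20.1, 21.0, 27.5, 17.9, 26.0, 23.2, 18.0, 28.1, 15.5, 26.4]
point = rmse(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted, rmse)
```

With a test set as small as the one implied by 50 patients, intervals of this kind tend to be wide, which is consistent with the ranges reported above.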
Coastal Design Sea Levels- Coastal Flood Boundary Confidence Intervals
cloud.csiss.gmu.edu
data.europa.eu
Updated Dec 22, 2019
Cite
United Kingdom (2019). Coastal Design Sea Levels- Coastal Flood Boundary Confidence Intervals [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/coastal-design-sea-levels-coastal-flood-boundary-confidence-intervals
Explore at:
Dataset updated
Dec 22, 2019
Dataset provided by
United Kingdom
License
http://reference.data.gov.uk/id/open-government-licence
Description
PLEASE NOTE: The Coastal Design Sea Levels – Coastal Flood Boundary datasets currently show data published in February 2011. An update to the data is currently planned for August 2019. This statement will be updated when the data update is complete.

This metadata record is for AfA product AfA188-2. Extreme Sea Level Confidence information is part of Coastal Design/Extreme Sea Levels, a GIS dataset and supporting information providing design / extreme sea level and typical surge information around the coastline of England and Wales under present day conditions. Data for Scotland is available from the Scottish Environment Protection Agency (SEPA). This is a specialist dataset which informs on work commenced around the coast ranging from coastal flood modelling, scheme design, strategic planning and flood risk assessments. Extreme Sea Level Confidence information describes the extreme sea levels for 16 different annual probabilities of exceedance.

A bundle download of all Coastal Design Sea Levels datasets is available from this record. Please see individual records for full details and metadata on each product. Attribution statement: © Environment Agency copyright and/or database right 2015. All rights reserved.
Average Confidence Level of Heat Demand Estimates (250m Grid) - Scotland
data.europa.eu
find.data.gov.scot
+1more
tiff, unknown
Updated Oct 16, 2021
+ more versions
Cite
Scottish Government SpatialData.gov.scot (2021). Average Confidence Level of Heat Demand Estimates (250m Grid) - Scotland [Dataset]. https://data.europa.eu/data/datasets/average-confidence-level-of-heat-demand-estimates-250m-grid-scotland?locale=de
Explore at:
Available download formats: tiff, unknown
Dataset updated
Oct 16, 2021
Dataset provided by
Scottish Government (http://www.gov.scot/)
Authors
Scottish Government SpatialData.gov.scot
Area covered
Scotland
Description
The Scotland Heat Map provides estimates of heat demand for all properties in Scotland. To indicate reliability, each estimate is assigned a confidence level from 1 to 5. Level 1 is least reliable and level 5 most. This is mainly determined by the presence of information that would directly impact on heat demand in the estimate’s source data. For example, estimates based on data that includes building type, age and floor area would be more reliable than estimates based solely on floor area derived from mapping data. This raster dataset gives the average (mean) confidence level of properties within 250m x 250m grid squares covering all of Scotland.

The Scotland Heat Map is a tool to help plan for the reduction of carbon emissions from heat in buildings. Average confidence level is an indicator of reliability of the heat demand estimates within an area and allows planners to decide whether they meet their needs. The map is produced by the Scottish Government and aims to provide annual updates of heat demand estimates, and therefore confidence levels. More information can be found in the documentation available on the Scottish Government website: https://www.gov.scot/publications/scotland-heat-map-documents/
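The averaging described above can be illustrated with a small sketch that assigns each property to a 250 m grid square and takes the mean confidence level per square. It assumes property coordinates are given in metres in a projected coordinate system; the function name and the sample points are hypothetical, and the Scottish Government's actual raster production process is not documented here.

```python
from collections import defaultdict


def mean_confidence_by_cell(properties, cell_size=250.0):
    """Average confidence level (1 to 5) of properties within each grid square.

    `properties` is an iterable of (x_metres, y_metres, confidence_level).
    Returns a dict mapping (column, row) cell indices to the mean level.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of levels, count]
    for x, y, level in properties:
        if not 1 <= level <= 5:
            raise ValueError("confidence level must be between 1 and 5")
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell][0] += level
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Three made-up properties: two fall in the same 250 m square, one in the next.
sample = [(10.0, 20.0, 5), (240.0, 249.0, 3), (260.0, 20.0, 1)]
grid = mean_confidence_by_cell(sample)
```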
Data from: Disentangling the origins of confidence in speeded perceptual...
openneuro.org
Updated Apr 25, 2020
Cite
Michael Pereira; Nathan Faivre; Inaki Iturrate; Marco Wirthlin; Luana Serafini; Stephanie Martin; Arnaud Desvachez; Olaf Blanke; Dimitri Van de Ville; Jose del R. Millan (2020). Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging [Dataset]. http://doi.org/10.18112/openneuro.ds002158.v1.0.2
Explore at:
Unique identifier
https://doi.org/10.18112/openneuro.ds002158.v1.0.2
Dataset updated
Apr 25, 2020
Dataset provided by
OpenNeuro (https://openneuro.org/)
Authors
Michael Pereira; Nathan Faivre; Inaki Iturrate; Marco Wirthlin; Luana Serafini; Stephanie Martin; Arnaud Desvachez; Olaf Blanke; Dimitri Van de Ville; Jose del R. Millan
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset contains the data in

Pereira, M., Faivre, N., Iturrate, I., Wirthlin, M., Serafini, L., Martin, S., Desvachez, A., Blanke, O., Van De Ville, D., Millan, JdR. (2020). Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging. Proceedings of the National Academy of Sciences, 117 (15) pp. 8382-8390 https://doi.org/10.1073/pnas.1918335117

Preprint: https://www.biorxiv.org/content/10.1101/496877v1

ABSTRACT The human capacity to compute the likelihood that a decision is correct—known as metacognition—has proven difficult to study in isolation as it usually cooccurs with decision making. Here, we isolated postdecisional from decisional contributions to metacognition by analyzing neural correlates of confidence with multimodal imaging. Healthy volunteers reported their confidence in the accuracy of decisions they made or decisions they observed. We found better metacognitive performance for committed vs. observed decisions, indicating that committing to a decision may improve confidence. Relying on concurrent electroencephalography and hemodynamic recordings, we found a common correlate of confidence following committed and observed decisions in the inferior frontal gyrus and a dissociation in the anterior prefrontal cortex and anterior insula. We discuss these results in light of decisional and postdecisional accounts of confidence and propose a computational model of confidence in which metacognitive performance naturally improves when evidence accumulation is constrained upon committing a decision.

preregistration: https://osf.io/a5qmv/

The dataset contains raw fMRI scans, raw EEG in BrainVision format as well as anatomical scans (T1) and field mapping. We also included preprocessed EEG and fMRI data in derivatives/eegprep and derivatives/fmriprep.

EEG PREPROCESSING MR-gradient artifacts were removed using sliding window average template subtraction. The TP10 electrode on the right mastoid was used to detect heartbeats for ballistocardiogram artifact (BCG) removal using a semi-automatic procedure in BrainVision Analyzer 2. Data were then filtered using a Butterworth, 4th order zero-phase (two-pass) bandpass filter between 1 and 10 Hz, epoched [-0.2, 0.6 s] around the response onset (i.e. the button press in the active condition or the appearance of the virtual hand in the observation condition), re-referenced to a common average, and input to independent component analysis (ICA) to remove residual BCG and ocular artifacts. In order to ensure numerical stability when estimating the independent components, we retained 99% of the variance from the electrode space, leading to an average of 19 (SD = 6) components estimated for each participant and condition. Independent components (ICs) were then fitted with a dipolar source localization method (66). ICs whose dipole lay outside the brain, or that resembled muscular or ocular artifacts, were eliminated. A total of 8 (SD = 3) components were finally kept. All preprocessing steps were performed using EEGLAB and in-house scripts under Matlab (The MathWorks, Inc., Natick, Massachusetts, United States).

FMRI PREPROCESSING We modeled the BOLD signal using a general linear model (GLM) with two separate regressors (stick functions at stimulus onset) for the active and observation condition as well as their spatial and temporal derivatives. We then parametrically modulated the regressors with three behavioral variables: the confidence ratings, the response times, and the numerosity difference between the two arrays of dots (i.e., perceptual evidence). Empirical cross-correlation between regressors confirmed limited collinearity for the active (resp. observation) condition (max(abs(R)) = 0.26 ± 0.02, resp. max(abs(R)) = 0.25 ± 0.02). Bad trials as defined in the behavioral analysis section were modeled by two separate regressors (one for active and one for observation) and their spatial and temporal derivatives. We added six realignment parameters as regressors of no interest. All second-level (group-level) results are reported at a significance level of p < 0.05 using cluster-extent family-wise error (FWE) correction with a voxel-height threshold of p < 0.001. We used the anatomical automatic labelling (AAL) atlas for brain parcellation (Tzourio-Mazoyer et al., 2002).
Strengths and weaknesses of different methods.
plos.figshare.com
xls
Updated Oct 31, 2023
+ more versions
Cite
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Strengths and weaknesses of different methods. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pgph.0002475.t002
Dataset updated
Oct 31, 2023
Dataset provided by
PLOS Global Public Health
Authors
Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction to robust estimation of ERP data
figshare.com
search.datacite.org
txt
Updated May 30, 2023
Cite
Guillaume Rousselet (2023). Introduction to robust estimation of ERP data [Dataset]. http://doi.org/10.6084/m9.figshare.3501728.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3501728.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Guillaume Rousselet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data package contains the slides, Matlab m files and a dataset for an ERP workshop I gave in Washington DC, Glasgow, Fribourg, Frankfurt & Berlin. The goal of the workshop is to use hands-on exercises to introduce the basic principles and the Matlab implementation of robust estimation, using resampling methods (bootstrap & permutation) in conjunction with robust estimators. The workshop covers why classic t-tests and ANOVAs on means are not necessarily the best options, and how robust approaches can help. In particular, it demonstrates techniques to compare entire distributions, how to build confidence intervals about any quantity using the bootstrap, and how to effectively control for multiple comparisons. The methods are applied to single-subject and group analyses, and examples are provided to integrate both levels into informative figures.
Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...
data.niaid.nih.gov
live.european-language-grid.eu
+2more
Updated May 20, 2020
Cite
Gayo-Avello, Daniel (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3833781
Explore at:
Dataset updated
May 20, 2020
Dataset authored and provided by
Gayo-Avello, Daniel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science in the University of Oviedo, for the sole purpose of non-commercial research and it just includes tweet ids.

The dataset contains tweet IDs for all the published tweets (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

It covers several defining issues in Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, Obama’s first inauguration speech or the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and a confidence interval of ±0.5) are provided.
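
The size of such a representative sample can be estimated with the standard formula for a proportion. The sketch below is not taken from the dataset's own files: it assumes that "0.5" means a margin of ±0.5 percentage points, that the worst-case proportion p = 0.5 is used, and that a finite population correction is applied. The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.99, margin=0.005, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumptions (not stated in the dataset description): worst-case
    proportion p = 0.5 and a finite population correction.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (margin * margin)
    # Shrink the sample for small populations (finite population correction).
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Tweet counts of two of the 90-day intervals listed in the description.
for tweets in (5_512, 589_116_341):
    print(tweets, sample_size(tweets))
```

Under these assumptions the smallest intervals need a sample that covers most of their tweets, while the largest ones need only a tiny fraction.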

In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the three-year period covered by the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).

The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

When the data was being cleaned for publication, there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but was in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m), I simply produced all those that were created between those two dates. All those tweets actually existed, but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted and non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

Additionally, given that not everybody will need the whole period of time, the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.

For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

If you need to contact me you can find me as @PFCdgayo on Twitter.
Data from: [Dataset:] Templates for Statistical Resample Methods Maximize...
smithsonian.figshare.com
pdf
Updated Apr 23, 2024
Ethan A. Perets; Asuncion Charola; Yun Liu; Carol A. Grissom; Paula T. DePriest; Robert J. Koestler (2024). [Dataset:] Templates for Statistical Resample Methods Maximize Accuracy and Efficiency of Colorimetric Data Collection for Monitoring Biocolonization on Stone [Dataset]. http://doi.org/10.5479/data.mci.2016.0629
Available download formats: pdf
Unique identifier
https://doi.org/10.5479/data.mci.2016.0629
Dataset updated
Apr 23, 2024
Dataset provided by
Museum Conservation Institute
Authors
Ethan A. Perets; Asuncion Charola; Yun Liu; Carol A. Grissom; Paula T. DePriest; Robert J. Koestler
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Non-parametric and semi-parametric statistical approaches were developed to maximize accuracy of colorimetric data for monitoring biocolonization on stone surfaces, while simultaneously optimizing efficiency of data collection in the field. These approaches were applied to colorimetric data sets collected on three Kasota limestone capstones located at the National Museum of the American Indian in Washington, DC. Data was randomly resampled without replacement (the statistical "jackknife"), producing data subsets of diminishing resample sizes. This study assesses the extent to which reduced numbers of colorimetric measurements diverge from characterizations provided with greater numbers of measurements on the same stone surface. Resample subsets were compared against parent colorimetric data sets from each stone in terms of standardized effect size (Cohen's d). A universal lowest limit on the number of colorimetric measurements necessary to accurately represent a stone surface cannot be established, as this will depend on properties intrinsic to the stone. However, it is shown that for levels of d = ±0.2, ±0.5, ±0.8 at 95 percent confidence, an homogeneously biocolonized dolomitic limestone requires approximately 35, 10, and 5 colorimetric measurements, respectively. By comparison, hypothesis testing following the Student's t-test at d = ±0.2, ±0.5, ±0.8 showed that approximately 55, 20, and 10 measurements were required. Factors affecting the necessary minimum sample size for achieving pre-selected confidence levels and acceptable measurement error -- including the impacts of a biocide treatment and heterogeneity of surface textures -- were also investigated. Comparison of results for textured capstones suggests that rougher stones require greater numbers of measurements at identical d and confidence. Corresponding author: Paula DePriest.
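The resampling idea described above can be illustrated with a short sketch. It is not the authors' template: the function names, the pooled-standard-deviation form of Cohen's d, and the synthetic "parent" measurements are all assumptions made for illustration only.

```python
import random
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def subsample_effect_sizes(parent, size, repeats=1000, seed=0):
    """Draw `repeats` subsets of `size` values without replacement and
    return the effect size of each subset against the parent data set."""
    rng = random.Random(seed)
    return [cohens_d(rng.sample(parent, size), parent) for _ in range(repeats)]


# Hypothetical parent data set standing in for colorimetric measurements.
rng = random.Random(42)
parent = [rng.gauss(50.0, 5.0) for _ in range(200)]

for size in (5, 10, 35):
    ds = subsample_effect_sizes(parent, size)
    within = sum(abs(d) <= 0.5 for d in ds) / len(ds)
    print(f"n={size}: share of subsets with |d| <= 0.5 is {within:.2f}")
```

Smaller subsets spread more widely around d = 0, which is the behaviour the study quantifies when it looks for a minimum number of measurements.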
The CloudSat-CALIPSO Cloud Amount Uncertainty product
explore.openaire.eu
data.niaid.nih.gov
Updated Feb 15, 2022
Andrzej Z. Kotarba (2022). The CloudSat-CALIPSO Cloud Amount Uncertainty product [Dataset]. http://doi.org/10.5281/zenodo.6113204
Unique identifier
https://doi.org/10.5281/zenodo.6113204
Dataset updated
Feb 15, 2022
Authors
Andrzej Z. Kotarba
Description
Version 1.0 of the dataset. The peer-reviewed publication for this dataset has been published in Remote Sensing, 2021, 13(4), 807, and can be accessed here: https://doi.org/10.3390/rs13040807. Please cite this when using the dataset. The 'CloudSat-CALIPSO Cloud Amount Uncertainty' product provides information about mean annual and mean monthly cloud amount, at 40 vertical levels (480 m), and four spatial resolutions (1°, 2.5°, 5°, and 10°), as derived from the joint CloudSat-CALIPSO lidar-radar observations (2006–2011). For the very first time, the CloudSat-CALIPSO climatology comes with a quantitative uncertainty assessment: bootstrapped confidence intervals for mean values. The width of confidence intervals is an essential element in studying spatial and/or temporal variation in cloud amount with satellite profiling instruments. Uncertainty data are provided at four confidence levels (85%, 90%, 95%, 99%). Data products are distributed as HDF4 files, and can be accessed under the CC BY 4.0 license at doi:10.5281/zenodo.6113205. See the product documentation file (LIRAC.conf.v01_doc01.pdf) for details. Reference: Kotarba, A.Z., Solecki, M. (2021) Uncertainty Assessment of the Vertically-Resolved Cloud Amount for Joint CloudSat-CALIPSO Radar-Lidar Observations. Remote Sensing, 13, 807, doi:10.3390/rs13040807.
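A bootstrapped confidence interval for a mean can be sketched as follows. This is a generic percentile bootstrap, not the product's documented procedure; the function name `bootstrap_ci`, the number of resamples and the example cloud-amount values are assumptions for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(values, confidence=0.95, resamples=2000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(resamples))
    alpha = 1 - confidence
    lower = means[int((alpha / 2) * resamples)]
    upper = means[min(resamples - 1, int((1 - alpha / 2) * resamples))]
    return lower, upper


# Hypothetical cloud-amount fractions for one grid cell and level.
cloud_amount = [0.12, 0.30, 0.25, 0.41, 0.18, 0.22, 0.35, 0.27, 0.19, 0.33]

# The four confidence levels named in the product description.
for level in (0.85, 0.90, 0.95, 0.99):
    print(level, bootstrap_ci(cloud_amount, confidence=level))
```

Because every call reuses the same seed, the intervals are nested: a higher confidence level never gives a narrower interval.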
TCP-CI Dataset
paperswithcode.com
Updated Sep 29, 2021
(2021). TCP-CI Dataset [Dataset]. https://paperswithcode.com/dataset/tcp-ci
Dataset updated
Sep 29, 2021
Description
This dataset is a benchmark of 25 open-source subjects with 21.5k builds and 3.6k failed builds that enables a fair comparison and evaluation of Test Case Prioritization (TCP) techniques. We made our data collection tools available, which can be used to extend and update the subjects. The description of the structure and files of the dataset can also be found in the documentation of the data collection tool.
Public Health Statistics - Screening for elevated blood lead levels in...
catalog.data.gov
data.cityofchicago.org
Updated Feb 7, 2022
data.cityofchicago.org (2022). Public Health Statistics - Screening for elevated blood lead levels in children aged 0-6 years by year, Chicago, 1999-2013 - Historical [Dataset]. https://catalog.data.gov/dataset/public-health-statistics-screening-for-elevated-blood-lead-levels-in-children-aged-0-1999-
Dataset updated
Feb 7, 2022
Dataset provided by
data.cityofchicago.org
Area covered
Chicago
Description
Note: This dataset is historical only and there are not corresponding datasets for more recent time periods. For that more-recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org. This dataset contains the annual number and estimated rate per 1,000 children aged 0-6 years receiving a blood lead level test, and the annual number and estimated percentage of those tested found to have an elevated blood lead level, with corresponding 95% confidence intervals, by Chicago community area, for the years 1999 – 2013. See the full dataset description for more information at https://data.cityofchicago.org/api/views/gpjh-i4j2/files/vIHuTqqgxDT1UFX9XhgCeYddaOhsG2nzgoMLUoRjeOI?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\LEAD_POISONING\Dataset_Description_BloodLeadTesting_1999-2013.pdf
Albania Consumer Confidence
tradingeconomics.com
pl.tradingeconomics.com
+13more
csv, excel, json, xml
TRADING ECONOMICS, Albania Consumer Confidence [Dataset]. https://tradingeconomics.com/albania/consumer-confidence
Available download formats: excel, json, xml, csv
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 30, 2003 - Jun 30, 2025
Area covered
Albania
Description
Consumer Confidence in Albania increased to -24 points in June from -24.20 points in May of 2025. This dataset provides - Albania Consumer Confidence - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Data from: GHALogs: Large-Scale Dataset of GitHub Actions Runs
zenodo.org
application/gzip, zip
Updated Dec 5, 2024
Florent Moriconi; Thomas Durieux; Jean-Rémy Falleri; Raphael Troncy; Aurélien Francillon (2024). GHALogs: Large-Scale Dataset of GitHub Actions Runs [Dataset]. http://doi.org/10.5281/zenodo.10154920
Available download formats: application/gzip, zip
Unique identifier
https://doi.org/10.5281/zenodo.10154920
Dataset updated
Dec 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Florent Moriconi; Thomas Durieux; Jean-Rémy Falleri; Raphael Troncy; Aurélien Francillon
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Time period covered
Oct 2023
Description
In recent years, continuous integration and deployment (CI/CD) has become increasingly popular in both the open-source community and industry. Evaluating CI/CD performance is a critical aspect of software development, as it not only helps minimize execution costs but also ensures faster feedback for developers. Despite its importance, there is limited fine-grained knowledge about the performance of CI/CD processes—knowledge that is essential for identifying bottlenecks and optimization opportunities.
Moreover, the availability of large-scale, publicly accessible datasets of CI/CD logs remains scarce. The few datasets that do exist are often outdated and lack comprehensive coverage. To address this gap, we introduce a new dataset comprising 116k CI/CD workflows executed using GitHub Actions (GHA) across 25k public code projects spanning 20 different programming languages.
This dataset includes 513k workflow runs encompassing 2.3 million individual steps. For each workflow run, we provide detailed metadata along with complete run logs. To the best of our knowledge, this is the largest dataset of CI/CD runs that includes full log data. The inclusion of these logs enables more in-depth analysis of CI/CD pipelines, offering insights that cannot be gleaned solely from code repositories.
We postulate that this dataset will facilitate future CI/CD pipeline behavior research through log-based analysis. Potential applications include performance evaluation (e.g., measuring task execution times) and root cause analysis (e.g., identifying reasons for pipeline failures).


Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2

The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets

4 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: txt

Unique identifier

https://doi.org/10.26180/25286407.v2

Dataset updated

Oct 15, 2024

Dataset provided by

Monash University

Authors

Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Companion data for the creation of a banksia plot:

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
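
The centring and scaling step described in the Methods can be sketched in a few lines. The companion files are written in Stata and R; the Python below is only an illustrative reading of the description, and the function name `centre_and_scale` and the example numbers are hypothetical.

```python
def centre_and_scale(reference, comparator):
    """Centre and scale one reference/comparator pair for a banksia-style plot.

    Each argument is a tuple (estimate, ci_lower, ci_upper). The reference
    estimate is shifted to zero and its CI rescaled to a width of one; the
    comparator is adjusted by the same shift and the same scale factor.
    """
    ref_estimate, ref_lower, ref_upper = reference
    width = ref_upper - ref_lower

    def transform(value):
        return (value - ref_estimate) / width

    return tuple(map(transform, reference)), tuple(map(transform, comparator))


# Hypothetical results from two analyses of the same dataset.
ref, comp = centre_and_scale((2.0, 1.0, 3.0), (2.5, 1.0, 4.0))
print(ref)   # (0.0, -0.5, 0.5)
print(comp)  # (0.25, -0.5, 1.0)
```

After the transformation the comparator's estimate and CI limits are expressed in units of the reference CI width, so pairs from datasets with different outcome scales can share one axis.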
