100+ datasets found
  1. The banksia plot: a method for visually comparing point estimates and...

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot.

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

    Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
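
    The centring and scaling step described in the Methods can be sketched as follows. This is a minimal illustration in Python, not the authors' Stata or R code: the `Estimate` class and the `centre_and_scale` function are hypothetical names, and the sketch assumes that "the same amounts" means subtracting the reference point estimate and dividing by the reference CI width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A point estimate with its confidence interval limits (hypothetical container)."""
    point: float
    lower: float
    upper: float


def centre_and_scale(reference: Estimate, comparator: Estimate) -> tuple[Estimate, Estimate]:
    """Shift both analyses by the reference point estimate, then divide by the reference CI width."""
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def adjust(e: Estimate) -> Estimate:
        # Same shift and same scale for reference and comparator (assumed reading of the text).
        return Estimate(
            (e.point - reference.point) / width,
            (e.lower - reference.point) / width,
            (e.upper - reference.point) / width,
        )

    return adjust(reference), adjust(comparator)
```

    After the transformation the reference point estimate sits at zero and its CI spans a range of one, so the comparator's offset and relative CI width can be read directly on a common axis, whatever the original outcome scale.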

  2. Estimating Confidence Intervals for 2020 Census Statistics Using Approximate...

    • registry.opendata.aws
    Updated Aug 5, 2024
    Cite
    United States Census Bureau (2024). Estimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2010 Census Proof of Concept) [Dataset]. https://registry.opendata.aws/census-2010-amc-mdf-replicates/
    Explore at:
    Dataset updated
    Aug 5, 2024
    Dataset provided by
    United States Census Bureau http://census.gov/
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2, ..., PPMF25) are a set of microdata files intended for use in estimating the magnitude of error(s) introduced by the 2020 Decennial Census Disclosure Avoidance System (DAS) into the Redistricting and DHC products. The PPMF0 was created by executing the 2020 DAS TopDown Algorithm (TDA) using the confidential 2010 Census Edited File (CEF) as the initial input; the replicates were then created by executing the 2020 DAS TDA repeatedly with the PPMF0 as its initial input. Inspired by analogy to the use of bootstrap methods in non-private contexts, U.S. Census Bureau (USCB) researchers explored whether simple calculations based on comparing each PPMFi to the PPMF0 could be used to reliably estimate the scale of errors introduced by the 2020 DAS, and generally found this approach worked well.

    The PPMF0 and PPMFi files contained here are provided so that external researchers can estimate properties of DAS-introduced error without privileged access to internal USCB-curated data sets; further information on the estimation methodology can be found in Ashmead et al. (2024).
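
    As a rough illustration of the "simple calculations based on comparing each PPMFi to the PPMF0" mentioned above, the sketch below computes a root-mean-square difference between a statistic tabulated from the seed file and the same statistic tabulated from each replicate. The function names, the RMSE summary, and the normal-approximation interval with multiplier `z` are all assumptions made for illustration; the Census Bureau's actual estimator is documented in Ashmead et al. and may differ.

```python
import math
from typing import Sequence


def replicate_rmse(seed_value: float, replicate_values: Sequence[float]) -> float:
    """Root-mean-square difference between replicate tabulations and the seed tabulation."""
    if not replicate_values:
        raise ValueError("at least one replicate value is required")
    squared = [(value - seed_value) ** 2 for value in replicate_values]
    return math.sqrt(sum(squared) / len(squared))


def approximate_interval(published_value: float, rmse: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric interval around a published statistic (assumed normal approximation)."""
    return published_value - z * rmse, published_value + z * rmse
```

    Here `seed_value` would be a count tabulated from PPMF0 and `replicate_values` the same count tabulated from PPMF1 through PPMF25.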

    The 2010 DHC AMC seed PPMF0 and PPMF replicates have been cleared for public dissemination by the USCB Disclosure Review Board (CBDRB-FY24-DSEP-0002). The 2010 PPMF0 included in these files was produced using the same parameters and settings as were used to produce the 2010 Demonstration Data Product Suite (2023-04-03) PPMF, but represents an independent execution of the TopDown Algorithm. The PPMF0 and PPMF replicates contain all Person and Units attributes necessary to produce the Redistricting and DHC publications for both the United States and Puerto Rico, and include geographic detail down to the Census Block level. They do not include attributes specific to either the Detailed DHC-A or Detailed DHC-B products; in particular, data on Major Race (e.g., White Alone) is included, but data on Detailed Race (e.g., Cambodian) is not included in the PPMF0 and replicates.

    The 2020 AMC replicate files for estimating confidence intervals for the official 2020 Census statistics are available.

  3. Population with confidence in EU institutions by institution

    • data.europa.eu
    • gimi9.com
    csv, html, tsv, xml
    Updated Nov 6, 2017
    Cite
    Eurostat (2017). Population with confidence in EU institutions by institution [Dataset]. https://data.europa.eu/data/datasets/agvl4w4bhtllpvo3givrw?locale=en
    Explore at:
    Available download formats: csv(9631), html, tsv, xml(7101), xml(8809)
    Dataset updated
    Nov 6, 2017
    Dataset authored and provided by
    Eurostat https://ec.europa.eu/eurostat
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    European Union
    Description

    The indicator measures confidence among EU citizens in a selection of EU institutions: the European Parliament, the European Commission, and the European Central Bank. It is expressed as the share of positive opinions (people who declare that they tend to trust) about the institutions. Citizens are asked to express their confidence levels by choosing one of the following alternatives: ‘tend to trust’, ‘tend not to trust’ and ‘don’t know’ or ‘no answer’. The indicator is based on the Eurobarometer, a survey which has been conducted twice a year since 1973 to monitor the evolution of public opinion in the Member States.

  4. Public Health Statistics - Screening for elevated blood lead levels in...

    • catalog.data.gov
    • data.cityofchicago.org
    Updated Feb 7, 2022
    Cite
    data.cityofchicago.org (2022). Public Health Statistics - Screening for elevated blood lead levels in children aged 0-6 years by year, Chicago, 1999-2013 - Historical [Dataset]. https://catalog.data.gov/dataset/public-health-statistics-screening-for-elevated-blood-lead-levels-in-children-aged-0-1999-
    Explore at:
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    data.cityofchicago.org
    Area covered
    Chicago
    Description

    Note: This dataset is historical only and there are no corresponding datasets for more recent time periods. For more recent information, please visit the Chicago Health Atlas at https://chicagohealthatlas.org. This dataset contains the annual number and estimated rate per 1,000 children aged 0-6 years receiving a blood lead level test, and the annual number and estimated percentage of those tested found to have an elevated blood lead level, with corresponding 95% confidence intervals, by Chicago community area, for the years 1999 – 2013. See the full dataset description for more information at https://data.cityofchicago.org/api/views/gpjh-i4j2/files/vIHuTqqgxDT1UFX9XhgCeYddaOhsG2nzgoMLUoRjeOI?download=true&filename=P:\EPI\OEPHI\MATERIALS\REFERENCES\LEAD_POISONING\Dataset_Description_BloodLeadTesting_1999-2013.pdf
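
    To make the quantities in this description concrete, the sketch below computes a rate per 1,000 and a 95% confidence interval for a percentage. The dataset description does not state which interval method was used, so the Wilson score interval here is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def rate_per_1000(events: int, population: int) -> float:
    """Events per 1,000 people, e.g. children tested per 1,000 children aged 0-6."""
    return 1000.0 * events / population


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z = 1.96 for roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half
```

    For example, `wilson_interval(50, 1000)` gives an interval around an observed 5% of tested children with an elevated level; multiply the bounds by 100 to express them as percentages.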

  5. Confidence in institutions, by gender and province

    • www150.statcan.gc.ca
    • canwin-datahub.ad.umanitoba.ca
    • +2more
    Updated Feb 19, 2025
    Cite
    Government of Canada, Statistics Canada (2025). Confidence in institutions, by gender and province [Dataset]. http://doi.org/10.25318/4510007301-eng
    Explore at:
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    Area covered
    Canada
    Description

    Percentage of persons aged 15 years and over by level of confidence in selected types of institutions, by gender, for Canada, regions and provinces.

  6. Confidence level of business or organization in its ability to make payments...

    • ouvert.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Feb 28, 2025
    Cite
    Statistics Canada (2025). Confidence level of business or organization in its ability to make payments to suppliers and service providers in full and on time, first quarter of 2025 [Dataset]. https://ouvert.canada.ca/data/dataset/8be07a6b-2f19-48e5-a4ba-024e5e4933c5
    Explore at:
    Available download formats: csv, xml, html
    Dataset updated
    Feb 28, 2025
    Dataset provided by
    Statistics Canada https://statcan.gc.ca/en
    License

    Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Confidence level of business or organization in its ability to make payments to suppliers and service providers in full and on time, by North American Industry Classification System (NAICS), business employment size, type of business, business activity and majority ownership, first quarter of 2025.

  7. China Consumer Confidence

    • tradingeconomics.com
    • fa.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Cite
    TRADING ECONOMICS, China Consumer Confidence [Dataset]. https://tradingeconomics.com/china/consumer-confidence
    Explore at:
    Available download formats: excel, xml, json, csv
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 31, 1991 - May 31, 2025
    Area covered
    China
    Description

    Consumer Confidence in China increased to 88 points in May from 87.80 points in April of 2025. This dataset provides China Consumer Confidence actual values, historical data, forecasts, charts, statistics, an economic calendar and news.

  8. Performance of ML models on test data.

    • plos.figshare.com
    xls
    Updated Oct 31, 2023
    Cite
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha (2023). Performance of ML models on test data. [Dataset]. http://doi.org/10.1371/journal.pgph.0002475.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    PLOS Global Public Health
    Authors
    Mrinal Saha; Aparna Deb; Imtiaz Sultan; Sujat Paul; Jishan Ahmed; Goutam Saha
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vitamin D insufficiency appears to be prevalent in SLE patients. Multiple factors potentially contribute to lower vitamin D levels, including limited sun exposure, the use of sunscreen, darker skin complexion, aging, obesity, specific medical conditions, and certain medications. The study aims to assess the risk factors associated with low vitamin D levels in SLE patients in the southern part of Bangladesh, a region noted for a high prevalence of SLE. The research additionally investigates the possible correlation between vitamin D and the SLEDAI score, seeking to understand the potential benefits of vitamin D in enhancing disease outcomes for SLE patients. The study incorporates a dataset consisting of 50 patients from the southern part of Bangladesh and evaluates their clinical and demographic data. An initial exploratory data analysis is conducted to gain insights into the data, which includes calculating means and standard deviations, performing correlation analysis, and generating heat maps. Relevant inferential statistical tests, such as the Student’s t-test, are also employed. In the machine learning part of the analysis, this study utilizes supervised learning algorithms, specifically Linear Regression (LR) and Random Forest (RF). To optimize the hyperparameters of the RF model and mitigate the risk of overfitting given the small dataset, a 3-Fold cross-validation strategy is implemented. The study also calculates bootstrapped confidence intervals to provide robust uncertainty estimates and further validate the approach. A comprehensive feature importance analysis is carried out using RF feature importance, permutation-based feature importance, and SHAP values. The LR model yields an RMSE of 4.83 (CI: 2.70, 6.76) and MAE of 3.86 (CI: 2.06, 5.86), whereas the RF model achieves better results, with an RMSE of 2.98 (CI: 2.16, 3.76) and MAE of 2.68 (CI: 1.83, 3.52).

    Both models identify Hb, CRP, ESR, and age as significant contributors to vitamin D level predictions. Despite the lack of a significant association between SLEDAI and vitamin D in the statistical analysis, the machine learning models suggest a potential nonlinear dependency of vitamin D on SLEDAI. These findings highlight the importance of these factors in managing vitamin D levels in SLE patients. The study concludes that there is a high prevalence of vitamin D insufficiency in SLE patients. Although a direct linear correlation between the SLEDAI score and vitamin D levels is not observed, machine learning models suggest the possibility of a nonlinear relationship. Furthermore, factors such as Hb, CRP, ESR, and age are identified as more significant in predicting vitamin D levels. Thus, the study suggests that monitoring these factors may be advantageous in managing vitamin D levels in SLE patients. Given the immunological nature of SLE, the potential role of vitamin D in SLE disease activity could be substantial. Therefore, it underscores the need for further large-scale studies to corroborate this hypothesis.
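
    The bootstrapped confidence intervals reported for RMSE and MAE can be illustrated with a percentile bootstrap over test-set prediction errors. This is a sketch under stated assumptions, not the study's code: the description does not say which bootstrap variant, number of resamples, or confidence level was used, so the percentile method, 2,000 resamples, the 95% level and all function names below are illustrative choices.

```python
import math
import random
from typing import Callable, Sequence


def rmse(errors: Sequence[float]) -> float:
    """Root-mean-square of prediction errors (predicted minus observed)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_ci(
    errors: Sequence[float],
    statistic: Callable[[Sequence[float]], float] = rmse,
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval: resample errors with replacement, take empirical quantiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(errors)
    stats = sorted(
        statistic([errors[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower_index = int(alpha * n_resamples)
    upper_index = min(n_resamples - 1, int((1 - alpha) * n_resamples))
    return stats[lower_index], stats[upper_index]
```

    Passing a mean-absolute-error function as `statistic` would give the corresponding MAE interval. With only a handful of test observations, as in a 50-patient study, such intervals are expected to be wide.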

  9. Coastal Design Sea Levels- Coastal Flood Boundary Confidence Intervals

    • cloud.csiss.gmu.edu
    • data.europa.eu
    Updated Dec 22, 2019
    Cite
    United Kingdom (2019). Coastal Design Sea Levels- Coastal Flood Boundary Confidence Intervals [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/coastal-design-sea-levels-coastal-flood-boundary-confidence-intervals
    Explore at:
    Dataset updated
    Dec 22, 2019
    Dataset provided by
    United Kingdom
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    PLEASE NOTE: The Coastal Design Sea Levels – Coastal Flood Boundary datasets currently show data published in February 2011. An update to the data is currently planned for August 2019. This statement will be updated when the data update is complete.

    This metadata record is for AfA product AfA188-2. Extreme Sea Level Confidence information is part of Coastal Design/Extreme Sea Levels, a GIS dataset and supporting information providing design / extreme sea level and typical surge information around the coastline of England and Wales under present day conditions. Data for Scotland is available from the Scottish Environment Protection Agency (SEPA). This is a specialist dataset which informs on work commenced around the coast ranging from coastal flood modelling, scheme design, strategic planning and flood risk assessments. Extreme Sea Level Confidence information describes the extreme sea levels for 16 different annual probabilities of exceedance.

    A bundle download of all Coastal Design Sea Levels datasets is available from this record. Please see individual records for full details and metadata on each product. Attribution statement: © Environment Agency copyright and/or database right 2015. All rights reserved.

  10. Average Confidence Level of Heat Demand Estimates (250m Grid) - Scotland

    • data.europa.eu
    • find.data.gov.scot
    • +1more
    tiff, unknown
    Updated Oct 16, 2021
    Cite
    Scottish Government SpatialData.gov.scot (2021). Average Confidence Level of Heat Demand Estimates (250m Grid) - Scotland [Dataset]. https://data.europa.eu/data/datasets/average-confidence-level-of-heat-demand-estimates-250m-grid-scotland?locale=de
    Explore at:
    Available download formats: tiff, unknown
    Dataset updated
    Oct 16, 2021
    Dataset provided by
    Scottish Government http://www.gov.scot/
    Authors
    Scottish Government SpatialData.gov.scot
    Area covered
    Scotland
    Description

    The Scotland Heat Map provides estimates of heat demand for all properties in Scotland. To indicate reliability, each estimate is assigned a confidence level from 1 to 5. Level 1 is least reliable and level 5 most. This is mainly determined by the presence of information that would directly impact on heat demand in the estimate’s source data. For example, estimates based on data that includes building type, age and floor area would be more reliable than estimates based solely on floor area derived from mapping data. This raster dataset gives the average (mean) confidence level of properties within 250m x 250m grid squares covering all of Scotland.
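
    The aggregation described above (a mean confidence level per 250m x 250m square) can be sketched as a simple binning step. The input layout, the assumption that coordinates are projected eastings and northings in metres, and the grid origin at (0, 0) are all illustrative; the actual raster's alignment and processing are not described here.

```python
from collections import defaultdict
from typing import Iterable

CELL_SIZE_M = 250.0  # grid resolution stated in the dataset description


def mean_confidence_by_cell(
    properties: Iterable[tuple[float, float, int]],
    cell_size: float = CELL_SIZE_M,
) -> dict[tuple[int, int], float]:
    """Average the 1-5 confidence levels of (easting, northing, level) records per grid cell."""
    totals: dict[tuple[int, int], list[int]] = defaultdict(lambda: [0, 0])
    for easting, northing, level in properties:
        if not 1 <= level <= 5:
            raise ValueError("confidence level must be between 1 and 5")
        # Integer cell index along each axis (assumed grid origin at 0, 0).
        cell = (int(easting // cell_size), int(northing // cell_size))
        totals[cell][0] += level
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}
```

    A cell containing one level-5 and one level-2 estimate would get a mean of 3.5, which signals mixed reliability rather than uniformly moderate reliability.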

    The Scotland Heat Map is a tool to help plan for the reduction of carbon emissions from heat in buildings. Average confidence level is an indicator of reliability of the heat demand estimates within an area and allows planners to decide whether they meet their needs. The map is produced by the Scottish Government and aims to provide annual updates of heat demand estimates, and therefore confidence levels. More information can be found in the documentation available on the Scottish Government website: https://www.gov.scot/publications/scotland-heat-map-documents/

  11. Data from: Disentangling the origins of confidence in speeded perceptual...

    • openneuro.org
    Updated Apr 25, 2020
    Cite
    Michael Pereira; Nathan Faivre; Inaki Iturrate; Marco Wirthlin; Luana Serafini; Stephanie Martin; Arnaud Desvachez; Olaf Blanke; Dimitri Van de Ville; Jose del R. Millan (2020). Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging [Dataset]. http://doi.org/10.18112/openneuro.ds002158.v1.0.2
    Explore at:
    Dataset updated
    Apr 25, 2020
    Dataset provided by
    OpenNeuro https://openneuro.org/
    Authors
    Michael Pereira; Nathan Faivre; Inaki Iturrate; Marco Wirthlin; Luana Serafini; Stephanie Martin; Arnaud Desvachez; Olaf Blanke; Dimitri Van de Ville; Jose del R. Millan
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the data in

    Pereira, M., Faivre, N., Iturrate, I., Wirthlin, M., Serafini, L., Martin, S., Desvachez, A., Blanke, O., Van De Ville, D., Millan, JdR. (2020). Disentangling the origins of confidence in speeded perceptual judgments through multimodal imaging. Proceedings of the National Academy of Sciences, 117 (15) pp. 8382-8390 https://doi.org/10.1073/pnas.1918335117

    Preprint: https://www.biorxiv.org/content/10.1101/496877v1

    ABSTRACT The human capacity to compute the likelihood that a decision is correct—known as metacognition—has proven difficult to study in isolation as it usually cooccurs with decision making. Here, we isolated postdecisional from decisional contributions to metacognition by analyzing neural correlates of confidence with multimodal imaging. Healthy volunteers reported their confidence in the accuracy of decisions they made or decisions they observed. We found better metacognitive performance for committed vs. observed decisions, indicating that committing to a decision may improve confidence. Relying on concurrent electroencephalography and hemodynamic recordings, we found a common correlate of confidence following committed and observed decisions in the inferior frontal gyrus and a dissociation in the anterior prefrontal cortex and anterior insula. We discuss these results in light of decisional and postdecisional accounts of confidence and propose a computational model of confidence in which metacognitive performance naturally improves when evidence accumulation is constrained upon committing a decision.

    preregistration: https://osf.io/a5qmv/

    The dataset contains raw fMRI scans, raw EEG in BrainVision format as well as anatomical scans (T1) and field mapping. We also included preprocessed EEG and fMRI data in derivatives/eegprep and derivatives/fmriprep.

    EEG PREPROCESSING MR-gradient artifacts were removed using sliding window average template subtraction. TP10 electrode on the right mastoid was used to detect heartbeats for ballistocardiogram artifact (BCG) removal using a semi-automatic procedure in BrainVision Analyzer 2. Data were then filtered using a Butterworth, 4th order zero-phase (two-pass) bandpass filter between 1 and 10 Hz, epoched [-0.2, 0.6 s] around the response onset (i.e. the button press in the active condition or the appearance of the virtual hand in the observation condition), re-referenced to a common average, and input to independent component analysis (ICA) to remove residual BCG and ocular artifacts. In order to ensure numerical stability when estimating the independent components, we retained 99% of the variance from the electrode space, leading to an average of 19 (SD = 6) components estimated for each participant and condition. Independent components (ICs) were then fitted with a dipolar source localization method (66). ICs whose dipole lay outside the brain, or that resembled muscular or ocular artifacts, were eliminated. A total of 8 (SD = 3) components were finally kept. All preprocessing steps were performed using EEGLAB and in-house scripts under Matlab (The MathWorks, Inc., Natick, Massachusetts, United States).

    FMRI PREPROCESSING We modeled the BOLD signal using a general linear model (GLM) with two separate regressors (stick functions at stimulus onset) for the active and observation conditions as well as their spatial and temporal derivatives. We then parametrically modulated the regressors with three behavioral variables: the confidence ratings, the response times, and the numerosity difference between the two arrays of dots (i.e., perceptual evidence). Empirical cross-correlation between regressors confirmed limited collinearity for the active (resp. observation) condition (max(abs(R)) = 0.26 ± 0.02 resp., max(abs(R)) = 0.25 ± 0.02). Bad trials as defined in the behavioral analysis section were modeled by two separate regressors (one for active and one for observation) and their spatial and temporal derivatives. We added six realignment parameters as regressors of no interest. All second-level (group-level) results are reported at a significance level of p < 0.05 using cluster-extent family-wise error (FWE) correction with a voxel-height threshold of p < 0.001. We used the anatomical automatic labelling (AAL) atlas for brain parcellation (Tzourio-Mazoyer et al., 2002).

  12. Introduction to robust estimation of ERP data

    • figshare.com
    • search.datacite.org
    txt
    Updated May 30, 2023
    Cite
    Guillaume Rousselet (2023). Introduction to robust estimation of ERP data [Dataset]. http://doi.org/10.6084/m9.figshare.3501728.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Guillaume Rousselet
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data package contains the slides, Matlab m files and a dataset for an ERP workshop I gave in Washington DC, Glasgow, Fribourg, Frankfurt & Berlin. The goal of the workshop is to use hands-on exercises to introduce the basic principles and the Matlab implementation of robust estimation, using resampling methods (bootstrap & permutation) in conjunction with robust estimators. The workshop covers why classic t-tests and ANOVAs on means are not necessarily the best options, and how robust approaches can help. In particular, it demonstrates techniques to compare entire distributions, how to build confidence intervals about any quantity using the bootstrap, and how to effectively control for multiple comparisons. The methods are applied to single-subject and group analyses, and examples are provided to integrate both levels into informative figures.
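
    The workshop materials are Matlab files; as a language-neutral illustration of pairing a resampling method with a robust estimator, here is a minimal Python sketch of a permutation test on the difference between 20% trimmed means. It is not taken from the workshop: the function names, the 20% trimming proportion, and the number of permutations are assumptions.

```python
import random
from typing import Sequence


def trimmed_mean(values: Sequence[float], proportion: float = 0.2) -> float:
    """Mean after dropping the lowest and highest `proportion` of observations."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


def permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for a difference in trimmed means under random relabelling of groups."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(trimmed_mean(a) - trimmed_mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(trimmed_mean(pooled[:len(a)]) - trimmed_mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)
```

    Trimming makes the statistic less sensitive to outlying trials than the plain mean, and the permutation step avoids relying on the distributional assumptions behind a classic t-test.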

  13. Data from: Twitter historical dataset: March 21, 2006 (first tweet) to July...

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +2more
    Updated May 20, 2020
    Cite
    Gayo-Avello, Daniel (2020). Twitter historical dataset: March 21, 2006 (first tweet) to July 31, 2009 (3 years, 1.5 billion tweets) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3833781
    Explore at:
    Dataset updated
    May 20, 2020
    Dataset authored and provided by
    Gayo-Avello, Daniel
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Disclaimer: This dataset is distributed by Daniel Gayo-Avello, an associate professor at the Department of Computer Science in the University of Oviedo, for the sole purpose of non-commercial research and it just includes tweet ids.

    The dataset contains tweet IDs for all the published tweets (in any language) between March 21, 2006 and July 31, 2009, thus comprising the first three full years of Twitter from its creation, that is, about 1.5 billion tweets (see file Twitter-historical-20060321-20090731.zip).

    It covers several defining issues in Twitter, such as the invention of hashtags, retweets and trending topics, and it includes tweets related to the 2008 US Presidential Elections, Obama’s first inauguration speech or the 2009 Iran Election protests (one of the so-called Twitter Revolutions).

    Finally, it does contain tweets in many major languages (mainly English, Portuguese, Japanese, Spanish, German and French) so it should be possible–at least in theory–to analyze international events from different cultural perspectives.

    The dataset was completed in November 2016 and, therefore, the tweet IDs it contains were publicly available at that moment. This means that there could be tweets public during that period that do not appear in the dataset and also that a substantial part of tweets in the dataset has been deleted (or locked) since 2016.

    To make it easier to understand the decay of tweet IDs in the dataset, a number of representative samples (99% confidence level and 0.5 confidence interval) are provided.
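
    For readers wondering how large such a representative sample needs to be, the standard sample-size formula for a proportion gives a rough idea. The sketch below assumes that "0.5 confidence interval" means a margin of error of ±0.5 percentage points and uses the worst-case proportion of 0.5; the dataset author's exact calculation is not stated, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(
    confidence: float = 0.99,
    margin: float = 0.005,
    proportion: float = 0.5,
    population: Optional[int] = None,
) -> int:
    """Sample size for estimating a proportion, with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    if population is not None:
        # Finite population correction: small populations need fewer samples than n0.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)
```

    Under these assumptions an effectively unlimited population needs a sample in the tens of thousands of tweet IDs, while a small period, such as one with about 5,500 tweets, needs a sample close to the whole population.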

    In general terms, 85.5% ±0.5 of the historical tweets are available as of May 19, 2020 (see file Twitter-historical-20060321-20090731-sample.txt). However, since the number of tweets varies greatly throughout the period of three years covered in the dataset, additional representative samples are provided for 90-day intervals (see the file 90-day-samples.zip).

    In that regard, the ratio of publicly available tweets (as of May 19, 2020) is as follows:

    March 21, 2006 to June 18, 2006: 88.4% ±0.5 (from 5,512 tweets).

    June 18, 2006 to September 16, 2006: 82.7% ±0.5 (from 14,820 tweets).

    September 16, 2006 to December 15, 2006: 85.7% ±0.5 (from 107,975 tweets).

    December 15, 2006 to March 15, 2007: 88.2% ±0.5 (from 852,463 tweets).

    March 15, 2007 to June 13, 2007: 89.6% ±0.5 (from 6,341,665 tweets).

    June 13, 2007 to September 11, 2007: 88.6% ±0.5 (from 11,171,090 tweets).

    September 11, 2007 to December 10, 2007: 87.9% ±0.5 (from 15,545,532 tweets).

    December 10, 2007 to March 9, 2008: 89.0% ±0.5 (from 23,164,663 tweets).

    March 9, 2008 to June 7, 2008: 66.5% ±0.5 (from 56,416,772 tweets; see below for more details on this).

    June 7, 2008 to September 5, 2008: 78.3% ±0.5 (from 62,868,189 tweets; see below for more details on this).

    September 5, 2008 to December 4, 2008: 87.3% ±0.5 (from 89,947,498 tweets).

    December 4, 2008 to March 4, 2009: 86.9% ±0.5 (from 169,762,425 tweets).

    March 4, 2009 to June 2, 2009: 86.4% ±0.5 (from 474,581,170 tweets).

    June 2, 2009 to July 31, 2009: 85.7% ±0.5 (from 589,116,341 tweets).

    The apparent drop in available tweets from March 9, 2008 to September 5, 2008 has an easy, although embarrassing, explanation.

    At the moment of cleaning all the data to publish this dataset there seemed to be a gap between April 1, 2008 and July 7, 2008 (actually, the data was not missing but in a different backup). Since tweet IDs are easy to regenerate for that Twitter era (source code is provided in generate-ids.m) I simply produced all those that were created between those two dates. All those tweets actually existed but a number of them were obviously private and not crawlable. For those regenerated IDs the actual ratio of public tweets (as of May 19, 2020) is 62.3% ±0.5.

    In other words, what you see in that period (April to July, 2008) is not actually a huge number of tweets having been deleted but the combination of deleted and non-public tweets (whose IDs should not be in the dataset for performance purposes when rehydrating the dataset).

    Additionally, given that not everybody will need the whole period of time the earliest tweet ID for each date is provided in the file date-tweet-id.tsv.

    For additional details regarding this dataset please see: Gayo-Avello, Daniel. "How I Stopped Worrying about the Twitter Archive at the Library of Congress and Learned to Build a Little One for Myself." arXiv preprint arXiv:1611.08144 (2016).

    If you use this dataset in any way please cite that preprint (in addition to the dataset itself).

    If you need to contact me, you can find me as @PFCdgayo on Twitter.

  14. o

    The CloudSat-CALIPSO Cloud Amount Uncertainty product

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Feb 15, 2022
    Andrzej Z. Kotarba (2022). The CloudSat-CALIPSO Cloud Amount Uncertainty product [Dataset]. http://doi.org/10.5281/zenodo.6113204
    Explore at:
    Dataset updated
    Feb 15, 2022
    Authors
    Andrzej Z. Kotarba
    Description

    Version 1.0 of the dataset. The peer-reviewed publication for this dataset has been published in Remote Sensing, 2021, 13(4), 807, and can be accessed at https://doi.org/10.3390/rs13040807. Please cite it when using the dataset. The ‘CloudSat-CALIPSO Cloud Amount Uncertainty’ product provides information about mean annual and mean monthly cloud amount, at 40 vertical levels (480 m), and four spatial resolutions (1°, 2.5°, 5°, and 10°), as derived from the joint CloudSat-CALIPSO lidar-radar observations (2006–2011). For the very first time, the CloudSat-CALIPSO climatology comes with a quantitative uncertainty assessment: bootstrapped confidence intervals for mean values. The width of confidence intervals is an essential element in studying spatial and/or temporal variation in cloud amount with satellite profiling instruments. Uncertainty data are provided at four confidence levels (85%, 90%, 95%, 99%). Data products are distributed as HDF4 files, and can be accessed under the CC BY 4.0 license at doi:10.5281/zenodo.6113205. See the product documentation file (LIRAC.conf.v01_doc01.pdf) for details.

    Reference: Kotarba, A.Z., Solecki, M. (2021) Uncertainty Assessment of the Vertically-Resolved Cloud Amount for Joint CloudSat-CALIPSO Radar-Lidar Observations. Remote Sensing, 13, 807, doi:10.3390/rs13040807.
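
To illustrate what a bootstrapped confidence interval for a mean looks like at the four confidence levels mentioned above, here is a minimal percentile-bootstrap sketch. It is an assumption-laden illustration, not the procedure used to build the product: the percentile method, the resample count, the function name `bootstrap_mean_ci`, and the sample values are all hypothetical, and the product documentation should be consulted for the actual method.

```python
import random
from statistics import fmean

# The four confidence levels named in the product description.
LEVELS = (0.85, 0.90, 0.95, 0.99)


def bootstrap_mean_ci(values, levels=LEVELS, n_resamples=2000, seed=0):
    """Percentile-bootstrap confidence intervals for the mean of `values`.

    Returns a dict mapping each confidence level to a (lower, upper) pair.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    intervals = {}
    for level in levels:
        tail = (1.0 - level) / 2.0
        lower = means[int(tail * (n_resamples - 1))]
        upper = means[int((1.0 - tail) * (n_resamples - 1))]
        intervals[level] = (lower, upper)
    return intervals


# Hypothetical cloud-amount fractions for one grid cell and vertical level.
sample = [0.12, 0.30, 0.25, 0.41, 0.18, 0.33, 0.27, 0.22, 0.36, 0.29]
cis = bootstrap_mean_ci(sample)
for level, (lower, upper) in cis.items():
    print(f"{level:.0%}: [{lower:.3f}, {upper:.3f}]")
```

Higher confidence levels give wider intervals, which is why the interval width matters when comparing cloud amount between regions or periods.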

  15. P

    TCP-CI Dataset

    • paperswithcode.com
    Updated Sep 29, 2021
    (2021). TCP-CI Dataset [Dataset]. https://paperswithcode.com/dataset/tcp-ci
    Explore at:
    Dataset updated
    Sep 29, 2021
    Description

    This dataset is a benchmark of 25 open-source subjects with 21.5k builds and 3.6k failed builds that enables a fair comparison and evaluation of Test Case Prioritization (TCP) techniques. We made our data collection tools available, which can be used to extend and update the subjects. The description of the structure and files of the dataset can also be found in the documentation of the data collection tool.

  16. d

    1.11 Feeling Safe in Work (summary)

    • catalog.data.gov
    • data-academy.tempe.gov
    • +8more
    Updated Jul 5, 2025
    + more versions
    City of Tempe (2025). 1.11 Feeling Safe in Work (summary) [Dataset]. https://catalog.data.gov/dataset/1-11-feeling-safe-in-work-summary-b5f31
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    City of Tempe
    Description

    This dataset comes from the biennial City of Tempe Employee Survey question about feeling safe in the physical work environment (building). The Employee Survey question relating to this performance measure: “Please rate your level of agreement: My physical work environment (building) is safe, clean & maintained in good operating order.” Survey respondents are asked to rate their agreement level on a scale of 5 to 1, where 5 means “Strongly Agree” and 1 means “Strongly Disagree” (without “don’t know” responses included). The survey was voluntary, and employees were allowed to complete the survey during work hours or at home. The survey allowed employees to respond anonymously and has a 95% confidence level. This page provides data about the Feeling Safe in City Facilities performance measure. The performance measure dashboard is available at 1.11 Feeling Safe in City Facilities.

    Additional Information
    Source: Employee Survey
    Contact: Wydale Holmes
    Contact E-Mail: Wydale_Holmes@tempe.gov
    Data Source Type: CSV
    Preparation Method: Data received from vendor and entered in CSV
    Publish Frequency: Biennial
    Publish Method: Manual
    Data Dictionary (update pending)

  17. T

    Albania Consumer Confidence

    • tradingeconomics.com
    • pl.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    TRADING ECONOMICS, Albania Consumer Confidence [Dataset]. https://tradingeconomics.com/albania/consumer-confidence
    Explore at:
    Available download formats: excel, json, xml, csv
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 30, 2003 - Jun 30, 2025
    Area covered
    Albania
    Description

    Consumer Confidence in Albania increased to -24 points in June 2025 from -24.20 points in May 2025. This dataset provides actual values, historical data, forecasts, charts, statistics, an economic calendar, and news for Albania Consumer Confidence.

  18. Data from: GHALogs: Large-Scale Dataset of GitHub Actions Runs

    • zenodo.org
    application/gzip, zip
    Updated Dec 5, 2024
    Florent Moriconi; Thomas Durieux; Jean-Rémy Falleri; Raphael Troncy; Aurélien Francillon (2024). GHALogs: Large-Scale Dataset of GitHub Actions Runs [Dataset]. http://doi.org/10.5281/zenodo.10154920
    Explore at:
    Available download formats: application/gzip, zip
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Florent Moriconi; Thomas Durieux; Jean-Rémy Falleri; Raphael Troncy; Aurélien Francillon
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Oct 2023
    Description

    In recent years, continuous integration and deployment (CI/CD) has become increasingly popular in both the open-source community and industry. Evaluating CI/CD performance is a critical aspect of software development, as it not only helps minimize execution costs but also ensures faster feedback for developers. Despite its importance, there is limited fine-grained knowledge about the performance of CI/CD processes—knowledge that is essential for identifying bottlenecks and optimization opportunities.
    Moreover, the availability of large-scale, publicly accessible datasets of CI/CD logs remains scarce. The few datasets that do exist are often outdated and lack comprehensive coverage. To address this gap, we introduce a new dataset comprising 116k CI/CD workflows executed using GitHub Actions (GHA) across 25k public code projects spanning 20 different programming languages.
    This dataset includes 513k workflow runs encompassing 2.3 million individual steps. For each workflow run, we provide detailed metadata along with complete run logs. To the best of our knowledge, this is the largest dataset of CI/CD runs that includes full log data. The inclusion of these logs enables more in-depth analysis of CI/CD pipelines, offering insights that cannot be gleaned solely from code repositories.
    We postulate that this dataset will facilitate future CI/CD pipeline behavior research through log-based analysis. Potential applications include performance evaluation (e.g., measuring task execution times) and root cause analysis (e.g., identifying reasons for pipeline failures).

  19. T

    United States Michigan Consumer Sentiment

    • tradingeconomics.com
    • es.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    Updated Jul 18, 2025
    TRADING ECONOMICS (2025). United States Michigan Consumer Sentiment [Dataset]. https://tradingeconomics.com/united-states/consumer-confidence
    Explore at:
    Available download formats: csv, xml, json, excel
    Dataset updated
    Jul 18, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 30, 1952 - Jul 31, 2025
    Area covered
    United States
    Description

    Consumer Confidence in the United States increased to 61.80 points in July 2025 from 60.70 points in June 2025. This dataset provides the latest reported value for United States Consumer Sentiment, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

  20. t

    [DISCONTINUED] Level of citizens' confidence in EU institutions

    • service.tib.eu
    Updated Jan 8, 2025
    (2025). [DISCONTINUED] Level of citizens' confidence in EU institutions [Dataset]. https://service.tib.eu/ldmservice/dataset/eurostat_1d4tm4fbnfad7udwhvq
    Explore at:
    Dataset updated
    Jan 8, 2025
    Area covered
    European Union
    Description

    Dataset replaced by: http://data.europa.eu/euodp/data/dataset/Agvl4w4bhTLLpvo3GIVrw

    The level of citizens' confidence in EU institutions (Council of the European Union, European Parliament and European Commission) is expressed as the share of positive opinions (people who declare that they tend to trust) about the institutions. The indicator is based on the Eurobarometer, a survey which has been conducted twice a year since 1973 to monitor the evolution of public opinion in the Member States. The indicator only displays the results of the autumn survey. Potential replies to the question on the level of confidence include 'tend to trust', 'tend not to trust' and 'don't know' or 'no answer'. Trust is not precisely defined, which could leave some room for interpretation by the interviewees.

Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2

The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: txt
Dataset updated
Oct 15, 2024
Dataset provided by
Monash University
Authors
Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Companion data for the creation of a banksia plot:

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
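
The centring and scaling step described in the Methods can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' Stata or R code: the `Estimate` type and the `centre_and_scale` function are hypothetical names, and the sketch reads "adjusted by the same amounts" as subtracting the reference point estimate and dividing by the reference CI width.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    """A point estimate with its confidence interval limits."""
    point: float
    lower: float
    upper: float


def centre_and_scale(reference, comparator):
    """Centre on the reference point estimate and scale by the reference CI width.

    After the transformation the reference point estimate is 0 and the
    reference CI spans a range of 1; the comparator is shifted and scaled
    by the same amounts.
    """
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference CI must have positive width")

    def transform(estimate):
        return Estimate(*((value - reference.point) / width for value in estimate))

    return transform(reference), transform(comparator)


# Hypothetical results from two analyses of the same dataset.
ref = Estimate(point=2.0, lower=1.0, upper=3.0)
comp = Estimate(point=2.5, lower=0.5, upper=4.5)
ref_t, comp_t = centre_and_scale(ref, comp)
print(ref_t)   # Estimate(point=0.0, lower=-0.5, upper=0.5)
print(comp_t)  # Estimate(point=0.25, lower=-0.75, upper=1.25)
```

In this example the comparator's point estimate sits a quarter of a reference CI width above the reference, and its interval is twice as wide, which is the kind of relative comparison the plot is meant to make visible across datasets with different scales.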
