100+ datasets found
  1. Sexual orientation (detailed), comparison of corrected and original data,...

    • ons.gov.uk
    xlsx
    Updated Nov 1, 2023
    Cite
    Office for National Statistics (2023). Sexual orientation (detailed), comparison of corrected and original data, England and Wales: Census 2021 [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/datasets/sexualorientationdetailedcomparisonofcorrectedandoriginaldataenglandandwalescensus2021
    Available download formats: xlsx
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Area covered
    Wales, England
    Description

    Dataset provided to help users interpret the correction made to the detailed Census 2021 sexual orientation estimates. More information in quality notice.

  2. The banksia plot: a method for visually comparing point estimates and...

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Available download formats: txt
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot.

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

    Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
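
    The centring and scaling step described under Methods can be sketched as follows. This is only an illustration of the idea as the description states it, not the authors' Stata or R code; the `Interval` class, the `centre_and_scale` function and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A point estimate with its confidence interval (hypothetical container)."""
    estimate: float
    lower: float
    upper: float


def centre_and_scale(reference: Interval, comparator: Interval):
    """Centre the reference estimate to zero and scale its CI to a width of one,
    then adjust the comparator by the same shift and scale."""
    shift = reference.estimate
    width = reference.upper - reference.lower
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def adjust(iv: Interval) -> Interval:
        return Interval(
            (iv.estimate - shift) / width,
            (iv.lower - shift) / width,
            (iv.upper - shift) / width,
        )

    return adjust(reference), adjust(comparator)


# Made-up numbers: reference 10 (8 to 12), comparator 11 (8 to 14).
ref_adj, comp_adj = centre_and_scale(Interval(10, 8, 12), Interval(11, 8, 14))
print(ref_adj)   # reference: estimate 0, CI spanning a range of one
print(comp_adj)  # comparator on the same centred and scaled axis
```

    Because both analyses share one shift and one scale factor, the relative difference in point estimates and CI widths is preserved, which is the property the description highlights.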

  3. Title: Comparing Transaction Logs to ILL - Raw Data Open Access Deposited

    • datacore.iu.edu
    Updated May 8, 2018
    Cite
    Cohen, Rachael; Michaels, Sherri (2018). Title: Comparing Transaction Logs to ILL - Raw Data Open Access Deposited [Dataset]. https://datacore.iu.edu/concern/data_sets/z603qx40z?locale=en
    Dataset updated
    May 8, 2018
    Dataset provided by
    IU Scholarworks
    Authors
    Cohen, Rachael; Michaels, Sherri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for "Comparing Transaction Logs to ILL requests to Determine the Persistence of Library Patrons In Obtaining Materials" article. The Excel file contains all data in four worksheets. The zip file contains four csv files, one for each worksheet:
    - Comparing Transaction Logs to ILL - 2016 ILL Raw ...Data.csv
    - Comparing Transaction Logs to ILL - 2015 ILL Raw Data.csv
    - Comparing Transaction Logs to ILL - 2016 Zero Search Raw Data.csv
    - Comparing Transaction Logs to ILL - 2015 Zero Search Raw Data.csv

  4. Experimental (raw) Data of Statistical Comparison Between Formal and...

    • ieee-dataport.org
    Updated May 9, 2020
    Cite
    Chris Karanikolas (2020). Experimental (raw) Data of Statistical Comparison Between Formal and Simulated Models’ Outcomes for CIBI vs. CVP General Problem [Dataset]. https://ieee-dataport.org/documents/experimental-raw-data-statistical-comparison-between-formal-and-simulated-models-outcomes
    Dataset updated
    May 9, 2020
    Authors
    Chris Karanikolas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data correspond to quantitative (raw) effort assessments/predictions during the maintenance process for a sample of 1000 possible instances of the general selection problem between Visitor and Inheritance Based Implementation over the Composite design pattern (CIBI vs CVP).

  5. Data from: Sample-comparison mapping and joint stimulus control

    • datarepositorium.uminho.pt
    pdf, tsv
    Updated Apr 7, 2025
    Cite
    Repositório de Dados da Universidade do Minho (2025). Sample-comparison mapping and joint stimulus control [Dataset]. http://doi.org/10.34622/datarepositorium/9SRSKQ
    Available download formats: tsv(3692), pdf(20125)
    Dataset updated
    Apr 7, 2025
    Dataset provided by
    Repositório de Dados da Universidade do Minho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    FCT
    Description

    Data from experiment "Sample-comparison mapping and joint stimulus control"

  6. Supplementary material from "Visual comparison of two data sets: Do people...

    • figshare.com
    xlsx
    Updated Mar 14, 2017
    Cite
    Robin Kramer; Caitlin Telfer; Alice Towler (2017). Supplementary material from "Visual comparison of two data sets: Do people use the means and the variability?" [Dataset]. http://doi.org/10.6084/m9.figshare.4751095.v1
    Available download formats: xlsx
    Dataset updated
    Mar 14, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robin Kramer; Caitlin Telfer; Alice Towler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
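
    The two quantities varied in Experiment 2, the mean difference and the pooled standard deviation, can be computed as in the sketch below. The description does not give a formula, so this assumes the standard pooled standard deviation for two independent groups; the function names and the scores are made up for illustration and are not taken from the dataset.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference(group_a, group_b):
    """Difference between the two group means."""
    return mean(group_a) - mean(group_b)


def pooled_sd(group_a, group_b):
    """Standard pooled standard deviation of two independent groups:
    sqrt(((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / (n_a + n_b - 2))."""
    n_a, n_b = len(group_a), len(group_b)
    weighted = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    return sqrt(weighted / (n_a + n_b - 2))


# Hypothetical scores for the two fictional groups.
juice = [6, 8, 10]
water = [1, 2, 3]
print(mean_difference(juice, water))
print(pooled_sd(juice, water))
```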

  7. Percentage Differences Streamflow

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Percentage Differences Streamflow [Dataset]. https://catalog.data.gov/dataset/percentage-differences-streamflow
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Description

    A comma separated values (csv) file that is a snapshot of percent difference between November 19, 2008 and November 14, 2016 peak streamflow. The file lists station identification, water year, original (2008) peak Q, current (2016) peak Q and percent difference calculated per water year. The percent difference was calculated as the absolute value of [(current peak Q - original peak Q)/(original peak Q) x 100], where current peak Q is the 2016 peak and the original peak Q is the 2008 peak. When an original peak Q value is 0, the resultant percent difference calculation is undefined because of division by 0. In these cases, the percent difference field is populated with NA. Those entries are included in the data file so that users can make their own comparisons between the 2008 and 2016 peaks for those cases where the original peak value was 0.
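
    The percent difference rule described above, including the NA case, can be sketched as follows. The function and argument names are hypothetical, and `None` stands in for the NA value in the file.

```python
def percent_difference(original_peak_q, current_peak_q):
    """Absolute percent difference between the original (2008) and current (2016)
    peak Q. Returns None (standing in for NA) when the original peak Q is 0,
    because the calculation would divide by zero."""
    if original_peak_q == 0:
        return None
    return abs((current_peak_q - original_peak_q) / original_peak_q * 100)


# Made-up peak values for illustration.
print(percent_difference(200, 150))
print(percent_difference(0, 5))
```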

  8. CONGRUENCE

    • figshare.com
    application/x-rar
    Updated Feb 21, 2025
    Cite
    ayman baniamer (2025). CONGRUENCE [Dataset]. http://doi.org/10.6084/m9.figshare.28462568.v1
    Available download formats: application/x-rar
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    figshare
    Authors
    ayman baniamer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparisons of MI vs. original, EM vs. original, and CIM vs. original.

  9. Benchmark Multi-Omics Datasets for Methods Comparison

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Nov 14, 2021
    Cite
    Gabriel Odom; Lily Wang (2021). Benchmark Multi-Omics Datasets for Methods Comparison [Dataset]. http://doi.org/10.5281/zenodo.5683002
    Available download formats: bin, zip
    Dataset updated
    Nov 14, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gabriel Odom; Lily Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pathway Multi-Omics Simulated Data

    These are synthetic variations of the TCGA COADREAD data set (original data available at http://linkedomics.org/data_download/TCGA-COADREAD/). This data set is used as a comprehensive benchmark data set to compare multi-omics tools in the manuscript "pathwayMultiomics: An R package for efficient integrative analysis of multi-omics datasets with matched or un-matched samples".

    There are 100 sets (stored as 100 sub-folders, the first 50 in "pt1" and the second 50 in "pt2") of random modifications to centred and scaled copy number, gene expression, and proteomics data saved as compressed data files for the R programming language. These data sets are stored in subfolders labelled "sim001", "sim002", ..., "sim100". Each folder contains the following contents:
    (1) "indicatorMatricesXXX_ls.RDS" is a list of simple triplet matrices showing which genes (in which pathways) and which samples received the synthetic treatment (where XXX is the simulation run label: 001, 002, ...);
    (2) "CNV_partitionA_deltaB.RDS" is the synthetically modified copy number variation data (where A represents the proportion of genes in each gene set to receive the synthetic treatment [partition 1 is 20%, 2 is 40%, 3 is 60% and 4 is 80%] and B is the signal strength in units of standard deviations);
    (3) "RNAseq_partitionA_deltaB.RDS" is the synthetically modified gene expression data (same parameter legend as CNV);
    (4) "Prot_partitionA_deltaB.RDS" is the synthetically modified protein expression data (same parameter legend as CNV).
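
    As a rough aid for navigating this layout, the sketch below builds the folder label and indicator-matrix file name for a given simulation run. It assumes the "simXXX" folders sit directly inside "pt1" or "pt2", which the description implies but does not state outright; the helper names are hypothetical. The "delta" part of the other file names is not reconstructed because its format is not given.

```python
def sim_folder(run: int) -> str:
    """Relative folder for a simulation run, assuming simXXX is nested under pt1/pt2."""
    if not 1 <= run <= 100:
        raise ValueError("run must be between 1 and 100")
    part = "pt1" if run <= 50 else "pt2"
    return f"{part}/sim{run:03d}"


def indicator_file(run: int) -> str:
    """File name of the indicator matrices for a run (XXX is the zero-padded run label)."""
    return f"indicatorMatrices{run:03d}_ls.RDS"


# Partition codes and the proportion of genes treated, as listed in the description.
PARTITION_PROPORTION = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80}

print(sim_folder(7), indicator_file(7))
```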

    Supplemental Files

    The file "cluster_pathway_collection_20201117.gmt" is the collection of gene sets used for the simulation study in Gene Matrix Transpose format. Scripts to create and analyze these data sets are available at: https://github.com/TransBioInfoLab/pathwayMultiomics_manuscript_supplement

  10. Taichung City's new and old land number comparison data

    • data.gov.tw
    csv, json, xml
    Updated Jun 13, 2025
    Cite
    Land Administration Bureau, Taichung City Government (2025). Taichung City's new and old land number comparison data [Dataset]. https://data.gov.tw/en/datasets/130155
    Available download formats: xml, json, csv
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    Land Administration Bureau, Taichung City Government
    License

    https://data.gov.tw/license

    Area covered
    Taichung City
    Description

    Comparison table of old and new section and plot numbers for areas that have undergone cadastral map re-survey or cadastral reorganization.

  11. llm-comparison

    • huggingface.co
    Updated Dec 20, 2024
    Cite
    Alex Karev (2024). llm-comparison [Dataset]. https://huggingface.co/datasets/alex-karev/llm-comparison
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 20, 2024
    Authors
    Alex Karev
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    LLM Similarity Comparison Dataset

    This dataset is based on the original Alpaca dataset and was synthetically generated for LLM similarity comparison using the ConSCompF framework as described in the original paper. The script used for generating data is available on Kaggle. It is divided into 3 subsets:

    - quantization: contains 156,000 samples (5,200 for each model) generated by the original Tinyllama and its 8-bit, 4-bit, and 2-bit GGUF quantized versions.
    - comparison: contains 28,600…

    See the full description on the dataset page: https://huggingface.co/datasets/alex-karev/llm-comparison.

  12. Original data for article: Comparison of epifluorescence microscopy and flow...

    • jyx.jyu.fi
    Updated Feb 13, 2025
    Cite
    Pauliina Salmi; Anita Mäki; Anu Mikkonen; Veli-Mikko Puupponen; Kristiina Vuorio; Marja Tiirola (2025). Original data for article: Comparison of epifluorescence microscopy and flow cytometry in counting freshwater picophytoplankton [Dataset]. http://doi.org/10.17011/jyx/dataset/66278
    Dataset updated
    Feb 13, 2025
    Authors
    Pauliina Salmi; Anita Mäki; Anu Mikkonen; Veli-Mikko Puupponen; Kristiina Vuorio; Marja Tiirola
    License

    https://rightsstatements.org/page/InC/1.0/

    Description

    The dataset is divided into four subfolders:
    1) "SEM experiment data" contains Scanning Electron Microscopy data, epifluorescence microscopy data and flow cytometry data of cultured Synechococcus, Chroococcus and Snowella.
    2) "raw data" contains epifluorescence microscopy and flow cytometry data of picophytoplankton from Finnish lakes. It has two subfolders, "flow cytometry raw" and "microscopy raw".
    3) "flow cytometry calibration data" contains data for cell size calibration with latex beads and volumetric calibration for the flow cytometer.
    4) "processed flow and microscopy data" contains Excel workbooks for the figures shown in the manuscript.

  13. Comparison of original and final budgets 2009-10

    • data.wu.ac.at
    csv
    Updated Mar 1, 2014
    Cite
    HM Treasury (2014). Comparison of original and final budgets 2009-10 [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/NWZjNmVlMDAtMTExNi00NDkyLTg3YWYtMDA5YjkxYzZmYTk3
    Available download formats: csv
    Dataset updated
    Mar 1, 2014
    Dataset provided by
    HM Treasury (https://gov.uk/hm-treasury)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Comparison of outturn information with final plans by department for 2009-10, taken from snapshots 31 and 11 (Main Estimate outturn snapshot April 2010 and Spring Supplementary Estimates plans snapshot February 2010). The 2009-10 data are consistent with the raw COINS data published in June 2010. The 2009-10 data will not match the provisional outturn for 2009-10 published by the Treasury on 26 July 2010. These datasets, and the COINS raw data will be updated at the end of September, to reflect the latest outturn for 2009-10, once all related national statistic releases have taken place.

  14. Data from: Social Media as an Alternative to Surveys of Opinions about the...

    • datasearch.gesis.org
    • openicpsr.org
    Updated May 3, 2019
    Cite
    Conrad, Frederick (2019). Social Media as an Alternative to Surveys of Opinions about the Economy [Dataset]. http://doi.org/10.3886/E109581V1
    Dataset updated
    May 3, 2019
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Conrad, Frederick
    Description

    There is interest in using social media content to supplement or even substitute for survey data. O’Connor et al. (2010) report reasonably high correlations between the sentiment of tweets containing the word “jobs” and survey-based measures of consumer confidence in 2008-2009. Other researchers report a similar relationship through 2011 but after that time it is no longer observed, suggesting such tweets may not be as promising an alternative to survey responses as originally hoped. But, it’s possible that with the right analytic techniques, the sentiment of “jobs” tweets might still be an acceptable alternative. We explore this possibility by attempting to strengthen the original relationship and then extending the most successful approaches to more recent years. We classify “jobs” tweets into categories whose content is related to employment and categories whose content is not, to see if sentiment of the former correlates more highly with a survey-based measure of consumer sentiment. We use five sentiment-scoring tools, calculate daily sentiment three different ways, and use a measure of association less sensitive to outliers than correlation. None of these approaches improved the size of the relationship in the original or more recent data. We discuss the possibility that weighting and better understanding why users tweet might help recover the original relationship between the sentiment of tweets and survey responses. However, despite the earlier promise of tweets as an alternative to survey responses, we find no evidence that the original relationship was more than a chance occurrence.

  15. Data from: Raw data for Comparison of Self-reported Measures of Hearing to...

    • eprints.soton.ac.uk
    • data.mendeley.com
    Updated Aug 15, 2020
    Cite
    Tsimpida, Dalia (2020). Raw data for Comparison of Self-reported Measures of Hearing to an Objective Audiometric Measure in Adults in the English Longitudinal Study of Ageing (ELSA) [Dataset]. https://eprints.soton.ac.uk/486931/
    Dataset updated
    Aug 15, 2020
    Dataset provided by
    Mendeley Data
    Authors
    Tsimpida, Dalia
    Description

    Raw data, computed data and statistical code for all main analyses and subgroup analyses presented in JAMA Netw Open. 2020;3(8):e2015009. doi:10.1001/jamanetworkopen.2020.15009.

    Data sharing statement: Access to the English Longitudinal Study of Ageing (ELSA) dataset is publicly available via the UK Data Service (https://www.ukdataservice.ac.uk).

    Note: Statistical code to create the subcategories of some demographic variables included in the analyses (e.g. age categories of participants) may not be available in the current dataset. Additional statistical code is available from the corresponding author upon reasonable request at: dialechti.tsimpida@manchester.ac.uk

  16. NACP Regional: Original Observation Data and Biosphere and Inverse Model...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). NACP Regional: Original Observation Data and Biosphere and Inverse Model Outputs - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/nacp-regional-original-observation-data-and-biosphere-and-inverse-model-outputs-7a660
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This data set contains the originally-submitted observation measurement data, terrestrial biosphere model output data, and inverse model simulations that various investigator teams contributed to the North American Carbon Program (NACP) Regional Synthesis activities. The data set provides nine (9) data packages of remote sensing and ground observation measurements (OM) (MODIS gross primary productivity (GPP), MODIS net primary production (NPP), MODIS fraction of photosynthetically active radiation (fPar), MODIS leaf area index (LAI), MODIS enhanced vegetation index (EVI), MODIS normalized difference vegetation index (NDVI), Forest Inventory and Analysis (FIA) forest biomass, National Agricultural Statistics Service (NASS) crop NPP, and Flux Anomaly). The data set also provides data packages of simulation results from 19 terrestrial biosphere models (TBM) and eight (8) inverse models (IM). The data packages are respectively OM, TBM, and IM data files listed in Tables 4-6. Each OM, TBM, and IM data package contains all of the original data (and documentation, if any) that the NACP Modeling and Synthesis Thematic Data Center (MAST-DC) acquired or received. These originally-submitted data were processed by the MAST-DC to produce the three standardized gridded data sets of carbon flux for inter-comparison purposes (see Related Data Products below). These original data and documentation are provided to allow users of the standardized gridded data products to be able to trace back to the data origins when needed. The Data Center (ORNL DAAC) transformed some of the originally-submitted data files to file formats that are more suitable for long-term archiving. For example, .xlsx files were saved as .csv, ERDAS Imagine files were converted to GeoTIFFs, and MATLAB files were converted to GeoTIFF and NetCDF formats as appropriate. Files received in NetCDF, GeoTIFF, and HDF formats were not transformed.

  17. Data comparison of old and new land numbers in the 105th year of the Bade...

    • data.gov.tw
    csv
    Updated Jun 10, 2024
    Cite
    Department of Land Administration, Taoyuan (2024). Data comparison of old and new land numbers in the 105th year of the Bade district [Dataset]. https://data.gov.tw/en/datasets/28720
    Available download formats: csv
    Dataset updated
    Jun 10, 2024
    Dataset authored and provided by
    Department of Land Administration, Taoyuan
    License

    https://data.gov.tw/license

    Area covered
    Bade District
    Description

    Comparison data of new and old land numbers from past re-survey work in the Bade District (until the end of 2015).

  18. One Classifier Ignores a Feature

    • data.niaid.nih.gov
    Updated Apr 29, 2022
    Cite
    Maier, Karl (2022). One Classifier Ignores a Feature [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6502642
    Dataset updated
    Apr 29, 2022
    Dataset authored and provided by
    Maier, Karl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data sets are used in a controlled experiment in which two classifiers are compared. train_a.csv and explain.csv are slices from the original data set. train_b.csv contains the same instances as train_a.csv, but with feature x1 set to 0 to make it unusable to classifier B.

    The original data set was created and split using this Python code:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    # Generate a two-feature, two-class data set and scale the features by 100.
    X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, n_informative=2,
                               n_clusters_per_class=1, class_sep=0.75, random_state=0)
    X *= 100

    # Classifier A is trained on the unmodified training half.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
    lm = LogisticRegression()
    lm.fit(X_train, y_train)
    clf_a = lm

    # Classifier B uses the same split, but with the first feature (x1) set to 0.
    clf_b = LogisticRegression()
    X2 = X.copy()
    X2[:, 0] = 0
    X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y, test_size=0.5, random_state=0)
    clf_b.fit(X2_train, y2_train)

    # The held-out half is kept as the explain set.
    X_explain = X_test
    y_explain = y_test

  19. Simulation Data & R scripts for: "Introducing recurrent events analyses to...

    • data.niaid.nih.gov
    Updated Apr 29, 2024
    Cite
    Ferry, Nicolas (2024). Simulation Data & R scripts for: "Introducing recurrent events analyses to assess species interactions based on camera trap data: a comparison with time-to-first-event approaches" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11085005
    Dataset updated
    Apr 29, 2024
    Dataset authored and provided by
    Ferry, Nicolas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File descriptions:

    All csv files contain results from the different models (PAMM, AARs, linear models, MRPPs) on each iteration of the simulation, with one row per iteration.

    "results_perfect_detection.csv" contains the results from the first simulation part with all the observations. "results_imperfect_detection.csv" contains the results from the first simulation part with randomly thinned observations to mimic imperfect detection.

    - ID_run: identifier of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).
    - PAMM30: p-value of the PAMM running on the 30-day survey.
    - PAMM7: p-value of the PAMM running on the 7-day survey.
    - AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.
    - AAR2: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.
    - Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).
    - Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).
    - Karanth_permA: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).
    - MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    - MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021).
    - Karanth_permB: rank of the observed interval duration median (AB and BA undifferentiated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).
    - MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
    - MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).

    "results_int_dir_perf_det.csv" refers to the results from the second simulation part, with all the observations."results_int_dir_imperf_det.csv" refers to the results from the second simulation part, with randomly thinned observations to mimick imperfect detection.ID_run: identified of the iteration (N: number of sites, D_AB: duration of the effect of A on B, D_BA: duration of the effect of B on A, AB: effect of A on B, BA: effect of B on A, Se: seed number of the iteration).p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of A on B.p_pamm7_AB: p-value of the PAMM running on the 7-days survey testing for the effect of B on A.AAR1: ratio value for the Avoidance-Attraction-Ratio calculating AB/BA.AAR2_BAB: ratio value for the Avoidance-Attraction-Ratio calculating BAB/BB.AAR2_ABA: ratio value for the Avoidance-Attraction-Ratio calculating ABA/AA.Harmsen_P: p-value from the linear model with interaction Species1*Species2 from Harmsen et al. (2009).Niedballa_P: p-value from the linear model comparing AB to BA (Niedballa et al. 2021).Karanth_permA: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species A (Karanth et al. 2017).MurphyAB_permA: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). MurphyBA_permA: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species A (Murphy et al. 2021). Karanth_permB: rank of the observed interval duration median (AB and BA undifferenciated) compared to the randomized median distribution, when permuting on species B (Karanth et al. 2017).MurphyAB_permB: rank of the observed AB interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021). 
MurphyBA_permB: rank of the observed BA interval duration median compared to the randomized median distribution, when permuting on species B (Murphy et al. 2021).
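
    The "rank" columns above compare an observed median with a distribution of randomized medians. The dataset's own scripts are written in R and are not reproduced here; the following Python sketch only illustrates the general idea. The function name, the choice to express the rank as a proportion, and the example values are all assumptions for illustration, not the authors' implementation.

```python
def permutation_rank(observed_median, randomized_medians):
    """Proportion of randomized medians that are <= the observed median.

    Hypothetical helper: the published CSV files may store the rank in a
    different form (for example as an integer position).
    """
    if not randomized_medians:
        raise ValueError("randomized_medians must not be empty")
    at_or_below = sum(1 for m in randomized_medians if m <= observed_median)
    return at_or_below / len(randomized_medians)


# Made-up interval medians (in days) from five permutations.
example_null = [1.0, 2.0, 3.0, 4.0, 5.0]
example_rank = permutation_rank(3.0, example_null)
print(example_rank)
```

    Under this convention, a value near 0 would mean the observed intervals are shorter than most randomized ones, and a value near 1 would mean they are longer.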

    Script files description:
    1_Functions: R script containing the functions:
     - MRPP from Karanth et al. (2017), adapted here for time efficiency.
     - MRPP from Murphy et al. (2021), adapted here for time efficiency.
     - Version of the ct_to_recurrent() function from the recurrent package, adapted to run in parallel on the simulation datasets.
     - The simulation() function used to simulate observations of two species with reciprocal effects on each other.
    2_Simulations: R script containing the parameter definitions for all iterations (for the two parts of the simulations), the parallelization of the simulations, and the random thinning mimicking imperfect detection.
    3_Approaches comparison: R script containing the fit of the different models tested on the simulated data.
    3_1_Real data comparison: R script containing the fit of the different models tested on the real data example from Murphy et al. 2021.
    4_Graphs: R script containing the code for plotting results from the simulation part and appendices.
    5_1_Appendix - Check for similarity between codes for Karanth et al 2017 method: R script containing the original code from Karanth et al. (2017) and Murphy et al. (2021), the version adapted for time efficiency, and a comparison to verify that the results are similar.
    5_2_Appendix - Multi-response procedure permutation difference: R script containing R code to test whether the MRPP approaches differ according to the species on which the permutations are done.

  20.

    Raw data for "Modular comparison of untargeted metabolomics processing...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 28, 2024
    Cite
    Aigensberger, Markus (2024). Raw data for "Modular comparison of untargeted metabolomics processing steps" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13643189
    Explore at:
    Dataset updated
    Oct 28, 2024
    Dataset authored and provided by
    Aigensberger, Markus
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data for the paper titled "Modular comparison of untargeted metabolomics processing steps". The dataset comprises 42 samples: 3 solvent blanks, 7 QC samples, and 32 biological samples (4 biological replicates: Banane, Bergrose, Narbe, Ricky) spiked with 42 compounds at different concentrations (0 ng/mL, 30 ng/mL, 100 ng/mL, 300 ng/mL).
