100+ datasets found
  1. Replication Data for: How Cross-Validation Can Go Wrong and What to Do About...

    • dataverse.harvard.edu
    Updated Jul 19, 2018
    Marcel Neunhoeffer; Sebastian Sternberg (2018). Replication Data for: How Cross-Validation Can Go Wrong and What to Do About it. [Dataset]. http://doi.org/10.7910/DVN/Y9KMJW
    Dataset provided by
    Harvard Dataverse
    Authors
    Marcel Neunhoeffer; Sebastian Sternberg
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The introduction of new “machine learning” methods and terminology to political science complicates the interpretation of results, even more so when one term, like cross-validation, can mean very different things. We find different meanings of cross-validation in applied political science work. In the context of predictive modeling, cross-validation can be used to obtain an estimate of the true error or as a procedure for model tuning. Using a single cross-validation procedure to obtain an estimate of the true error and for model tuning at the same time leads to serious misreporting of performance measures. We demonstrate the severe consequences of this problem with a series of experiments. We also observe this problematic usage of cross-validation in applied research. We look at Muchlinski et al. (2016) on the prediction of civil war onsets to illustrate how the problematic cross-validation can affect applied work. Applying cross-validation correctly, we are unable to reproduce their findings. We encourage researchers in predictive modeling to be especially mindful when applying cross-validation.
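    The distinction drawn above can be illustrated with nested cross-validation, one standard way to keep model tuning and error estimation apart. The sketch below is a minimal, self-contained illustration, not the authors' replication code: it assumes a toy 1-D k-nearest-neighbour classifier whose only hyperparameter is the number of neighbours, and all function names (`kfold_indices`, `tune`, `nested_cv_error`, and so on) are hypothetical. `tune` returns the best inner-CV score, which is the optimistic number produced by reusing one CV loop for both tasks; `nested_cv_error` re-runs tuning inside each outer training fold, so the outer held-out fold never influences the chosen hyperparameter.

```python
import random
from statistics import mean


def kfold_indices(n, n_folds, seed=0):
    """Shuffle range(n) and deal it into n_folds roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, k):
    """1-D k-nearest-neighbour vote for labels in {0, 1}; ties go to class 1."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes >= k else 0


def train_part(n, test_idx):
    """All indices not in test_idx form the training part."""
    held = set(test_idx)
    return [i for i in range(n) if i not in held]


def fold_error(xs, ys, train_idx, test_idx, k):
    """Misclassification rate on test_idx for a model fit on train_idx."""
    tx = [xs[i] for i in train_idx]
    ty = [ys[i] for i in train_idx]
    wrong = sum(knn_predict(tx, ty, xs[i], k) != ys[i] for i in test_idx)
    return wrong / len(test_idx)


def cv_error(xs, ys, k, n_folds=5, seed=0):
    """Plain k-fold CV error for one fixed hyperparameter value."""
    errs = []
    for test_idx in kfold_indices(len(xs), n_folds, seed):
        errs.append(fold_error(xs, ys, train_part(len(xs), test_idx), test_idx, k))
    return mean(errs)


def tune(xs, ys, grid, n_folds=5):
    """Pick the grid value with the lowest CV error and return that (optimistic) error."""
    scores = {k: cv_error(xs, ys, k, n_folds) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores[best]


def nested_cv_error(xs, ys, grid, outer_folds=5, inner_folds=5, seed=1):
    """Outer folds estimate the error of the whole tune-then-fit procedure."""
    errs = []
    for test_idx in kfold_indices(len(xs), outer_folds, seed):
        train_idx = train_part(len(xs), test_idx)
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        best_k, _ = tune(tx, ty, grid, inner_folds)  # tuning sees training data only
        errs.append(fold_error(xs, ys, train_idx, test_idx, best_k))
    return mean(errs)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.random() for _ in range(60)]
    ys = [rng.randint(0, 1) for _ in range(60)]  # labels are pure noise
    grid = [1, 3, 5, 7]
    print("reused-CV estimate:", tune(xs, ys, grid)[1])
    print("nested-CV estimate:", nested_cv_error(xs, ys, grid))
```

On pure-noise labels (true error 0.5) the reused-CV number will typically come out below 0.5, while the nested estimate should sit closer to 0.5; the exact values depend on the seed.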

  2. Error Detection V2 Dataset

    • universe.roboflow.com
    zip
    Updated Nov 12, 2024
    3dprinting (2024). Error Detection V2 Dataset [Dataset]. https://universe.roboflow.com/3dprinting/error-detection-v2
    Dataset authored and provided by
    3dprinting
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Warping Q42p Bounding Boxes
    Description

    Error Detection V2

    ## Overview
    
    Error Detection V2 is a dataset for object detection tasks; it contains Warping Q42p annotations for 1,827 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
    
  3. AKCES-GEC Grammatical Error Correction Dataset for Czech

    • live.european-language-grid.eu
    binary format
    Updated Sep 26, 2019
    (2019). AKCES-GEC Grammatical Error Correction Dataset for Czech [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1280
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AKCES-GEC is a grammar error correction corpus for Czech generated from a subset of AKCES. It contains train, dev and test files annotated in M2 format.

    Note that, compared with the CZESL-GEC dataset, this dataset contains separate edits together with their type annotations in M2 format and has twice as many sentences.

    If you use this dataset, please use the following citation:

    @article{naplava2019wnut,
      title={Grammatical Error Correction in Low-Resource Scenarios},
      author={N{\'a}plava, Jakub and Straka, Milan},
      journal={arXiv preprint arXiv:1910.00353},
      year={2019}
    }
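    M2 files are commonly laid out as an `S` line holding the tokenized source sentence, followed by `A` lines of the form `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`, with a blank line between sentences. The sketch below assumes that CoNLL-2014/ERRANT-style layout (it has not been checked against the AKCES-GEC files themselves) and applies one annotator's edits to recover the corrected sentence. `parse_m2`, `apply_edits`, `Edit` and `M2Sentence` are hypothetical names, and AKCES-GEC uses its own inventory of error-type labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edit:
    start: int        # first token index covered by the edit
    end: int          # token index just past the edit (start == end means insertion)
    etype: str        # error-type label
    correction: str   # replacement tokens; "" or "-NONE-" means deletion
    annotator: int


@dataclass
class M2Sentence:
    tokens: List[str]
    edits: List[Edit] = field(default_factory=list)


def parse_m2(text: str) -> List[M2Sentence]:
    """Parse blank-line-separated M2 blocks (one 'S' line, then 'A' lines)."""
    sentences = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith("S "):
            continue
        sent = M2Sentence(tokens=lines[0][2:].split())
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            parts = line[2:].split("|||")
            start, end = map(int, parts[0].split())
            sent.edits.append(Edit(start, end, parts[1], parts[2], int(parts[-1])))
        sentences.append(sent)
    return sentences


def apply_edits(sent: M2Sentence, annotator: int = 0) -> str:
    """Return the corrected sentence according to one annotator's edits."""
    tokens = list(sent.tokens)
    edits = [e for e in sent.edits if e.annotator == annotator and e.etype != "noop"]
    # Apply right to left so that earlier token offsets remain valid.
    for e in sorted(edits, key=lambda ed: (ed.start, ed.end), reverse=True):
        replacement = [] if e.correction in ("", "-NONE-") else e.correction.split()
        tokens[e.start:e.end] = replacement
    return " ".join(tokens)
```

For example, the toy block `S This are a sentence .` with the edit `A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0` yields `This is a sentence .`.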

  4. Data from: A Streamlined and High-Throughput Error-Corrected Next-Generation...

    • datadryad.org
    • zenodo.org
    zip
    Updated Aug 9, 2019
    Page B. McKinzie; Michelle E. Bishop (2019). A Streamlined and High-Throughput Error-Corrected Next-Generation Sequencing Method for Low Variant Allele Frequency Quantitation [Dataset]. http://doi.org/10.5061/dryad.jj4g11s
    Dataset provided by
    Dryad
    Authors
    Page B. McKinzie; Michelle E. Bishop
    Time period covered
    Jul 30, 2019
    Description

    Quantifying mutant or variable allele frequencies (VAFs) of ≤10⁻³ using next-generation sequencing (NGS) has utility in both clinical and nonclinical settings. Two common approaches for quantifying VAFs using NGS are tagged single-strand sequencing and duplex sequencing. While duplex sequencing is reported to have sensitivity up to 10⁻⁸ VAF, it is not a quick, easy, or inexpensive method. We report a method for quantifying VAFs that are ≥10⁻⁴ that is as easy and quick for processing samples as standard sequencing kits, yet less expensive than the kits. The method was developed using PCR fragment-based VAFs of Kras codon 12 in log10 increments from 10⁻⁵ to 10⁻¹, then applied and tested on native genomic DNA. For both sources of DNA, there is a proportional increase in the observed VAF to input VAF from 10⁻⁴ to 100% mutant samples. Variability of quantitation was evaluated within experimental replicates and shown to be consistent across sample preparations. The error at each successive ba...
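    For reference, the VAF at a site is the fraction of reads carrying the variant, and a "proportional increase" of observed against input VAF can be checked as a slope near 1 on a log10-log10 plot. The sketch below is a generic illustration under those assumptions, not the authors' error-corrected pipeline; `vaf` and `loglog_slope` are hypothetical helper names, and any read counts passed in are the user's own.

```python
import math


def vaf(mutant_reads, total_reads):
    """Variant allele frequency: fraction of reads at a site carrying the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads


def loglog_slope(input_vafs, observed_vafs):
    """Least-squares slope of log10(observed) on log10(input); ~1 means proportional."""
    xs = [math.log10(v) for v in input_vafs]
    ys = [math.log10(v) for v in observed_vafs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

A constant multiplicative bias (for example, every observed VAF twice its input) still gives a slope of 1, so the slope tests proportionality, not accuracy.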

  5. Relative L2 error (%) on the test set for the source domain (TL7).

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 22, 2025
    Zhou, Yuqian; Liu, Qian; Yang, Haolin; Li, Kebing; Xu, Jinghong (2025). Relative L2 error (%) on the test set for the source domain (TL7). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002035388
    Authors
    Zhou, Yuqian; Liu, Qian; Yang, Haolin; Li, Kebing; Xu, Jinghong
    Description

    Relative L2 error (%) on the test set for the source domain (TL7).
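    The listing does not define the metric. Assuming the usual definition, the relative L2 error in percent is 100 · ‖û − u‖₂ / ‖u‖₂, where u is the reference solution and û the prediction. A minimal sketch (the function name is hypothetical):

```python
import math


def relative_l2_error_pct(pred, true):
    """100 * ||pred - true||_2 / ||true||_2, i.e. the relative L2 error in percent."""
    if len(pred) != len(true):
        raise ValueError("length mismatch")
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)))
    den = math.sqrt(sum(t * t for t in true))
    return 100.0 * num / den
```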

  6. Citation Trends for "Many-electron self-interaction error in approximate...

    • shibatadb.com
    Updated Nov 28, 2006
    Yubetsu (2006). Citation Trends for "Many-electron self-interaction error in approximate density functionals" [Dataset]. https://www.shibatadb.com/article/e9vMYnwh
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2007 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Many-electron self-interaction error in approximate density functionals".

  7. Mckenzie Road Cross Street Data in Bad Axe, MI

    • ownerly.com
    Updated Apr 3, 2022
    Ownerly (2022). Mckenzie Road Cross Street Data in Bad Axe, MI [Dataset]. https://www.ownerly.com/mi/bad-axe/mckenzie-rd-home-details
    Dataset authored and provided by
    Ownerly
    Area covered
    Bad Axe, McKenzie Road, Michigan
    Description

    This dataset provides information about the number of properties, residents, and average property values for Mckenzie Road cross streets in Bad Axe, MI.

  8. Error in the understanding estimation using eye gaze features or the answer...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 25, 2018
    Sanches, Charles Lima; Augereau, Olivier; Kise, Koichi (2018). Error in the understanding estimation using eye gaze features or the answer feature. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000680650
    Authors
    Sanches, Charles Lima; Augereau, Olivier; Kise, Koichi
    Description

    Error in the understanding estimation using eye gaze features or the answer feature.

  9. Predictor importance in RF.

    • plos.figshare.com
    xls
    Updated Feb 5, 2025
    Halit Tutar; Senol Celik; Hasan Er; Erdal Gönülal (2025). Predictor importance in RF. [Dataset]. http://doi.org/10.1371/journal.pone.0318230.t006
    Dataset provided by
    PLOS ONE
    Authors
    Halit Tutar; Senol Celik; Hasan Er; Erdal Gönülal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this study, the effect of morphological traits on fresh herbage yield of a sorghum × sudangrass hybrid grown in Konya province, the largest cereal production area in Turkey, was analyzed with several data mining methods. For this purpose, Artificial Neural Networks (ANN), Automatic Linear Model (ALM), Random Forest (RF) Algorithm and Multivariate Adaptive Regression Spline (MARS) Algorithm were used, and the prediction performances of these methods were compared. The computed descriptive statistics were a plant height of 251.22 cm, stem diameter of 7.03 mm, fresh herbage yield of 8010.69 kg da⁻¹, crude protein ratio of 9.09%, acid detergent fiber 33.23%, neutral detergent fiber 57.44%, acid detergent lignin 7.43%, dry matter digestibility of 63.01%, dry matter intake 2.11%, and relative feed value of 103.02. Model fit statistics, including the coefficient of determination (R²), adjusted R², root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation ratio (SD ratio), mean absolute error (MAE) and relative absolute error (RAE), were used to evaluate the prediction abilities of the fitted models. The MARS method was shown to be the best model for describing fresh herbage yield, with the lowest values of RMSE, MAPE, SD ratio, MAE and RAE (137.7, 1.488, 0.072, 109.718 and 0.017, respectively), as well as the highest R² value (0.995) and adjusted R² value (0.991). The experimental results show that the MARS algorithm is the most suitable model for predicting fresh herbage yield in the sorghum × sudangrass hybrid, providing a good alternative to other data mining algorithms.
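    The fit statistics listed above can be computed from observed and predicted values as sketched below. These are common textbook definitions, assumed here for illustration; the study's own formulas (especially for RAE and the SD ratio) may differ, and `fit_stats` is a hypothetical name. `n_predictors` is the number of model predictors used in the adjusted R².

```python
import math
from statistics import mean, pstdev


def fit_stats(y, yhat, n_predictors):
    """Common goodness-of-fit statistics for observed y and predicted yhat."""
    n = len(y)
    resid = [a - b for a, b in zip(y, yhat)]
    ybar = mean(y)
    sse = sum(e * e for e in resid)            # residual sum of squares
    sst = sum((a - ybar) ** 2 for a in y)      # total sum of squares
    r2 = 1 - sse / sst
    return {
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
        "RMSE": math.sqrt(sse / n),
        "MAE": mean(abs(e) for e in resid),
        # MAPE in percent; requires every observed value to be nonzero
        "MAPE": 100 * mean(abs(e) / abs(a) for e, a in zip(resid, y)),
        # SD ratio: residual standard deviation over observed standard deviation
        "SD_ratio": pstdev(resid) / pstdev(y),
        # RAE: total absolute error relative to predicting the mean everywhere
        "RAE": sum(abs(e) for e in resid) / sum(abs(a - ybar) for a in y),
    }
```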

  10. Lebanon Exports of wigs, false beards, eyebrow, eyelashes, switches;...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Dec 5, 2018
    TRADING ECONOMICS (2018). Lebanon Exports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair to Cyprus [Dataset]. https://tradingeconomics.com/lebanon/exports/cyprus/wigs-hair-human-hair-articles
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2025
    Area covered
    Lebanon
    Description

    Lebanon Exports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair to Cyprus was US$3.43 thousand in 2021, according to the United Nations COMTRADE database on international trade.

  11. Macau Imports of wigs, false beards, eyebrow, eyelashes, switches; articles...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Apr 24, 2025
    TRADING ECONOMICS (2025). Macau Imports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair from Switzerland [Dataset]. https://tradingeconomics.com/macau/imports/switzerland/wigs-hair-human-hair-articles
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2025
    Area covered
    Macao
    Description

    Macau Imports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair from Switzerland was US$30.14 thousand in 2023, according to the United Nations COMTRADE database on international trade.

  12. Data/software underlying the publication: Fault-tolerant structures for...

    • data.4tu.nl
    zip
    Updated Jan 19, 2024
    Yves van Montfort; Sébastian de Bone; David Elkouss (2024). Data/software underlying the publication: Fault-tolerant structures for measurement-based quantum computation on a network [Dataset]. http://doi.org/10.4121/929e24f9-31fa-4816-99fa-3356e272df43.v1
    Dataset provided by
    4TU.ResearchData
    Authors
    Yves van Montfort; Sébastian de Bone; David Elkouss
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Dutch Research Council
    Description

    In this work, we introduce a method to construct fault-tolerant measurement-based quantum computation (MBQC) architectures and numerically estimate their performance over various types of networks. A possible application of such a paradigm is distributed quantum computation, where separate computing nodes work together on a fault-tolerant computation through entanglement. We gauge error thresholds of the architectures with an efficient stabilizer simulator to investigate the resilience against both circuit-level and network noise. We show that, for both monolithic (i.e., non-distributed) and distributed implementations, an architecture based on the diamond lattice may outperform the conventional cubic lattice. Moreover, the high erasure thresholds of non-cubic lattices may be exploited further in a distributed context, as their performance may be boosted through entanglement distillation by trading in entanglement success rates against erasure errors during the error decoding process. These results highlight the significance of lattice geometry in the design of fault-tolerant measurement-based quantum computing on a network, emphasizing the potential for constructing robust and scalable distributed quantum computers.

  13. Impact of erroneous a priori information on the UT1-UTC determination from...

    • researchdata.tuwien.at
    zip
    Updated Jun 25, 2024
    Lisa Kern; Matthias Schartner; Sigrid Böhm (2024). Impact of erroneous a priori information on the UT1-UTC determination from VLBI Intensive sessions (simulation study) [Dataset]. http://doi.org/10.48436/08qqz-ymp66
    Dataset provided by
    TU Wien
    Authors
    Lisa Kern; Matthias Schartner; Sigrid Böhm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was generated by researchers from the TU Wien Department of Geodesy and Geoinformation and the ETH Zürich Department of Civil, Environmental and Geomatic Engineering, as a fundamental part of a study related to the analysis of the impact of erroneous a priori information on the UT1-UTC determination from VLBI Intensive sessions. The corresponding publication (title: "On the importance of accurate pole and station coordinates for VLBI Intensive baselines") has been submitted to the Journal of Geodesy.

    In addition, a conference paper and presentations at the IVS General Meeting 2022, the EGU General Assembly 2022 and REFAG 2022 are available.

    Context and methodology

    The dataset contains monthly simulated UT1-UTC values of an artificial global grid of VLBI antennas (VGOS) where realistic errors in the a priori values of the station coordinates, polar motion and nutation offsets are introduced. With the help of these simulated values the global impact of erroneous a priori information is analysed.

    VieSched++ and VieVS (both developed at the TU Wien) were used to generate the schedules and simulations.

    Technical details

    The dataset is structured as follows. There are 7 subfolders in the zipped folder that contain the simulation results of the evaluations with modified a priori values:

    • folder "errSTAu" - error of 5 mm introduced in the up-direction of the second station
    • folder "errSTAe" - error of 5 mm introduced in the east-direction of the second station
    • folder "errSTAn" - error of 5 mm introduced in the north-direction of the second station
    • folder "errPMx" - error of 162 microarcseconds introduced in the x-component of the polar motion
    • folder "errPMy" - error of 162 microarcseconds introduced in the y-component of the polar motion
    • folder "errNUTx" - error of 162 microarcseconds introduced in the x-component of the nutation offsets
    • folder "errNUTy" - error of 162 microarcseconds introduced in the y-component of the nutation offsets

    Within these folders, there are .txt files with the following naming convention: "N%E%_N%E%_d#.txt".

    • "N%E%_N%E%" represents the location (North and East in degrees = latitude and longitude) of the reference and remote station
    • "d#" again shows the error that has been introduced in the simulation process
    • The files contain the monthly simulation results of UT1-UTC and its accuracy in milliseconds.
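    The file-naming convention above can be parsed mechanically. The sketch below assumes that each `%` and `#` placeholder holds a plain decimal number, optionally signed (for example for western longitudes); the exact number format is not stated in the listing, so the pattern should be adjusted after inspecting real file names. `parse_name` and `Baseline` are hypothetical names. The unit of the `d#` value depends on the folder: mm for the station folders, microarcseconds for the polar motion and nutation folders.

```python
import re
from typing import NamedTuple

# Assumed number format: optional minus sign, digits, optional decimal part.
NUM = r"(-?\d+(?:\.\d+)?)"
PATTERN = re.compile(rf"N{NUM}E{NUM}_N{NUM}E{NUM}_d{NUM}\.txt")


class Baseline(NamedTuple):
    ref_lat: float   # reference station, degrees North
    ref_lon: float   # reference station, degrees East
    rem_lat: float   # remote station, degrees North
    rem_lon: float   # remote station, degrees East
    error: float     # introduced a priori error (unit depends on the folder)


def parse_name(name: str) -> Baseline:
    """Split a 'N%E%_N%E%_d#.txt' file name into its numeric fields."""
    m = PATTERN.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return Baseline(*map(float, m.groups()))
```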

  14. Data from: Investigating students’ awareness of their own and others’...

    • tandf.figshare.com
    docx
    Updated May 12, 2025
    Linda Hämmerle; Andrea Möller; Alexander Bergmann-Gering; Theresa Krause-Wichmann; Judith Lederman (2025). Investigating students’ awareness of their own and others’ deviations from controlled science experiments [Dataset]. http://doi.org/10.6084/m9.figshare.28281315.v1
    Dataset provided by
    Taylor & Francis
    Authors
    Linda Hämmerle; Andrea Möller; Alexander Bergmann-Gering; Theresa Krause-Wichmann; Judith Lederman
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Despite an emphasis on experimentation in science curricula worldwide as part of efforts to improve scientific literacy, students encounter challenges, especially with designing unconfounded experiments as part of the control-of-variables strategy (CVS). Becoming aware of experimental design errors is a potential starting point for learners to enhance their experimental skills. Few studies have investigated whether learners can accurately assess experiments and are aware of errors. This experimental study, conducted during the COVID-19 pandemic, investigates the accuracy of students’ assessment of experiments and their awareness of self-generated and others’ (vicarious) design errors. A total of 127 students (grades 7–8) were randomly split into two groups. One group conducted an experiment themselves. The other examined an erroneous example of a fictitious peer. Afterwards, both received the same instructions on the CVS and prompts to assess the experiments. Data were collected via worksheets and photos of their experiments. Analysis reveals difficulties controlling all variables in an experiment, especially if they were continuous. Interestingly, while self- and peer-assessment accuracy was generally high, students were significantly more aware of vicarious errors than of self-generated ones. This highlights the potential of using assessment of experimental design errors as a learning opportunity for experimentation skills, especially when using vicarious errors.

  15. Singapore Exports of wigs, false beards, eyebrow, eyelashes, switches;...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Nov 30, 2022
    TRADING ECONOMICS (2022). Singapore Exports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair to Turkey [Dataset]. https://tradingeconomics.com/singapore/exports/turkey/wigs-hair-human-hair-articles
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2025
    Area covered
    Singapore
    Description

    Singapore Exports of wigs, false beards, eyebrow, eyelashes, switches; articles of human hair to Turkey was US$9.07 thousand in 2023, according to the United Nations COMTRADE database on international trade.

  16. Veteran Status 2018-2022 - STATES

    • hub.arcgis.com
    • covid19-uscensus.hub.arcgis.com
    Updated Feb 5, 2024
    US Census Bureau (2024). Veteran Status 2018-2022 - STATES [Dataset]. https://hub.arcgis.com/maps/a66f7c567e014a0892d956d73a24bf74
    Dataset provided by
    United States Census Bureau: http://census.gov/
    Authors
    US Census Bureau
    Description

    This service contains the 2018-2022 release of data from the American Community Survey (ACS) 5-year data about Veteran Status, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis. This layer is symbolized to show the percentage of the civilian population over the age of 18 that are Veterans. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.

    Current Vintage: 2018-2022
    ACS Table(s): DP02
    Data downloaded from: Census Bureau's API for American Community Survey
    Date of API call: January 18, 2024
    National Figures: data.census.gov

    The United States Census Bureau's American Community Survey (ACS): About the Survey; Geography & ACS; Technical Documentation; News & Updates.

    This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the Census and ACS when using this data.

    Data Note from the Census:
    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Data Processing Notes:
    Boundaries come from the Cartographic Boundaries via US Census TIGER geodatabases. Boundaries are updated at the same time as the data updates, and the boundary vintage appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines clipped for cartographic purposes. For state and county boundaries, the water and coastlines are derived from the coastlines of the 500k TIGER Cartographic Boundary Shapefiles. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters). The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico. The Counties (and equivalent) layer contains 3221 records: all counties and equivalent, Washington D.C., and Puerto Rico municipios. See Areas Published. Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey. Field alias names were created based on the Table Shells.

    Margin of error (MOE) values of -555555555 in the API (or "*****" (five asterisks) on data.census.gov) are displayed as 0 in this dataset. The estimates associated with these MOEs have been controlled to independent counts in the ACS weighting and have zero sampling error. So, the MOEs are effectively zeroes, and are treated as zeroes in MOE calculations. Other negative values on the API, such as -222222222, -666666666, -888888888, and -999999999, all represent estimates or MOEs that can't be calculated or can't be published, usually due to small sample sizes. All of these are rendered in this dataset as null (blank) values.
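    The MOE conventions above translate directly into a cleaning step. The sketch below maps the API sentinel codes listed in this description to 0 or `None` and forms the 90 percent bounds as estimate minus/plus MOE. The helpers `moe90_to_se` and `moe_of_sum` use the Census Bureau's standard approximations (standard error = MOE / 1.645 for a 90 percent MOE; root-sum-of-squares for the MOE of a sum of estimates), which come from general ACS guidance rather than from this listing. All function names are hypothetical.

```python
import math
from typing import Optional, Tuple, Union

# Sentinel codes as described for this layer.
ZERO_MOE_CODE = -555555555                                      # controlled estimate: MOE is 0
NULL_CODES = {-222222222, -666666666, -888888888, -999999999}  # cannot be calculated/published


def clean_moe(raw: Union[int, float]) -> Optional[float]:
    """Map a raw API MOE value to 0.0, None (null), or a plain float."""
    if raw == ZERO_MOE_CODE:
        return 0.0
    if raw in NULL_CODES:   # these codes can also appear for estimates
        return None
    return float(raw)


def bounds90(estimate: float, moe: float) -> Tuple[float, float]:
    """Lower and upper 90 percent confidence bounds: estimate -/+ MOE."""
    return estimate - moe, estimate + moe


def moe90_to_se(moe: float) -> float:
    """Approximate standard error from a 90 percent MOE (z = 1.645)."""
    return moe / 1.645


def moe_of_sum(moes) -> float:
    """Approximate MOE of a sum of estimates: root of the sum of squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))
```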

  17. chunk_157

    • huggingface.co
    Updated Jul 24, 2024
    distilled-false-pos-one-sec-cv12 (2024). chunk_157 [Dataset]. https://huggingface.co/datasets/distilled-false-pos-one-sec-cv12/chunk_157
    Dataset authored and provided by
    distilled-false-pos-one-sec-cv12
    Description

    distilled-false-pos-one-sec-cv12/chunk_157 dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. 2023 American Community Survey: B99258 | Allocation of Bedrooms (ACS 1-Year...

    • data.census.gov
    ACS, 2023 American Community Survey: B99258 | Allocation of Bedrooms (ACS 1-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table?tid=ACSDT1Y2023.B99258
    Dataset provided by
    United States Census Bureau: http://census.gov/
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2023
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation, including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found in the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
      -         The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
      N         The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
      (X)       The estimate or margin of error is not applicable or not available.
      median-   The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
      median+   The median falls in the highest interval of an open-ended distribution (for example "250,000+").
      **        The margin of error could not be computed because there were an insufficient number of sample observations.
      ***       The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
      *****     A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
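    The margin-of-error notes above can be made concrete with a small calculation. The Python sketch below assumes, as the ACS Technical Documentation describes, that a published 90 percent MOE equals 1.645 standard errors. The function names are hypothetical, the comparison test treats the two estimates as independent (an assumption; any covariance between them is ignored), and the whole block is a simplified illustration rather than the Census Bureau's own testing procedure. As the notes caution, a "significant" result between years can still reflect methodological differences rather than real change.

```python
import math

# ACS margins of error are published at the 90 percent level, which
# corresponds to a critical value of 1.645 on the standard normal curve.
Z_90 = 1.645


def confidence_bounds(estimate, moe):
    """Lower and upper 90 percent confidence bounds: estimate minus/plus the MOE."""
    return estimate - moe, estimate + moe


def standard_error(moe):
    """Convert a published 90 percent MOE back into a standard error."""
    return moe / Z_90


def significantly_different(est1, moe1, est2, moe2):
    """Two-estimate z-test at the 90 percent level, assuming independent estimates."""
    se_diff = math.sqrt(standard_error(moe1) ** 2 + standard_error(moe2) ** 2)
    if se_diff == 0:
        # Both MOEs are zero (e.g. controlled estimates marked "*****"),
        # so any nonzero difference is treated as real.
        return est1 != est2
    return abs(est1 - est2) / se_diff > Z_90


if __name__ == "__main__":
    print(confidence_bounds(1200, 150))                   # (1050, 1350)
    print(significantly_different(1200, 150, 1500, 160))  # True
    print(significantly_different(1200, 150, 1250, 160))  # False
```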

  19. DeepLabCut network trained to track mouse body parts during open field...

    • zenodo.org
    zip
    Updated Sep 15, 2023
    Marie A. Labouesse; Shana Gershbaum; Julia Greenwald; Christoph Kellendonk (2023). DeepLabCut network trained to track mouse body parts during open field locomotion (top-down view) [Dataset]. http://doi.org/10.5281/zenodo.6448595
    Available download formats: zip
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marie A. Labouesse; Shana Gershbaum; Julia Greenwald; Christoph Kellendonk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DeepLabCut (https://github.com/DeepLabCut/) (Mathis et al., 2018; Nath et al., 2019) was used to track body parts of mice in an open field arena or on the rotarod. DeepLabCut 2.1.8.2 (local version on Windows with CPU, using the GUI) and 2.1.10.2 (Google Colab, used to train the network) were run with default parameters and the pretrained ResNet-50 network with imgaug augmentation. Frames were extracted with the k-means method, and outlier frames with the jump method. Open field: 20 images from each of 19 videos (10 or 30 fps) were extracted, for a total of 380 labeled pictures. Eight body parts (snout, both ears, body center, both lateral sides, tail base and tail end) and the 4 corners of the field arena were manually labeled and linked to each other using skeletons. A neural network was trained on these images for 170K iterations. Next, 20 outlier frames were extracted from each video and relabeled, and an additional 20 images from each of 19 videos with different recording conditions were labeled. The network was then refined, training from scratch, for 210K iterations, yielding a train error of 3.33 pixels and a test error of 8.83 pixels (with a likelihood p-cutoff of 0.6). This process was repeated a second time (using an additional 20 images from each of 15 new videos) to reduce the pixel error, ending with a network trained for 400K iterations (train error: 2.65 pixels; test error: 3.71 pixels). In total, 67 videos from 5 different experiments were analyzed with the final network.

    The network was used to analyze videos for a publication (Labouesse et al., Nature Communications 2023).
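    The train and test errors quoted above are pixel distances between the network's predicted body-part positions and the manual labels; as we understand DeepLabCut's evaluation, they are averaged Euclidean distances, and the p-cutoff variant ignores predictions whose likelihood falls below the threshold (0.6 here). The Python sketch below illustrates that metric only. It is not DeepLabCut's own evaluation code; the function name `mean_pixel_error`, the (x, y) input layout, and the convention that unlabeled keypoints are NaN are assumptions made for the example.

```python
import math


def mean_pixel_error(predicted, labeled, likelihood=None, pcutoff=None):
    """Average Euclidean distance (pixels) between predicted and labeled keypoints.

    Hypothetical helper, for illustration only.
    predicted, labeled: sequences of (x, y) pairs in the same keypoint order.
    likelihood: optional per-keypoint confidence of each prediction.
    pcutoff: if given, predictions with likelihood below it are skipped.
    Keypoints whose label is NaN (not annotated) are skipped as well.
    """
    if pcutoff is not None and likelihood is None:
        raise ValueError("pcutoff requires likelihood values")
    distances = []
    for i, ((px, py), (lx, ly)) in enumerate(zip(predicted, labeled)):
        if math.isnan(lx) or math.isnan(ly):
            continue  # body part was not labeled in this frame
        if pcutoff is not None and likelihood[i] < pcutoff:
            continue  # low-confidence prediction excluded from the p-cutoff error
        distances.append(math.hypot(px - lx, py - ly))
    return sum(distances) / len(distances) if distances else float("nan")


if __name__ == "__main__":
    # Made-up coordinates for two body parts (e.g. snout and tail base).
    pred = [(10.0, 10.0), (53.0, 44.0)]
    true = [(10.0, 10.0), (50.0, 40.0)]
    conf = [0.95, 0.40]
    print(mean_pixel_error(pred, true))                     # 2.5
    print(mean_pixel_error(pred, true, conf, pcutoff=0.6))  # 0.0
```

    With the p-cutoff applied, the second keypoint (likelihood 0.40) drops out, which is why filtered and unfiltered errors can differ noticeably.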

  20. 1996 Czech Election: Post-Election Study June 1996

    • search.gesis.org
    Updated Apr 13, 2010
    Toka, Gabor (2010). 1996 Czech Election: Post-Election Study June 1996 [Dataset]. http://doi.org/10.4232/1.3633
    Available download formats: application/x-stata-dta (287772), application/x-spss-por (528900), application/x-spss-sav (311796)
    Dataset updated
    Apr 13, 2010
    Dataset provided by
    GESIS search
    GESIS Data Archive
    Authors
    Toka, Gabor
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Time period covered
    Jun 9, 1996 - Jun 19, 1996
    Area covered
    Czechia
    Variables measured
    V148 - gender, V161 - industry, V1 - ZA Studynumber, V167 - denomination, V72 - LAST ELECTION, V87 - SYMPATHY: ODA, V88 - SYMPATHY: ODS, V117 - Death penalty, V84 - SYMPATHY: CSSD, V86 - SYMPATHY: KSCM, and 169 more
    Description

    Voting behaviour and political attitudes. Topics: household finances in the last and next 12 months; national economy in the last and next 12 months; participation in the 1996 lower house election; vote in the 1996 lower house election; spouse: vote in the 1996 election; use of preference vote in 1996; tactical voting; knowledge: representation of the party the respondent voted for; radio and TV: sufficient information; radio and TV: impartial; party which the media favoured most; what the respondent likes about the parties (open question); government performance; provide a job for everyone; reducing income differences is harmful; the economic situation is unfavourable; privatisation is going to help; unprofitable enterprises should be closed down; atheists are unfit for public office; nationalism is always harmful; chance of getting ahead; politicians should care more about crime; abortion should be allowed; preference for a patriotic politician; church has too much influence; split of Czechoslovakia was wrong; restitution was wrong; left-right self-placement (7-point scale); satisfaction with democracy; last election; respondent close to any party; first party close to respondent; second party close to respondent; third party close to respondent; party closest to respondent; any party closer than others; which party closer than others; how close to closest party; parties care what people want; parties are necessary; recall of name of candidate; sympathy for parties; state of economy; change in economic situation; MPs' idea of what people think; contact with MP; who is in power; the way people vote; people say or hide opinion; left-right placement of parties; elections help to keep politicians honest; in election campaigns people can learn; elections divide the country; benefits of elections far outweigh the costs; death penalty; husband is to earn the money; clergy should not influence vote; too many people rely on government assistance; smooth cooperation in firms is impossible; not enough respect for traditional Czech values; schools should teach children to obey; get rid of conflicts between the parties; for democracy turnout does not matter; voters decide how things are run; most voters cannot make intelligent decisions; better leaders would be chosen through exams; Czech Republic should join NATO; Czech Republic should join the EEC; preferred relationship between the Czech Republic and Slovakia; present regime compared to pre-1989 regime; people should refrain from criticizing Czech officials; politician may act contrary to the law; some people earn millions; people are responsible for their poverty; competent people can earn a lot of money; people get rich here mainly in an illegal way; private ownership should be expanded; more efforts to reduce inequalities; less government intervention; more toughness needed against Romany offenders; Romanies should be allowed to lead their own way of life; knowledge about electoral threshold, name of Minister of Transport, number of seats in Czech lower house; language spoken at home; occupation (respondent and spouse): ISCO code, EGP-10 classification and EGP-6 classification; strength of religious belief; frequency of church attendance; denomination; union membership: respondent; union membership: somebody else in household; gypsy or not (judgment of interviewer); date and length of the interview; number of contact attempts for interview; interview demanding; respondent's primary electoral district.

Marcel Neunhoeffer; Sebastian Sternberg (2018). Replication Data for: How Cross-Validation Can Go Wrong and What to Do About it. [Dataset]. http://doi.org/10.7910/DVN/Y9KMJW

Replication Data for: How Cross-Validation Can Go Wrong and What to Do About it.

Explore at: Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 19, 2018
Dataset provided by
Harvard Dataverse
Authors
Marcel Neunhoeffer; Sebastian Sternberg
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

The introduction of new “machine learning” methods and terminology to political science complicates the interpretation of results, especially when one term, such as cross-validation, can mean very different things. We find different meanings of cross-validation in applied political science work. In the context of predictive modeling, cross-validation can be used to obtain an estimate of the true error or as a procedure for model tuning. Using a single cross-validation procedure both to estimate the true error and to tune the model leads to serious misreporting of performance measures. We demonstrate the severe consequences of this problem with a series of experiments. We also observe this problematic usage of cross-validation in applied research. We use Muchlinski et al. (2016) on the prediction of civil war onsets to illustrate how problematic cross-validation can affect applied work. Applying cross-validation correctly, we are unable to reproduce their findings. We encourage researchers in predictive modeling to be especially mindful when applying cross-validation.
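The distinction the abstract draws, between cross-validation for model tuning and cross-validation for estimating the true error, is commonly handled with nested cross-validation: an inner loop picks the hyperparameter using only the outer training folds, and the outer loop scores that choice on data the tuning never saw. The Python sketch below is a minimal, standard-library-only illustration of that idea and not the authors' replication code; the k-nearest-neighbors classifier, the candidate values of k, and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def k_folds(n, n_folds, rng):
    """Shuffle positions 0..n-1 and deal them into n_folds roughly equal folds."""
    positions = list(range(n))
    rng.shuffle(positions)
    return [positions[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points nearest to x (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = [yi for _, yi in nearest]
    return max(set(votes), key=votes.count)


def accuracy(X, y, train_idx, test_idx, k):
    """Share of test points labeled correctly by k-NN fitted on train_idx."""
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def cv_scores(X, y, idx, candidates, n_folds, rng):
    """Mean cross-validated accuracy of every candidate k on the subset idx.

    All candidates are scored on the same folds so their scores are comparable.
    """
    folds = [[idx[p] for p in fold] for fold in k_folds(len(idx), n_folds, rng)]
    scores = {}
    for k in candidates:
        fold_acc = []
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            fold_acc.append(accuracy(X, y, train, test, k))
        scores[k] = mean(fold_acc)
    return scores


def naive_tuned_cv(X, y, candidates, n_folds=5, seed=0):
    """Problematic: one CV run both picks k and supplies the reported accuracy."""
    scores = cv_scores(X, y, list(range(len(y))), candidates, n_folds, random.Random(seed))
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


def nested_cv(X, y, candidates, outer_folds=5, inner_folds=5, seed=0):
    """Nested CV: the inner loop tunes k, the outer loop scores it on untouched folds."""
    rng = random.Random(seed)
    outer_acc = []
    for test in k_folds(len(y), outer_folds, rng):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner = cv_scores(X, y, train, candidates, inner_folds, rng)  # tuning sees only train
        best_k = max(inner, key=inner.get)
        outer_acc.append(accuracy(X, y, train, test, best_k))
    return mean(outer_acc)


if __name__ == "__main__":
    gen = random.Random(42)
    X = [[gen.random() for _ in range(5)] for _ in range(60)]
    y = [gen.randint(0, 1) for _ in range(60)]  # pure-noise labels: true accuracy is 0.5
    ks = [1, 3, 5, 7, 9, 11, 15]
    print("naive (tuned and scored on the same folds):", naive_tuned_cv(X, y, ks))
    print("nested:", nested_cv(X, y, ks))
```

On pure-noise labels the true accuracy of any classifier is 0.5. The naive figure is the best of several cross-validated scores computed on the same folds, so it tends to sit above that value; the nested estimate involves no such selection step on its test folds. Exact numbers depend on the seed and the data.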
