91 datasets found
  1. Poor data quality causes among enterprises in North America 2015

    • statista.com
    Updated Jan 26, 2016
    Cite
    Statista (2016). Poor data quality causes among enterprises in North America 2015 [Dataset]. https://www.statista.com/statistics/518069/north-america-survey-enterprise-poor-data-quality-reasons/
    Dataset updated
    Jan 26, 2016
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2015
    Area covered
    United States, Canada
    Description

    The statistic depicts the causes of poor data quality for enterprises in North America, according to a survey of North American IT executives conducted by 451 Research in 2015. As of 2015, 47 percent of respondents indicated that poor data quality at their company was attributable to data migration or conversion projects.

  2. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the notation used: User Stories or Use Cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (a vague indication of the mapping), or not present
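
    The thresholds given for column F can be sketched as a small function. This is only an illustration of the stated rule; the function name and the choice to return None for an empty cell are assumptions, not part of the spreadsheet.

```python
def grade_category(points):
    """Map an exam grade (points out of 100) to the categories L, M, or H.

    Returning None for an empty cell is an assumption made for this sketch.
    """
    if points is None:
        return None  # subject did not take the first exam
    if points >= 80:
        return "H"   # greater than or equal to 80
    if points >= 65:
        return "M"   # 65 included, 80 excluded
    return "L"       # everything below 65
```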

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The size ratio is calculated as the number of classes in the student model divided by the number of classes in the expert model. We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
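
    The two ratios defined for sheet 4 can be written out as a short sketch. The function names and the handling of a zero denominator (returning None) are assumptions for illustration; the spreadsheet itself computes these values in Excel.

```python
def correctness(al, wr, so, om):
    """Correctness = AL / (AL + OM + SO + WR)."""
    denominator = al + om + so + wr
    # Returning None for an empty denominator is an assumption of this sketch.
    return al / denominator if denominator else None


def completeness(al, wr, om):
    """Completeness = (AL + WR) / (AL + WR + OM)."""
    denominator = al + wr + om
    return (al + wr) / denominator if denominator else None


if __name__ == "__main__":
    # Hypothetical counts for one student model: 6 aligned, 2 wrongly
    # represented, 1 system-oriented, 1 omitted.
    print(correctness(al=6, wr=2, so=1, om=1))
    print(completeness(al=6, wr=2, om=1))
```

    Note that missing concepts (MI) appear in neither formula, and system-oriented concepts (SO) lower correctness without affecting completeness.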

    For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated, which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
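
    The effect size reported at the bottom of sheets 4 to 8 is Hedges' g. The authors used an online tool, so the sketch below is not their computation; it assumes the commonly used definition (Cohen's d with a pooled standard deviation, multiplied by the small-sample correction 1 - 3 / (4(n1 + n2) - 9)), and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples (common approximation)."""
    n1, n2 = len(group_a), len(group_b)
    # Pooled standard deviation from the two sample variances.
    pooled_sd = sqrt(
        ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b))
        / (n1 + n2 - 2)
    )
    cohens_d = (mean(group_a) - mean(group_b)) / pooled_sd
    # Small-sample bias correction.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction
```

    Each group needs at least two values and the pooled standard deviation must be non-zero; the sketch does not guard against either case.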

  3. Breaking Bad IMDb ratings, votes and US views

    • kaggle.com
    zip
    Updated Aug 26, 2020
    Cite
    t2 (2020). Breaking Bad IMDb ratings, votes and US views [Dataset]. https://www.kaggle.com/twintyone/breaking-bad-ratings
    Available download formats: zip (1362 bytes)
    Dataset updated
    Aug 26, 2020
    Authors
    t2
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    To visualize numerical data episode by episode and to allow comparative analysis with other famous TV shows.

    Content

    Season number, episode number, title, year, and other numerical data such as IMDb ratings, IMDb votes, and US views.

    Acknowledgements

    Data collected from https://www.ratingraph.com/tv-shows/breaking-bad-ratings-26165/ and https://www.wikiwand.com/en/List_of_Breaking_Bad_episodes

    Inspiration

    Saw some cool visualizations on Reddit a few days back but couldn't find them anymore. :(

  4. Data for: The Bystander Affect Detection (BAD) Dataset for Failure Detection...

    • data.qdr.syr.edu
    pdf, tsv, txt, zip
    Updated Sep 25, 2023
    Cite
    Alexandra Bremers; Xuanyu Fang; Natalie Friedman; Wendy Ju (2023). Data for: The Bystander Affect Detection (BAD) Dataset for Failure Detection in HRI [Dataset]. http://doi.org/10.5064/F6TAWBGS
    Available download formats: zip(66872585), zip(67359564), zip(49981372), zip(45063165), zip(35942055), tsv(5431), zip(63732190), zip(32108293), zip(33064251), zip(49848937), zip(38858151), zip(137880775), zip(90804192), zip(36477139), zip(38068214), zip(36039067), zip(37592931), zip(34234760), zip(63445623), zip(38092264), zip(45582594), zip(50915158), zip(111033502), zip(32955394), zip(30549219), zip(39991378), zip(166237686), zip(50351519), zip(62744513), zip(46810648), zip(34379478), zip(35492684), zip(22036189), pdf(197935), zip(66187509), zip(40085473), zip(40798037), pdf(113804), zip(12931695), zip(31593404), zip(26677367), zip(35547615), tsv(244631), zip(35954889), txt(7329), zip(74593629), zip(52574377), zip(55483165), zip(31323914), zip(43519637), zip(42743107), zip(55790691), zip(50499507), zip(76761027), zip(38063092), zip(55654900), zip(30504764), zip(48203736), zip(40422817)
    Dataset updated
    Sep 25, 2023
    Dataset provided by
    Qualitative Data Repository
    Authors
    Alexandra Bremers; Xuanyu Fang; Natalie Friedman; Wendy Ju
    License

    https://qdr.syr.edu/policies/qdr-restricted-access-conditions

    Description

    Project Overview

    For a robot to repair its own error, it must first know it has made a mistake. One way that people detect errors is from the implicit reactions from bystanders – their confusion, smirks, or giggles clue us in that something unexpected occurred. To enable robots to detect and act on bystander responses to task failures, we developed a novel method to elicit bystander responses to human and robot errors.

    Data Overview

    This project introduces the Bystander Affect Detection (BAD) dataset – a dataset of videos of bystander reactions to videos of failures. This dataset includes 2,452 human reactions to failure, collected in contexts that approximate “in-the-wild” data collection – including natural variances in webcam quality, lighting, and background. The BAD dataset may be requested for use in related research projects. As the dataset contains facial video data of participants, access can be requested along with the presentation of a research protocol and data use agreement that protects participants.

    Data Collection Overview and Access Conditions

    Using 46 different stimulus videos featuring a variety of human and machine task failures, we collected a total of 2,452 webcam videos of human reactions from 54 participants. Recruitment happened through the online behavioral research platform Prolific (https://www.prolific.co/about), where the options were selected to recruit a gender-balanced sample across all countries available. Participants had to use a laptop or desktop. Compensation was set at the Prolific rate of $12/hr, which came down to about $8 per participant for about 40 minutes of participation. Participants agreed that their data can be shared for future research projects and the data were approved to be shared publicly by IRB review.

    However, considering the fact that this is a machine-learning dataset containing identifiable crowdsourced human subjects data, the research team has decided that potential secondary users of the data must meet the following criteria for the access request to be granted:

    1. Agreement to three usage terms:
       - I will not redistribute the contents of the BAD Dataset
       - I will not use videos for purposes outside of human interaction research (broadly defined as any project that aims to study or develop improvements to human interactions with technology to result in a better user experience)
       - I will not use the videos to identify, defame, or otherwise negatively impact the health, welfare, employment or reputation of human participants
    2. A description of what you want to use the BAD dataset for, indicating any applicable human subjects protection measures that are in place. (For instance, "Me and my fellow researchers at University of X, lab of Y, will use the BAD dataset to train a model to detect when our Nao robot interrupts people at awkward times. The PI is Professor Z. Our protocol was approved under IRB #.")
    3. A copy of the IRB record or ethics approval document, confirming the research protocol and institutional approval.

    Data Analysis

    To test the viability of the collected data, we used the Bystander Reaction Dataset as input to a deep-learning model, BADNet, to predict failure occurrence. We tested different data labeling methods and learned how they affect model performance, achieving precisions above 90%.

    Shared Data Organization

    This data project consists of 54 zipped folders of recorded video data organized by participant, totaling 2,452 videos. The accompanying documentation includes a file containing the text of the consent form used for the research project, an inventory of the stimulus videos used, aggregate survey data, this data narrative, and an administrative readme file.

    Special Notes

    The data were approved to be shared publicly by IRB review. However, considering the fact that this is a machine-learning dataset containing identifiable crowdsourced human subjects data, the research team has decided that potential secondary users of the data must meet specific criteria before they qualify for access. Please consult the Terms tab below for more details and follow the instructions there if interested in requesting access.

  5. Four Quarter Financial Summary Hospital Utilization Charity Care and Bad...

    • data.chhs.ca.gov
    • data.ca.gov
    • +3more
    csv, docx, zip
    Updated Aug 28, 2024
    Cite
    Department of Health Care Access and Information (2024). Four Quarter Financial Summary Hospital Utilization Charity Care and Bad Debt Summary [Dataset]. https://data.chhs.ca.gov/dataset/four-quarter-financial-summary-hospital-utilization-charity-care-and-bad-debt-summary
    Available download formats: csv, docx, zip
    Dataset updated
    Aug 28, 2024
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset provides the fourth quarter summary roll-up of California hospitals’ financial and utilization data for Charity Care and Bad Debts.

  6. Comprehensive Median Household Income and Distribution Dataset for Bad Axe,...

    • neilsberg.com
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Bad Axe, MI: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd881c54-b041-11ee-aaca-3860777c1fe6/
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the median household income in Bad Axe. It can be utilized to understand the trend in median household income and to analyze the income distribution in Bad Axe by household type, size, and across various income brackets.

    Content

    The dataset includes the following datasets, when applicable:

    • Bad Axe, MI Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)
    • Median Household Income Variation by Family Size in Bad Axe, MI: Comparative analysis across 7 household sizes
    • Income Distribution by Quintile: Mean Household Income in Bad Axe, MI
    • Bad Axe, MI households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

    Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau due to the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Bad Axe median household income.

  7. Data from: An Evaluation of the Use of Statistical Procedures in Soil...

    • scielo.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Laene de Fátima Tavares; André Mundstock Xavier de Carvalho; Lucas Gonçalves Machado (2023). An Evaluation of the Use of Statistical Procedures in Soil Science [Dataset]. http://doi.org/10.6084/m9.figshare.19944438.v1
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO journals
    Authors
    Laene de Fátima Tavares; André Mundstock Xavier de Carvalho; Lucas Gonçalves Machado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT Experimental statistical procedures used in almost all scientific papers are fundamental for clearer interpretation of the results of experiments conducted in agrarian sciences. However, incorrect use of these procedures can lead the researcher to incorrect or incomplete conclusions. Therefore, the aim of this study was to evaluate the characteristics of the experiments and quality of the use of statistical procedures in soil science in order to promote better use of statistical procedures. For that purpose, 200 articles, published between 2010 and 2014, involving only experimentation and studies by sampling in the soil areas of fertility, chemistry, physics, biology, use and management were randomly selected. A questionnaire containing 28 questions was used to assess the characteristics of the experiments, the statistical procedures used, and the quality of selection and use of these procedures. Most of the articles evaluated presented data from studies conducted under field conditions and 27 % of all papers involved studies by sampling. Most studies did not mention testing to verify normality and homoscedasticity, and most used the Tukey test for mean comparisons. Among studies with a factorial structure of the treatments, many had ignored this structure, and data were compared assuming the absence of factorial structure, or the decomposition of interaction was performed without showing or mentioning the significance of the interaction. Almost none of the papers that had split-block factorial designs considered the factorial structure, or they considered it as a split-plot design. Among the articles that performed regression analysis, only a few of them tested non-polynomial fit models, and none reported verification of the lack of fit in the regressions. The articles evaluated thus reflected poor generalization and, in some cases, wrong generalization in experimental design and selection of procedures for statistical analysis.

  8. Dataset for Bad Axe, MI Census Bureau Income Distribution by Race

    • neilsberg.com
    Updated Jan 3, 2024
    Cite
    Neilsberg Research (2024). Dataset for Bad Axe, MI Census Bureau Income Distribution by Race [Dataset]. https://www.neilsberg.com/research/datasets/80b86f9f-9fc2-11ee-b48f-3860777c1fe6/
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Bad Axe median household income by race. The dataset can be utilized to understand the racial distribution of Bad Axe income.

    Content

    The dataset includes the following datasets, when applicable:

    • Bad Axe, MI median household income breakdown by race between 2011 and 2021
    • Median Household Income by Racial Categories in Bad Axe, MI (2021, in 2022 inflation-adjusted dollars)

    Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau due to the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Bad Axe median household income by race.

  9. Ten quick tips for getting the most scientific value out of numerical data

    • plos.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Lars Ole Schwen; Sabrina Rueschenbaum (2023). Ten quick tips for getting the most scientific value out of numerical data [Dataset]. http://doi.org/10.1371/journal.pcbi.1006141
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Lars Ole Schwen; Sabrina Rueschenbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation.

    Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results.

    These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.

  10. BAD: Bilingual Adaptations Dataset

    • openneuro.org
    Updated Jul 15, 2025
    Cite
    Xuanyi Jessica Chen; Maxwell Salvadore; Esti Blanco-Elorrieta (2025). BAD: Bilingual Adaptations Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds006391.v1.0.0
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Xuanyi Jessica Chen; Maxwell Salvadore; Esti Blanco-Elorrieta
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    README

    This repository contains raw MRI data of 127 subjects with varying language backgrounds and proficiencies. Below is a detailed outline of the file structure used:


    sub-EBE****

    Each of these directories contains the BIDS-formatted anatomical and functional MRI data, with the name of the directory corresponding to the subject's unique identifier.

    For more information on the subdirectories, see BIDS information at https://bids-specification.readthedocs.io/en/stable/appendices/entity-table.html
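
    As a rough illustration of this layout, the subject directories can be enumerated by their shared prefix. This is a minimal sketch: the function name is hypothetical, the dataset root path is whatever location the data were downloaded to, and the "sub-EBE" prefix is taken from the masked directory name above.

```python
from pathlib import Path


def list_subjects(root):
    """Return the names of all subject directories (sub-EBE*) under root."""
    return sorted(
        entry.name
        for entry in Path(root).glob("sub-EBE*")
        if entry.is_dir()  # skip any stray files that share the prefix
    )


if __name__ == "__main__":
    # "." stands in for the local dataset root, which is an assumption here.
    for subject in list_subjects("."):
        print(subject)
```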


    derivatives

    This directory contains outputs of common processing pipelines run on the raw MRI data from "data/sub-EBE****".

    derivatives/CAT12

    These are the results of the CAT12 toolbox (Computational Anatomy Toolbox), which is used to calculate brain region volumes using voxel-based morphometry (VBM). The following software must be downloaded for this process:

    1. MATLAB v. R2023a (https://www.mathworks.com/products/new_products/release2023a.html)
    2. SPM (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/)
    3. CAT12 (https://neuro-jena.github.io/cat/index.html#DOWNLOAD)

    derivatives/conn

    CONN is used to generate data on functional connectivity from brain fMRI sequences. The following software must be downloaded for this process:

    1. MATLAB v. R2023a (https://www.mathworks.com/products/new_products/release2023a.html)
    2. SPM (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/)
    3. CAT12 (https://neuro-jena.github.io/cat/index.html#DOWNLOAD)
    4. Conn – MATLAB toolbox for functional connectivity (https://web.conn-toolbox.org/)

    derivatives/fdt

    We used FMRIB's Diffusion Toolbox (FDT) for extracting values from diffusion-weighted images. To use FDT, you need to load the following modules through the CLI:

    1. module load fsl/6.0.2
    2. module load freesurfer/7.4.1

    For more information on the toolbox, visit https://fsl.fmrib.ox.ac.uk/fsl/docs/#/diffusion/index.

    derivatives/fMRIprep

    fMRIprep is a preprocessing pipeline for task-based and resting-state functional MRI. We use it to generate data for connectivity.

    We used fMRIprep v23.0.2. For more information, visit https://fmriprep.org/en/stable/index.html.

    derivatives/freesurfer

    FreeSurfer is a software package for the analysis and visualization of structural and functional neuroimaging data, which we use to extract region volumes through surface-based morphometry (SBM).

    We used freesurfer v7.4.1. For more information, visit https://surfer.nmr.mgh.harvard.edu/fswiki.



    analysis/

    This directory contains data and code used in the analysis of Chen, Salvadore, Blanco-Elorrieta (submitted).

    analysis/code

    This directory contains Python and R code used in the analysis of Chen, Salvadore, Blanco-Elorrieta (submitted), with each Python notebook corresponding to a different part of the paper's analysis. For more details on each file and subdirectory, see "analysis/code/README.md".

    analysis/participant_data

    This directory contains language data on each subject, including a composite multilingualism score from Chen & Blanco-Elorrieta (submitted), information on language knowledge, exposure, mixing, use in education, and family members’ language ability in the participants’ known languages from early childhood to the present day. For more information on the files and their fields, see "analysis/participant_data/metadata.xlsx".

    analysis/processed_mri_data

    This directory contains MRI data, both anatomical and functional, that is the final result of processing raw MRI data. This includes brain volumes, cortical thickness, fractional anisotropy values, and connectivity measures. For more information on the files within this directory, see "analysis/processed_mri_data/metadata.xlsx".

  11. Geomorphon landforms in the Bad River (Mashkiiziibii) Estuary, derived from...

    • catalog.data.gov
    • data.usgs.gov
    • +2more
    Updated Oct 3, 2024
    Cite
    U.S. Geological Survey (2024). Geomorphon landforms in the Bad River (Mashkiiziibii) Estuary, derived from 2019 lidar data [Dataset]. https://catalog.data.gov/dataset/geomorphon-landforms-in-the-bad-river-mashkiiziibii-estuary-derived-from-2019-lidar-data
    Dataset updated
    Oct 3, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    Landforms in the Bad River (Mashkiiziibii) Estuary were mapped with geomorphons, an automated terrain analysis method that classifies digital elevation model (DEM) cells into ten fundamental 3-dimensional geometric forms – summit, ridge, shoulder, spur, slope, hollow, footslope, valley, depression, and flat – based on the topography within the visibility neighborhood of each cell. The geomorphons were developed from a DEM comprising topographic and bathymetric data for the estuary, developed from elevation data collected by airborne topographic and bathymetric lidar and single-beam sonar. Resulting landform features were attributed with a variety of characteristics, including an array of morphometrics quantifying the detailed three-dimensional shape of each feature, and the hydrologic setting as characterized by the distance and orientation relative to the nearest National Hydrography Dataset (NHD) river channel, and by the frequency and maximum depth of flooding according to an inundation mapping analysis. We used a subset of these attributes in a K-means multivariate statistical clustering analysis, identifying five groupings or process zones within the landform features, including river channels, leveed and un-leveed channel margins, estuary flats, and distal, convex-up features.
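
    For readers unfamiliar with K-means, the clustering step mentioned above can be sketched with plain Lloyd's algorithm. This is a generic illustration, not the USGS workflow: the function name, the random initialization, the iteration limit, and the toy attribute vectors are all assumptions, and the description does not state which attributes or software were used.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as a start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(column) / len(members) for column in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Toy two-attribute feature vectors; real landform attributes would differ.
    features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    labels, _ = kmeans(features, k=2)
    print(labels)
```

    In the study, k would be five and each point would hold the selected attributes of one landform feature; attributes on different scales are usually standardized before clustering, which this sketch omits.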

  12. N

    Bad Axe, MI annual income distribution by work experience and gender dataset...

    • neilsberg.com
    csv, json
    Updated Jan 9, 2024
    Cite
    Neilsberg Research (2024). Bad Axe, MI annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021) [Dataset]. https://www.neilsberg.com/research/datasets/2364d057-981b-11ee-99cf-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Variables measured
    Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To portray the number of individuals of both genders (male and female) within each income bracket, we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported, based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering insight into the income landscape within Bad Axe. The dataset can be used to examine gender-based income distribution within the Bad Axe population, aiding data analysis and decision-making.

    Key observations

    • Employment patterns: Within Bad Axe, among individuals aged 15 years and older with income, there were 960 men and 1,375 women in the workforce. Among them, 462 men were engaged in full-time, year-round employment, while 348 women were in full-time, year-round roles.
    • Annual income under $24,999: Of the male population working full-time, 24.24% fell within the income range of under $24,999, while 33.62% of the female population working full-time was represented in the same income bracket.
    • Annual income above $100,000: None of the men in full-time roles earned incomes exceeding $100,000, while 7.18% of women in full-time positions earned within this income bracket.
    • Refer to the research insights for more key observations on more income brackets (annual income under $24,999, between $25,000 and $49,999, between $50,000 and $74,999, between $75,000 and $99,999, and above $100,000) and employment types (full-time year-round and part-time).

    [Figure: Bad Axe, MI gender and employment-based income distribution analysis (Ages 15+). Image: https://i.neilsberg.com/ch/bad-axe-mi-income-distribution-by-gender-and-employment-type.jpeg]

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income brackets:

    • $1 to $2,499 or loss
    • $2,500 to $4,999
    • $5,000 to $7,499
    • $7,500 to $9,999
    • $10,000 to $12,499
    • $12,500 to $14,999
    • $15,000 to $17,499
    • $17,500 to $19,999
    • $20,000 to $22,499
    • $22,500 to $24,999
    • $25,000 to $29,999
    • $30,000 to $34,999
    • $35,000 to $39,999
    • $40,000 to $44,999
    • $45,000 to $49,999
    • $50,000 to $54,999
    • $55,000 to $64,999
    • $65,000 to $74,999
    • $75,000 to $99,999
    • $100,000 or more

    Variables / Data Columns

    • Income Bracket: This column lists the 20 income brackets, ranging from $1 to $100,000 or more.
    • Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket
    • Part-Time Males: The count of males employed part-time and earning within a specified income bracket
    • Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket
    • Part-Time Females: The count of females employed part-time and earning within a specified income bracket

    Employment type classifications include:

    • Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.
    • Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Bad Axe median household income by gender.

  13. f

    Data from: On the poor statistical properties of the P-curve meta-analytic...

    • tandf.figshare.com
    zip
    Updated Aug 8, 2025
    Cite
    Richard D. Morey; Clintin P. Davis-Stober (2025). On the poor statistical properties of the P-curve meta-analytic procedure [Dataset]. http://doi.org/10.6084/m9.figshare.29867645.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Richard D. Morey; Clintin P. Davis-Stober
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The P-curve (Simonsohn, Nelson, & Simmons, 2014; Simonsohn, Simmons, & Nelson, 2015) is a widely-used suite of meta-analytic tests advertised for detecting problems in sets of studies. They are based on nonparametric combinations of p values (e.g., Marden, 1985) across significant (p < .05) studies and are variously claimed to detect “evidential value”, “lack of evidential value”, and “left skew” in p values. We show that these tests do not have the properties ascribed to them. Moreover, they fail basic desiderata for tests, including admissibility and monotonicity. In light of these serious problems, we recommend against the use of the P-curve tests.

  14. N

    Bad Axe, MI Age Cohorts Dataset: Children, Working Adults, and Seniors in...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Bad Axe, MI Age Cohorts Dataset: Children, Working Adults, and Seniors in Bad Axe - Population and Percentage Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/bad-axe-mi-population-by-age/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Variables measured
    Population Over 65 Years, Population Under 18 Years, Population Between 18 and 64 Years, Percent of Total Population for Age Groups
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the age cohorts. We divided the population into three age cohorts: children (under 18 years), working population (between 18 and 64 years), and senior population (over 65 years). For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Bad Axe population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Bad Axe. The dataset can be utilized to understand the population distribution across children, working population and senior population for dependency ratio, housing requirements, ageing, migration patterns etc.

    Key observations

    The largest age group was 18 to 64 years, with a population of 1,739 (57.77% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age cohorts:

    • Under 18 years
    • 18 to 64 years
    • 65 years and over

    Variables / Data Columns

    • Age Group: This column displays the age cohort for the Bad Axe population analysis. Three groups are expected (Children, Working Population, and Senior Population).
    • Population: This column shows the population of the age cohort in Bad Axe.
    • Percent of Total Population: This column shows the cohort's population as a percentage of the total population of Bad Axe.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Bad Axe Population by Age.

  15. N

    Bad Axe, MI Annual Population and Growth Analysis Dataset: A Comprehensive...

    • neilsberg.com
    csv, json
    Updated Jul 30, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Bad Axe, MI Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in Bad Axe from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/bad-axe-mi-population-by-year/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Jul 30, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Variables measured
    Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
    Measurement technique
    The data presented in this dataset is derived from more than 20 years of U.S. Census Bureau Population Estimates Program (PEP) data, 2000 - 2023. To measure the variables, namely (a) population and (b) population change (absolute and as a percentage), we first analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Bad Axe population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Bad Axe across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

    Key observations

    In 2023, the population of Bad Axe was 2,977, a 0.70% decrease from 2022. Previously, in 2022, the population of Bad Axe was 2,998, a decline of 0.63% compared to a population of 3,017 in 2021. Over the last 20-plus years, between 2000 and 2023, the population of Bad Axe decreased by 455. In this period, the peak population was 3,432, in the year 2000. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

    Data Coverage:

    • From 2000 to 2023

    Variables / Data Columns

    • Year: This column displays the data year (Measured annually and for years 2000 to 2023)
    • Population: This column shows the population of Bad Axe for the specific year.
    • Year on Year Change: This column displays the change in Bad Axe population for each year compared to the previous year.
    • Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.
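As a rough illustration of how the Year on Year Change and Change in Percent columns relate to the Population column, the sketch below recomputes both from the three population figures quoted under Key observations. The function name `year_on_year` and the dictionary layout are assumptions made for this example; the publisher's actual tabulation method is not described in the listing.

```python
def year_on_year(population_by_year):
    """Return {year: (absolute_change, percent_change)} for every year after the first."""
    years = sorted(population_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        diff = population_by_year[curr] - population_by_year[prev]
        # Percent change is relative to the previous year's population.
        pct = round(diff / population_by_year[prev] * 100, 2)
        changes[curr] = (diff, pct)
    return changes


# Population figures quoted in the key observations (2021-2023).
population = {2021: 3017, 2022: 2998, 2023: 2977}
changes = year_on_year(population)
print(changes)  # {2022: (-19, -0.63), 2023: (-21, -0.7)}
```

The recomputed values agree with the 0.63% and 0.70% declines reported in the key observations.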

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Bad Axe Population by Year.

  16. c

    Bad Idea AI Price Prediction for 2025-09-18

    • coinunited.io
    Updated Aug 25, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CoinUnited.io (2025). Bad Idea AI Price Prediction for 2025-09-18 [Dataset]. https://coinunited.io/en/data/prices/crypto/bad-idea-ai-bad/price-prediction
    Explore at:
    Dataset updated
    Aug 25, 2025
    Dataset provided by
    CoinUnited.io
    Description

    Price-prediction data for Bad Idea AI on 2025-09-18, based on technical analysis and AI models. It includes multi-scenario analysis (bullish, baseline, bearish), risk assessment, technical-indicator insights, and market-trend forecasts intended to help investors make trading decisions and plan investment strategies.

  17. N

    Income Bracket Analysis by Age Group Dataset: Age-Wise Distribution of Bad...

    • neilsberg.com
    csv, json
    Updated Feb 25, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Income Bracket Analysis by Age Group Dataset: Age-Wise Distribution of Bad Axe, MI Household Incomes Across 16 Income Brackets // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/f338d7f3-f353-11ef-8577-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 25, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bad Axe, Michigan
    Variables measured
    Number of households with income $200,000 or more, Number of households with income less than $10,000, Number of households with income between $15,000 - $19,999, Number of households with income between $20,000 - $24,999, Number of households with income between $25,000 - $29,999, Number of households with income between $30,000 - $34,999, Number of households with income between $35,000 - $39,999, Number of households with income between $40,000 - $44,999, Number of households with income between $45,000 - $49,999, Number of households with income between $50,000 - $59,999, and 6 more
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across 16 income brackets (mentioned above) following an initial analysis and categorization. Using this dataset, you can find the total number of households within a specific income bracket, along with the number of households in that income bracket for each of the 4 age cohorts (Under 25 years, 25-44 years, 45-64 years and 65 years and over). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the household distribution across 16 income brackets among four distinct age groups in Bad Axe: Under 25 years, 25-44 years, 45-64 years, and over 65 years. The dataset highlights the variation in household income, offering insight into economic trends and disparities within different age categories, aiding data analysis and decision-making.

    Key observations

    • A closer look at the distribution of households among age brackets shows 80 (5.86%) households where the householder is under 25 years old, 407 (29.82%) households with a householder aged between 25 and 44 years, 447 (32.75%) households with a householder aged between 45 and 64 years, and 431 (31.58%) households where the householder is over 65 years old.
    • The age group of 25 to 44 years exhibits the highest median household income, while the largest number of households falls within the 45 to 64 years bracket. This distribution hints at economic disparities within the city of Bad Axe, with income levels varying among age demographics.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income brackets:

    • Less than $10,000
    • $10,000 to $14,999
    • $15,000 to $19,999
    • $20,000 to $24,999
    • $25,000 to $29,999
    • $30,000 to $34,999
    • $35,000 to $39,999
    • $40,000 to $44,999
    • $45,000 to $49,999
    • $50,000 to $59,999
    • $60,000 to $74,999
    • $75,000 to $99,999
    • $100,000 to $124,999
    • $125,000 to $149,999
    • $150,000 to $199,999
    • $200,000 or more

    Variables / Data Columns

    • Household Income: This column lists the 16 income brackets, ranging from less than $10,000 to $200,000 or more (as listed above).
    • Under 25 years: The count of households led by a head of household under 25 years old with income within a specified income bracket.
    • 25 to 44 years: The count of households led by a head of household 25 to 44 years old with income within a specified income bracket.
    • 45 to 64 years: The count of households led by a head of household 45 to 64 years old with income within a specified income bracket.
    • 65 years and over: The count of households led by a head of household 65 years old and over with income within a specified income bracket.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Bad Axe median household income by age.

  18. H

    Replication data for: Effects of algorithmic flagging on fairness:...

    • dataverse.harvard.edu
    Updated Apr 26, 2021
    Cite
    Nathan TeBlunthuis; Benjamin Mako Hill; Aaron Halfaker (2021). Replication data for: Effects of algorithmic flagging on fairness: Quasi-experimental evidence from Wikipedia [Dataset]. http://doi.org/10.7910/DVN/E0RYJ4
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 26, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Nathan TeBlunthuis; Benjamin Mako Hill; Aaron Halfaker
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/4.1/customlicense?persistentId=doi:10.7910/DVN/E0RYJ4

    Description

    Code overview

    The code is all in code.tar.gz.

    Identifying thresholds and cutoffs over time

    Pretty much all in identify_cutoffs.py. It iterates over the git repository, parses wmf-config/InitializeSettings.php, and interprets historical json versions. It builds a pandas table of events with threshold configuration settings and some other configuration settings, like when different UI elements were enabled. There are some traces of the first attempt at a project, an attempted time series analysis that failed due to high noise.

    In cases where thresholds are not configured, default thresholds are configured in the ORES service repository mediawiki-extentions-ORES/extension.json (a copy of the git repository is in mediawiki-extensions-ORES.tar.gz). get_default_threshold_strings.py scripts this git repository to get the history of the default thresholds. They don’t change much.

    Reading server admin log

    Right now, the code to get the history of deployments is in a chunk in identify_cutoffs.py. I think I will refactor this to its own file. The precise timing of changes to the models does not come from the source code repository but rather from the live deployments. The SAL (server admin log) publishes a history of live deployments.

    Converting ORES configuration strings to prediction score cutoffs

    This is done by ores_archaeologist.py. This is by far the most complex script, and it wraps functionality from the revscoring package (a copy of this repository is in revscoring.tar.gz) to load different versions of models and analyze them. It checks out git commits corresponding to changes in InitializeSettings.php or SAL, and installs the correct python dependencies in a helper repository so that the models run in an environment as close as possible to the correct one, to ensure the thresholds are correct. helper.py has functions used by ores_archaeologist.py. Sometimes there are errors, and we start analyzing data after the last error to give a continuous period.

    get_model_threshold.py is a simple script that is run by ores_archeologist.py and actually loads the revscoring code. The ores_archeologist.py script can also attempt to find historical revision scores. This was not actually used in the paper because these historical scores may not be reliable. revscoring_score_shim.py is analogous to get_model_threshold.py, but for scoring edits.

    Sampling from Wikimedia history and event table

    sample_edits_near_thresholds.py is a spark script that runs on the Wikimedia Foundation datalake and builds the revision dataset. Much of the logic is in spark_functions.py.

    Fitting models

    The master files are fit_10_rdds.R and fit_vlb_rdds.R. During the review cycle we found a bug in the ‘very likely bad’ data and I refit only those models to save time. fit_10_rdds.R just fits the models asynchronously. The main logic is in fit_base_rdds.R and modeling_init.R. The dataset is put together in modeling_init.R. ob_util.R and helper.R have a few miscellaneous functions. rdd_defaults.R has the formulas and sets stan modeling parameters. Fit models are available in models.tar.gz.

    Interpreting models

    analyze_threshold_models.R builds smaller dataframes and variables that will be used by the Knitr Latex system to build the paper. analyze_vlb_models.R does the same, but just for the ‘very likely bad’ data. Code shared by both scripts is in analyze_main_models.R.

    Dataset summary statistics

    Some additional statistics reported in the paper are calculated in summary_stats.R.

    Evaluating encoded bias

    The bias_analysis.tar.gz archive has code and data used for evaluating the bias of the ORES models, including a copy of the editquality git repository. A copy of the repository is in editquality.tar.gz.

    Building the paper and appendix

    This is in the paper.tar.gz and appendix.tar.gz archives.

    Data files overview

    The following data files are published at the top level of the dataverse. Copy them into a data subdirectory to use them with the code.

    cutoff_revisions_2periods.csv.gz.part1 and cutoff_revisions_2periods.csv.gz.part2 have the full dataset of edits within the neighborhood. You should do cat cutoff_revisions_2periods.csv.gz.part1 cutoff_revisions_2periods.csv.gz.part2 > cutoff_revisions_2periods.csv.gz and then decompress the output to get the full csv. cutoff_revisions_sample.csv and cutoff_revisions_sample_vlbfix.csv have the sampled datasets on which the models are fit. threshold_strata_counts.csv and threshold_strata_counts_vlbfix.csv have the counts from stratified sampling, which are used to calculate modeling weights.

    What does vlb_fix mean?

    The original submission of the paper contained a bug that affected the sample at the verylikelybad RCFilters threshold. The bug was on line 241 of sample_edits_near_threshold.py and led to NA values in the sample, which would have affected the sample weights. During the revise-and-resubmit process we found and fixed the bug and fit new models at the verylikelybad threshold.

    LICENSE

    The data in this repository is...
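The reassembly step from the data files overview (concatenating the two `.part` files, then decompressing the result) could also be done without a shell. The sketch below is one possible Python equivalent using only the standard library; the helper names `join_parts` and `gunzip` are hypothetical and are not part of the replication archive.

```python
import gzip
import shutil


def join_parts(part_paths, joined_path):
    """Concatenate the part files, in order, into one file (like `cat a b > c`)."""
    with open(joined_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def gunzip(gz_path, out_path):
    """Decompress a gzip file into out_path."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as out:
        shutil.copyfileobj(src, out)


# Example usage with the file names from the description (files must be present):
# join_parts(
#     ["cutoff_revisions_2periods.csv.gz.part1",
#      "cutoff_revisions_2periods.csv.gz.part2"],
#     "cutoff_revisions_2periods.csv.gz",
# )
# gunzip("cutoff_revisions_2periods.csv.gz", "cutoff_revisions_2periods.csv")
```

The order of the parts matters: the joined file is only a valid gzip stream if part1 comes before part2.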

  19. f

    Data from: Search for Correlations Between the Results of the Density...

    • acs.figshare.com
    zip
    Updated Feb 6, 2025
    Cite
    Saadiallakh Normatov; Pavel V. Nesterov; Timur A. Aliev; Alexandra A. Timralieva; Alexander S. Novikov; Ekaterina V. Skorb (2025). Search for Correlations Between the Results of the Density Functional Theory and Hartree–Fock Calculations Using Neural Networks and Classical Machine Learning Algorithms [Dataset]. http://doi.org/10.1021/acsomega.4c09861.s002
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    ACS Publications
    Authors
    Saadiallakh Normatov; Pavel V. Nesterov; Timur A. Aliev; Alexandra A. Timralieva; Alexander S. Novikov; Ekaterina V. Skorb
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This work proposes several machine learning models that predict B3LYP-D4/def-TZVP outputs from HF-3c outputs for supramolecular structures. The data set consists of 1031 entries of dimer, trimer, and tetramer cyclic structures, containing both molecules with heteroatoms in the ring and without. Six quantum chemistry descriptors and features are calculated by using both computational methods: Gibbs energy, electronic energy, entropy, enthalpy, dipole moment, and band gap. Statistical analysis shows a good correlation between energy properties and bad correlation only for the dipole moment. Machine learning models are separated into three groups: linear, tree-based, and neural networks. The best models for the prediction of density functional theory features are LASSO for linear, XGBoost for tree-based, and single-layer perceptron for neural networks with energy-related features having the best prediction values and dipole moment having the worst.

  20. Wine_Test Prediction | 1600 data | yashaswi

    • kaggle.com
    Updated May 19, 2025
    Cite
    Ayushman Yashaswi (2025). Wine_Test Prediction | 1600 data | yashaswi [Dataset]. https://www.kaggle.com/datasets/ayushmanyashaswi/wine-test-prediction-1600-data-yashaswi
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 19, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ayushman Yashaswi
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description


    📊 Wine Quality - Red Wine Dataset

    This dataset contains physicochemical attributes of red variants of Portuguese "Vinho Verde" wine, along with a quality score for each sample (rated from 0 to 10). The goal is to predict wine quality from the wine's chemical properties using various classification models.

    🧪 Features Overview (12 columns):

    • fixed acidity: most acids involved with wine are fixed/nonvolatile
    • volatile acidity: amount of acetic acid (can affect taste)
    • citric acid: adds freshness and flavor
    • residual sugar: sugar left after fermentation
    • chlorides: salt content
    • free sulfur dioxide: protects wine from microbes
    • total sulfur dioxide: total SO₂ content
    • density: wine density
    • pH: acidity level
    • sulphates: preservative and antimicrobial
    • alcohol: alcohol percentage
    • quality (target): wine quality score (0–10)

    🤖 Model Performance Summary:

    Multiple machine learning models were trained to predict wine quality. The following accuracy scores were observed:

    Model                          Training Accuracy   Testing Accuracy
    Logistic Regression            87.91%              87.0%
    Random Forest                  100%                94.0%
    Decision Tree                  100%                88.5%
    Support Vector Machine (SVM)   86.41%              86.5%

    📈 Data Visualization:

    A comparison plot of model performance was created to visually represent the accuracy of each algorithm. This helps in understanding which models generalized well and which ones may have overfit to the training data.
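
    One way to read the reported scores for overfitting is to compare each model's training accuracy with its testing accuracy. The sketch below is illustrative only and is not the dataset author's code: the accuracy values are copied from the summary table, while the helper names (generalization_gaps, flag_overfit) and the 5-point threshold are assumptions chosen for this example.

```python
# Reported accuracies in percent, copied from the model performance summary.
# Tuple order: (training accuracy, testing accuracy).
reported = {
    "Logistic Regression": (87.91, 87.0),
    "Random Forest": (100.0, 94.0),
    "Decision Tree": (100.0, 88.5),
    "Support Vector Machine (SVM)": (86.41, 86.5),
}


def generalization_gaps(scores):
    """Return {model: training minus testing accuracy} in percentage points."""
    return {
        name: round(train - test, 2)
        for name, (train, test) in scores.items()
    }


def flag_overfit(scores, threshold=5.0):
    """List models whose gap exceeds `threshold` points.

    The default threshold is an arbitrary assumption for illustration,
    not a value taken from the dataset description.
    """
    gaps = generalization_gaps(scores)
    return sorted(name for name, gap in gaps.items() if gap > threshold)


gaps = generalization_gaps(reported)
# Print the largest gaps first so likely overfitters appear at the top.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:+.2f} points")
print("Possible overfit:", flag_overfit(reported))
```

    With these numbers, the two tree-based models show the widest gaps between training and testing accuracy, while Logistic Regression and SVM score nearly the same on both. A gap alone does not prove overfitting, so treat the flag as a prompt for closer inspection rather than a verdict.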

    📁 File Info:

    • Filename: winequality-red.csv
    • Size: ~100 KB
    • Rows: 1,599
    • Columns: 12

    📌 Ideal For:

    • Classification model evaluation
    • Feature correlation analysis
    • EDA and visualization
    • ML model tuning and comparison