Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The dataset contains the data collected in a user study carried out to evaluate the impact of using domain knowledge, in the form of ontologies, in the creation of global post-hoc explanations of black-box models.
The research hypothesis was that the use of ontologies could enhance the understandability of explanations by humans.
To validate this hypothesis, we ran a user study in which participants were asked to carry out several tasks. For each task, the answers, response time, and measures of user understandability and confidence were collected.
The data analysis revealed that the use of ontologies does enhance the understandability of explanations of black-box models by human users, in particular for explanations in the form of decision trees that explain artificial neural networks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
When learners self-explain, they try to make sense of new information. Although research has shown that bodily actions and written notes are an important part of learning, previous analyses of self-explanations rarely take into account written and non-verbal data produced spontaneously. In this paper, the extent to which interpretations of self-explanations are influenced by the systematic consideration of such data is investigated. The video recordings of 33 undergraduate students, who learned with worked-out examples dealing with complex numbers, were categorized successively using three different data sources: (a) verbal data, (b) verbal and written data, and (c) verbal, written and non-verbal data. Results reveal that including written data (notes) and non-verbal data (gestures and actions) leads to a more accurate analysis of self-explanations than an analysis based solely on verbal data. This influence is even stronger for the categorization of self-explanations as adequate or inadequate.
This paper provides evidence on child penalties in female and male earnings in different countries. The estimates are based on event studies around the birth of the first child, using the specification proposed by Kleven et al. (2018). The analysis reveals some striking similarities in the qualitative effects of children across countries, but also sharp differences in the magnitude of the effects. We discuss the potential role of family policies (parental leave and childcare provision) and gender norms in explaining the observed differences.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of an online questionnaire assessing end-users' need for explanations in software systems. The questionnaire was shared in December 2018 and remained online until January 2019. 171 participants started the survey and 107 completed it. We analyzed only the responses of the participants who completed the survey.
This submission contains:
The survey raw data in CSV format (comma-separated values);
The .xlsx file containing the same raw data;
The .pdf file containing the survey questions;
A .rtfd version of the survey questions;
A .html version of the survey questions;
The .xlsx file containing the analyzed data;
The .pdf file containing instructions about the coded data.
The raw data contains only the responses from the 107 participants who completed the survey. Blank cells indicate that the participant did not provide a response to the corresponding question or answer option.
All responses are anonymized and identified by a unique ID.
Each row contains the participant's ID, the date when the questionnaire was submitted, the last page reached (out of 18 in total), and the language that the participant chose.
The subsequent columns contain the questions.
We use codes before each question. First, one of the following symbols:
(*) as an indication that the question was mandatory;
(*+) as an indication that the question was mandatory but conditionally shown, depending on previous answers;
(+) as an indication that the question was conditionally shown, depending on previous answers;
Next, the code of the question as in the questionnaire.
And, if multiple choice, the code of the answer option.
E.g.: (*+)A2(3) means that the A2 question in the questionnaire was mandatory and conditionally shown, and that this column contains the responses regarding answer option 3.
After this code, the question as it appears on the original questionnaire is shown and, for multiple-choice questions, the corresponding answer option is shown between square brackets after the question. E.g.: "In a typical day, which category of software/apps do you use on your digital devices most often? (More than one allowed) [Games]", where Games was one of the answer options.
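To illustrate how these header codes can be handled programmatically, the following minimal Python sketch (not part of the dataset) parses a column header of the form described above into its flag, question code, and answer-option code. The exact spacing and layout of the headers in the CSV files are assumptions.

```python
import re

# Hypothetical sketch: parse a raw-data column header of the form
# "(*+)A2(3) Question text [Option]" into its components.
HEADER_RE = re.compile(
    r"^\((?P<flags>\*\+|\*|\+)\)"   # (*), (*+) or (+): mandatory / conditional flags
    r"(?P<question>[A-Za-z]+\d+)"   # question code, e.g. A2
    r"(?:\((?P<option>\d+)\))?"     # optional answer-option code, e.g. (3)
    r"\s*(?P<text>.*)$"             # the question text as shown in the questionnaire
)

def parse_header(header: str) -> dict:
    m = HEADER_RE.match(header)
    if m is None:
        # Headers without a code (e.g. participant ID, date, language).
        return {"raw": header}
    d = m.groupdict()
    d["mandatory"] = "*" in d["flags"]
    d["conditional"] = "+" in d["flags"]
    return d

print(parse_header("(*+)A2(3) In a typical day, which category of software/apps "
                   "do you use on your digital devices most often? [Games]"))
```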
The questionnaire was available in three languages: Portuguese, German and English.
Responses in German and Portuguese were translated into English. These translations are shown in a subsequent column, beside the column with the original responses, and are identified by the word "TRANSLATION" in the column title. Responses that were already in English were not translated.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset created during a user study on the evaluation of the explainability of artificial intelligence (AI) at the Jagiellonian University, carried out as collaborative work between the computer science (GEIST team) and information sciences research groups. The main goal of the research was to explore effective explanations of AI model patterns for diverse audiences.
The dataset contains material collected from 39 participants during interviews conducted by the information sciences research group. The participants were recruited from 149 candidates to form three groups representing domain experts in the field of mycology (DE), students with a data science and visualization background (IT), and students from social sciences and humanities (SSH). Each group was given an explanation of a machine learning model trained to predict edible and non-edible mushrooms and asked to interpret the explanations and answer various questions during the interview. The machine learning model and explanations for its decisions were prepared by the computer science research team.
The resulting dataset was constructed from the surveys obtained from the candidates, anonymized transcripts of the interviews, the results of the thematic analysis, and the original explanations with the modifications suggested by the participants. The dataset is complemented with the source code allowing one to reproduce the initial machine learning model and explanations.
The general structure of the dataset is described in the following table. Files whose names contain [RR]_[SS]_[NN] hold the individual results obtained from a particular participant; this pattern is the participant ID.
| File | Description |
| --- | --- |
| SURVEY.csv | Results of a survey filled in by the 149 candidates, out of which 39 were selected to form the final group of participants. |
| SURVEY_en.csv | Content of SURVEY.csv translated into English. |
| CODEBOOK.csv | The codebook used in the thematic analysis and MAXQDA coding. |
| QUESTIONS.csv | List of questions that the participants were asked during the interviews. |
| SLIDES.csv | List of slides used in the study, with their interpretation and references to the MAXQDA themes and the VISUALIZATION_MODIFICATIONS table. |
| MAXQDA_SUMMARY.csv | Summary of the thematic analysis, using the codes from CODEBOOK.csv, for each participant. |
| PROBLEMS.csv | List of problems that the participants were asked to solve during the interviews. They correspond to three instances from the dataset that the participants had to classify using knowledge gained from the explanations. |
| PROBLEMS_en.csv | Content of PROBLEMS.csv translated into English. |
| PROBLEMS_RESPONSES.csv | Each participant's responses to the problems listed in PROBLEMS.csv. |
| VISUALIZATION_MODIFICATIONS.csv | Information on how the order of the slides was modified by the participant, which slides (explanations) were removed, and what kind of additional explanation was suggested. |
| ORIGINAL_VISUZALIZATIONS.pdf | PDF file containing the visualizations of the explanations presented to the participants during the interviews. |
| ORIGINAL_VISUZALIZATIONS_EN.pdf | Content of ORIGINAL_VISUZALIZATIONS.pdf translated into English. |
| VISUALIZATION_MODIFICATIONS.zip | Archive of PDF files containing the original slides from ORIGINAL_VISUZALIZATIONS.pdf with the modifications suggested by each participant; each file is named with the participant ID, i.e. [RR]_[SS]_[NN].pdf. |
| TRANSCRIPTS.zip | Anonymized transcripts of the interviews, one per participant, zipped into one archive. Each transcript is named after the participant ID, i.e. [RR]_[SS]_[NN].csv, and contains text tagged with the slide number it relates to, the question number from QUESTIONS.csv, and the problem number from PROBLEMS.csv. |
The detailed structure of the files listed in the table above is given in the Technical info section.
The source code used to train the ML model and to generate the explanations is available on GitLab.
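The GitLab repository is the authoritative source for the model and explanation code. Purely as an illustration of the kind of pipeline described above, and not the authors' actual implementation, a black-box classifier for edible vs. non-edible mushrooms could be paired with a global surrogate-tree explanation roughly as follows; the file name, label column, and choice of models are assumptions.

```python
# Illustrative sketch only -- the authors' actual model and explanations live in
# their GitLab repository; the dataset path and feature encoding here are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical input: a table of categorical mushroom features with an "edible" label.
data = pd.read_csv("mushrooms.csv")
X = pd.get_dummies(data.drop(columns=["edible"]))
y = data["edible"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box model predicting edible vs. non-edible.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# One common way to produce a global explanation: fit a shallow surrogate
# decision tree to the black-box predictions and print its rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, model.predict(X_train))
print(export_text(surrogate, feature_names=list(X.columns)))
```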
Repeated study data for the Pasta Box Task and Cup Transfer Task. Matlab structures containing means and standard deviations for each participant, as well as overall means and standard deviations, are included for each task. Details on how to navigate the .mat files can be found in the file "Matlab Data Explanations.txt".
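As a small sketch of how such MATLAB structures can be inspected from Python, the snippet below lists the top-level variables in a .mat file. The file name is hypothetical; the authoritative guide to the contents is "Matlab Data Explanations.txt".

```python
# Hedged sketch: the file name below is an assumption; consult
# "Matlab Data Explanations.txt" for the actual structure.
from scipy.io import loadmat

mat = loadmat("PastaBoxTask.mat", squeeze_me=True, struct_as_record=False)

# List the top-level MATLAB variables (skipping MATLAB's metadata keys).
for name, value in mat.items():
    if not name.startswith("__"):
        print(name, type(value))
```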
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes the software code that was developed for the publication titled "Integrity-based Explanations for Fostering Appropriate Trust in AI Agents".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
No description was included in this Dataset collected from the OSF
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
No description is available for this dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This comprehensive dataset contains all the data and plots presented in the paper "Analysis and multi-objective optimisation of wind turbine torque control strategies". All data, graphs, charts and plots presented within the paper are included in their original form within this dataset. The dataset encompasses a wide variety of data types, including but not limited to:
To enhance usability, each folder contains supplementary documentation, including:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Using Panel Study of Income Dynamics (PSID) microdata over the 1980-2010 period, we provide new empirical evidence on the extent of and trends in the gender wage gap, which declined considerably during this time. By 2010, conventional human capital variables taken together explained little of the gender wage gap, while gender differences in occupation and industry continued to be important. Moreover, the gender pay gap declined much more slowly at the top of the wage distribution than at the middle or bottom and by 2010 was noticeably higher at the top. We then survey the literature to identify what has been learned about the explanations for the gap. We conclude that many of the traditional explanations continue to have salience. Although human-capital factors are now relatively unimportant in the aggregate, women's work force interruptions and shorter hours remain significant in high-skilled occupations, possibly due to compensating differentials. Gender differences in occupations and industries, as well as differences in gender roles and the gender division of labor remain important, and research based on experimental evidence strongly suggests that discrimination cannot be discounted. Psychological attributes or noncognitive skills comprise one of the newer explanations for gender differences in outcomes. Our effort to assess the quantitative evidence on the importance of these factors suggests that they account for a small to moderate portion of the gender pay gap, considerably smaller than, say, occupation and industry effects, though they appear to modestly contribute to these differences.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This comprehensive dataset contains all the data and plots presented in the Doctoral Thesis, organised into separate folders corresponding to each thesis chapter. This arrangement allows for easy access to specific sections of interest, facilitating focused research and analysis. All data, graphs, charts and plots presented within the thesis are included in their original form within this dataset. The dataset encompasses a wide variety of data types, including but not limited to:
To enhance usability, each folder contains supplementary documentation, including:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
No description is available for this dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes three data files and an annotation manual. Descriptions of the files are given as follows:

FILENAME: PubMed_retracted_publication_full_v3.tsv
- Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT]).
- Except for the information in the "cited_by" column, all the data is from PubMed.
ROW EXPLANATIONS
- Each row is a retracted paper. There are 7,813 retracted papers.
COLUMN HEADER EXPLANATIONS
1) PMID - PubMed ID
2) Title - Paper title
3) Authors - Author names
4) Citation - Bibliographic information of the paper
5) First Author - First author's name
6) Journal/Book - Publication name
7) Publication Year
8) Create Date - The date the record was added to the PubMed database
9) PMCID - PubMed Central ID (if applicable, otherwise blank)
10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
11) DOI - Digital object identifier (if applicable, otherwise blank)
12) retracted_in - Information on the retraction notice (given by PubMed)
13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
14) cited_by - PMIDs of the citing papers (if applicable, otherwise blank). Data collected from iCite.
15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv
- This file contains the citation contexts (i.e., citing sentences) in which the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
- This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
ROW EXPLANATIONS
- Each row is a citation context associated with one retracted paper that is cited.
- In the manuscript, each citation context is counted once, even if it cites multiple retracted papers.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) year - Publication year of the citing paper
4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = table/figure captions)
5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussion/Conclusion, NoIMRaD = not identified)
6) sentence_id - The ID of the citation context within a given location (see column 4). The first sentence in the location gets ID 1, and subsequent sentences are numbered consecutively.
7) total_sentences - Total number of sentences in the given location
8) intxt_id - Identifier of the cited paper; here, the cited paper is the retracted paper.
9) intxt_pmid - PubMed ID of the cited paper; here, the cited paper is the retracted paper.
10) citation - The citation context
11) progression - Position of the citation context within the citing paper, by centile
12) retracted_yr - Retraction year of the retracted paper
13) post_retraction - 0 = not a post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

FILENAME: 613_knowingly_post_retraction_cit.tsv
- The 613 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
ROW EXPLANATIONS
- Each row is a citation context.
COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) pub_type - Publication type collected from the metadata in the PMCOA XML files
4) pub_type2 - Specific article types. Please see the manuscript for explanations.
5) year - Publication year of the citing paper
6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = table/figure captions)
7) intxt_id - Identifier of the cited paper; here, the cited paper is the retracted paper.
8) intxt_pmid - PubMed ID of the cited paper; here, the cited paper is the retracted paper.
9) citation - The citation context
10) retracted_yr - Retraction year of the retracted paper
11) cit_purpose - Purpose of citing the retracted paper. This is from human annotation; please see the manuscript for further information about the annotation.
12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation.

FILENAME: Annotation manual.pdf
- The manual for annotating the citation purposes in column 11 (cit_purpose) of 613_knowingly_post_retraction_cit.tsv.
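As an illustration of how the post_retraction flag in the citation-context file relates to the year and retracted_yr columns, the following minimal pandas sketch recomputes the flag and tallies post-retraction citations. The column names are those listed above; reading the file from the local working directory is an assumption.

```python
# Minimal sketch: the post_retraction flag is already provided in the file,
# so recomputing it here is only a consistency check against the definition above.
import pandas as pd

ctx = pd.read_csv("PubMed_retracted_publication_CitCntxt_withYR_v3.tsv", sep="\t")

# A post-retraction citation is one made after the calendar year of retraction.
recomputed = (ctx["year"] > ctx["retracted_yr"]).astype(int)
print("share of rows where the recomputed flag matches:",
      (recomputed == ctx["post_retraction"]).mean())

print(ctx["post_retraction"].value_counts())
```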
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Quality appraisal of the qualitative studies (Framework for Assessing the Quality of Qualitative Research Evidence[46]).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data files accompany a study examining the impact of corrections and epistemic explanations on trust in the author, article, and journalism in authentic newspaper articles on scientific topics. Participants (N=178) were randomly assigned to one of four groups in a 2x2 factorial design, varying in the presentation of corrections and epistemic explanations. Data was collected online through Qualtrics. The files include: (a) the raw dataset (.csv), (b) the cleaned dataset (.sav), (c) the analysis output (.spv), and (d) the thematic analysis of open-ended responses (.xlsx).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Clustering is an unsupervised machine learning technique whose goal is to group unlabeled data. However, traditional clustering methods only output a set of results and do not provide any explanation of those results. Although a number of decision-tree-based methods have been proposed in the literature to explain clustering results, most of them have disadvantages, such as too many branches and overly deep leaves, which lead to complex explanations that are difficult for users to understand. In this paper, a hypercube overlay model based on multi-objective optimization is proposed to achieve succinct explanations of clustering results. The model designs two objective functions based on the number of hypercubes and the compactness of instances, and then uses multi-objective optimization to find a set of nondominated solutions. Finally, a Utopia point is defined to determine the most suitable solution, in which each cluster can be covered by as few hypercubes as possible. Based on these hypercubes, an explanation of each cluster is provided. Verification on synthetic and real datasets shows that the model can provide concise and understandable explanations to users.
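As a rough sketch of the selection step described in the abstract, and assuming both objectives are to be minimised and compared with a Euclidean distance on normalised values (the paper's exact definitions may differ), the most suitable nondominated solution could be chosen as the one closest to the Utopia point:

```python
# Sketch of the selection step: among nondominated solutions, pick the one
# closest to the Utopia point. Normalisation and the distance metric are
# assumptions; the paper's exact definitions may differ.
import numpy as np

# Each row: (number of hypercubes, compactness objective) for one nondominated solution.
objectives = np.array([
    [3.0, 0.90],
    [5.0, 0.55],
    [8.0, 0.40],
])

# Utopia point: component-wise minimum of each (minimised) objective.
utopia = objectives.min(axis=0)

# Normalise each objective to [0, 1] so the Utopia point sits at the origin.
span = objectives.max(axis=0) - utopia
normalised = (objectives - utopia) / np.where(span == 0, 1, span)

best = np.argmin(np.linalg.norm(normalised, axis=1))
print("chosen solution index:", best, "objectives:", objectives[best])
```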
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains all the data and code to reproduce the results and figures in both the manuscript ("Quantifying the impact of AI recommendations with explanations on prescription decisions: an interactive vignette study" by Myura Nagendran, Paul Festor, Matthieu Komorowski, Anthony Gordon, Aldo Faisal) and the supplementary appendices. For more details about the project structure, please check the README file in the root of the folder. A data dictionary is also provided in the data folder for the four raw CSV data files. The two Jupyter notebooks, which (1) prepare the data and (2) generate all results and figures, should be run sequentially. Please see the README within the zip file for more details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Symbols used in this study and their corresponding explanations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The reader study was conducted in 3 phases: no AI (phase 1), AI support (phase 2), and XAI support (phase 3). The data generated in each phase is available in metadata_phase1.csv, metadata_phase2.csv, and metadata_phase3.csv. There are 113 unique participants in phase 1; additional participants were added in phase 2, resulting in 116 unique clinicians in phases 2 and 3.
Important: the 3rd and 13th image in each group are identical. Be careful when performing table joins, as the duplicate image_ids can affect them. In metadata_phase1.csv, the AI predictions for the 13th image in each group are null; please take that into account when performing analysis. In metadata_phase2.csv and metadata_phase3.csv, the AI predictions for the repeated images are not omitted.
Column descriptions:
participant: Each clinician was assigned a participant ID, given in the participant column.
group: Each clinician was randomly assigned to a group. Each group was assigned a mutually exclusive set of images.
mask: An internal identifier used for the images; can be ignored.
benign_malignant: Ground-truth diagnosis.
prediction: Diagnosis chosen by the clinician. 1 represents melanoma, 0 represents nevus, and 0.5 represents a nevus diagnosis where the clinician nevertheless chose to excise.
confidence: Confidence value entered by the clinician.
trust: Trust value entered by the clinician.
AI_prediction: Diagnosis predicted by the AI. 1 represents melanoma and 0 represents nevus.
language: Language chosen by the clinician.
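Because of the repeated 3rd and 13th images, joins across phases can silently duplicate rows. A minimal pandas sketch of a guarded join is shown below; the image_id column name and the use of an inner merge on (participant, image_id) are assumptions beyond the documented columns.

```python
# Sketch of a phase-1 / phase-2 join; column names other than those documented
# above (participant, group, prediction, ...) are assumptions.
import pandas as pd

p1 = pd.read_csv("metadata_phase1.csv")
p2 = pd.read_csv("metadata_phase2.csv")

# The 3rd and 13th image in each group are identical, so (participant, image_id)
# is not unique; drop the repeated rows before joining to avoid duplicate matches.
p1_unique = p1.drop_duplicates(subset=["participant", "image_id"])
p2_unique = p2.drop_duplicates(subset=["participant", "image_id"])

merged = p1_unique.merge(
    p2_unique,
    on=["participant", "image_id"],
    suffixes=("_phase1", "_phase2"),
)
print(merged.shape)
```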