Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was developed as part of a study that assessed data reuse. Through bibliometric analysis, corresponding authors of highly cited papers published in 2015 at the University of Illinois at Urbana-Champaign in nine STEM disciplines were identified and then surveyed to determine if data were generated for their article and their knowledge of reuse by other researchers. Second, the corresponding authors who cited those 2015 articles were identified and surveyed to ascertain whether they reused data from the original article and how that data was obtained. The project goal was to better understand data reuse in practice and to explore if research data from an initial publication was reused in subsequent publications.
Facebook
Twitterhttps://www.icpsr.umich.edu/web/ICPSR/studies/37071/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/37071/terms
This study explores the factors that influence the data reuse behaviors of scientists and identifies the generalized patterns that occur in data reuse across various disciplines. An online survey was distributed to the scientists through Qualtrics. The initial email invitation to the survey was sent to 15,703 scientists within academic institutions on October 5, 2015, with a reminder sent on November 10, 2015. The survey closed on November 30, 2015. 1,987 email messages (12.65%) were returned and a total of 13,716 participants (87.35%) received the email invitation to participate in the survey. This research used the National Science Foundation (NSF) STEM discipline codes (2014) for the respondents to indicate their specific academic disciplines based on their current research activities. Of these participants, 1,528 scientists from 94 specific disciplines (as categorized by NSF STEM discipline codes (2014)), completed the survey with less than 5% of missing values.
Facebook
TwitterThe incorporation of data sharing into the research lifecycle is an important part of modern scholarly debate. In this study, the DataONE Usability and Assessment working group addresses two primary goals: To examine the current state of data sharing and reuse perceptions and practices among research scientists as they compare to the 2009/2010 baseline study, and to examine differences in practices and perceptions across age groups, geographic regions, and subject disciplines. We distributed surveys to a multinational sample of scientific researchers at two different time periods (October 2009 to July 2010 and October 2013 to March 2014) to observe current states of data sharing and to see what, if any, changes have occurred in the past 3–4 years. We also looked at differences across age, geographic, and discipline-based groups as they currently exist in the 2013/2014 survey. Results point to increased acceptance of and willingness to engage in data sharing, as well as an increase in actual data sharing behaviors. However, there is also increased perceived risk associated with data sharing, and specific barriers to data sharing persist. There are also differences across age groups, with younger respondents feeling more favorably toward data sharing and reuse, yet making less of their data available than older respondents. Geographic differences exist as well, which can in part be understood in terms of collectivist and individualist cultural differences. An examination of subject disciplines shows that the constraints and enablers of data sharing and reuse manifest differently across disciplines. Implications of these findings include the continued need to build infrastructure that promotes data sharing while recognizing the needs of different research communities. Moving into the future, organizations such as DataONE will continue to assess, monitor, educate, and provide the infrastructure necessary to support such complex grand science challenges.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Keynote presentation by John Burn-Murdoch, Senior Data-Visualisation Journalist, from Financial Times presented at Better Science through Better Data event. The video recording and scribes are included.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset with results of the poll conducted in the study “Information Scientists’ Motivations for Research Data Sharing and Reuse”.
In terms of the Uses and Gratifications Theory (Questions 1 and 2), the most popular uses relate to the categories of research support and information. Researchers share, or would share, their research data in general for any reusability purposes and especially for combination of different datasets to produce new evidence. Also, the vast majority of study participants associate research data sharing with possibilities to accelerate scientific progress and to increase research efficiency. In case of research data reuse, all the researchers indicated that they use, or would use, others’ data first of all for inspiration. Interestingly, study participants put relatively high the category of recognition in case of sharing, but at the same time they do not associate increased recognition among colleagues and other researchers with research data reuse. The remaining categories belonging to the categories of self-esteem and social interaction, i.e. increased citation level and visibility of the research as well as enhanced scientific reputation, possible cooperations and co-authorship, were selected only by few respondents. Also remarkably, data reuse is more frequently linked to entertainment then data sharing.
In terms of the Self-Determination Theory (Questions 3 and 4), all but one of the interviewees indicated that they have shared or would share their research data because it can accelerate scientific progress which they consider important and would like to contribute to it (i.e., identified regulation). The second most popular motivation turned out to be the obligation by employer, project funder and/or journals (i.e., external regulation). The third most popular option was social influence, i.e. because many other researchers participate in data sharing and they feel obligated to do the same (i.e., external regulation).This way, the participants demonstrate a mixture of identified motivation and external regulation, both material and social. In the case of data reuse, the participants demonstrate more homogeneous results with identification and intrinsic motivation having most of the votes. The role of external regulation seems to be much less important as in the case with data sharing. So, researchers reuse, or would reuse, research data because it can accelerate scientific progress which is important for them. Additionally, researchers enjoy exploring and using third party research data. Thus, interviewees participate or would participate in data sharing because they consider it important, but also feel or are obliged to do so. At the same time, study participants do not feel pressure from outside when deciding whether to reuse data or not.
For more information about the study and its results, please read the article “Information Scientists’ Motivations for Research Data Sharing and Reuse” by Shutsko and Stock (2023).
Facebook
TwitterDataSeer/science-data-reuse-lm-markdown dataset hosted on Hugging Face and contributed by the HF Datasets community
Facebook
Twitterhttps://qdr.syr.edu/policies/qdr-standard-access-conditionshttps://qdr.syr.edu/policies/qdr-standard-access-conditions
Project Overview Trends toward open science practices, along with advances in technology, have promoted increased data archiving in recent years, thus bringing new attention to the reuse of archived qualitative data. Qualitative data reuse can increase efficiency and reduce the burden on research subjects, since new studies can be conducted without collecting new data. Qualitative data reuse also supports larger-scale, longitudinal research by combining datasets to analyze more participants. At the same time, qualitative research data can increasingly be collected from online sources. Social scientists can access and analyze personal narratives and social interactions through social media such as blogs, vlogs, online forums, and posts and interactions from social networking sites like Facebook and Twitter. These big social data have been celebrated as an unprecedented source of data analytics, able to produce insights about human behavior on a massive scale. However, both types of research also present key epistemological, ethical, and legal issues. This study explores the issues of context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership, with a focus on data curation strategies. The research suggests that connecting qualitative researchers, big social researchers, and curators can enhance responsible practices for qualitative data reuse and big social research. This study addressed the following research questions: RQ1: How is big social data curation similar to and different from qualitative data curation? RQ1a: How are epistemological, ethical, and legal issues different or similar for qualitative data reuse and big social research? RQ1b: How can data curation practices such as metadata and archiving support and resolve some of these epistemological and ethical issues? RQ2: What are the implications of these similarities and differences for big social data curation and qualitative data curation, and what can we learn from combining these two conversations? Data Description and Collection Overview The data in this study was collected using semi-structured interviews that centered around specific incidents of qualitative data archiving or reuse, big social research, or data curation. The participants for the interviews were therefore drawn from three categories: researchers who have used big social data, qualitative researchers who have published or reused qualitative data, and data curators who have worked with one or both types of data. Six key issues were identified in a literature review, and were then used to structure three interview guides for the semi-structured interviews. The six issues are context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership. Participants were limited to those working in the United States. Ten participants from each of the three target populations—big social researchers, qualitative researchers who had published or reused data, and data curators were interviewed. The interviews were conducted between March 11 and October 6, 2021. When scheduling the interviews, participants received an email asking them to identify a critical incident prior to the interview. The “incident” in critical incident interviewing technique is a specific example that focuses a participant’s answers to the interview questions. The participants were asked their permission to have the interviews recorded, which was completed using the built-in recording technology of Zoom videoconferencing software. The author also took notes during the interviews. Otter.ai speech-to-text software was used to create initial transcriptions of the interview recordings. A hired undergraduate student hand-edited the transcripts for accuracy. The transcripts were manually de-identified. The author analyzed the interview transcripts using a qualitative content analysis approach. This involved using a combination of inductive and deductive coding approaches. After reviewing the research questions, the author used NVivo software to identify chunks of text in the interview transcripts that represented key themes of the research. Because the interviews were structured around each of the six key issues that had been identified in the literature review, the author deductively created a parent code for each of the six key issues. These parent codes were context, data quality and trustworthiness, data comparability, informed consent, privacy and confidentiality, and intellectual property and data ownership. The author then used inductive coding to create sub-codes beneath each of the parent codes for these key issues. Selection and Organization of Shared Data The data files consist of 28 of the interview transcripts themselves – transcripts from Big Science Researchers (BSR), Data Curators (DC), and Qualitative Researchers (QR)...
Facebook
TwitterWe designed and organized a one-day workshop, where in the context of FAIR the following themes were discussed and practiced: scientific transparency and reproducibility; how to write a README; data and code licenses; spatial data; programming code; examples of published datasets; data reuse; and discipline and motivation. The intended audience were researchers at the Environmental Science Group of Wageningen University and Research. All workshop materials were designed with further development and reuse in mind and are shared through this dataset.
Facebook
Twitterhttps://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered.We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data reuse.
Facebook
TwitterFull edition for scientific use. As part of a study on factors influencing researcher data reuse and the mechanisms by which these factors are activated, the research team conducted semi-structured oral interviews with a purposive sample of 24 data reusers and intermediaries. This dataset includes de-identified transcripts of 21 of the interviews, as well as written follow-up responses from 8 of the study participants.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress. This article aims to provide an in-depth examination of the practices and perceptions of data management, including data storage, data sharing, and data use and reuse by scientists around the world. Methods: The Usability and Assessment Working Group of DataONE, an NSF-funded environmental cyberinfrastructure project, distributed a survey to a multinational and multidisciplinary sample of scientific researchers in a two-waves approach in 2017-2018. We focused our analysis on examining the differences across age groups, sub-disciplines of science, and sectors of employment. Findings: Most respondents displayed what we describe as high and moderate risk data practices by storing their data on their personal computer, departmental servers or USB drives. Respondents appeared to be satisfied with short-term storage solutions; however, only half of them are satisfied with available mechanisms for storing data beyond the life of the process. Data sharing and data reuse were viewed positively: over 85% of respondents admitted they would be willing to share their data with others and said they would use data collected by others if it could be easily accessed. A vast majority of respondents felt that the lack of access to data generated by other researchers or institutions was a major impediment to progress in science at large, yet only about a half thought that it restricted their own ability to answer scientific questions. Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and future reuse. Assistance through data managers or data librarians, readily available data repositories for both long-term and short-term storage, and educational programs for both awareness and to help engender good data practices are clearly needed.
Facebook
TwitterThe DDL maintains data on articles referencing the DDL since it was formally established in 2014. Details include article citations, DDL site or data asset citations, and data asset availability statements, in addition to codes indicating whether specific data assets are referenced and whether data is referenced in a citation, which may indicate data reuse. This data asset is updated quarterly.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To generate the bibliographic and survey data to support a data reuse study conducted by several Library faculty and accepted for publication in the Journal of Academic Librarianship, the project team utilized a series of web-based online scripts that employed several different endpoints from the Scopus API. The related dataset: "Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University" contains survey design and results.
1) getScopus_API_process_dmp_IDB.asp: used the search API query the Scopus database API for papers by UIUC authors published in 2015 -- limited to one of 9 pre-defined Scopus subject areas -- and retrieve metadata results sorted highest to lowest by the number of times the retrieved articles were cited. The URL for the basic searches took the following form: https://api.elsevier.com/content/search/scopus?query=(AFFIL%28(urbana%20OR%20champaign) AND univ*%29) OR (AF-ID(60000745) OR AF-ID(60005290))&apikey=xxxxxx&start=" & nstart & "&count=25&date=2015&view=COMPLETE&sort=citedby-count&subj=PHYS
Here, the variable nstart was incremented by 25 each iteration and 25 records were retrieved in each pass. The subject area was renamed (e.g. from PHYS to COMP for computer science) in each of the 9 runs. This script does not use the Scopus API cursor but downloads 25 records at a time for up to 28 times -- or 675 maximum bibliographic records. The project team felt that looking at the most 675 cited articles from UIUC faculty in each of the 9 subject areas was sufficient to gather a robust, representative sample of articles from 2015. These downloaded records were stored in a temporary table that was renamed for each of the 9 subject areas.
2) get_citing_from_surveys_IDB.asp: takes a Scopus article ID (eid) from the 49 UIUC author returned surveys and retrieves short citing article references, 200 at a time, into a temporary composite table. These citing records contain only one author, no author affiliations, and no author email addresses. This script uses the Scopus API cursor=* feature and is able to download all the citing references of an article 200 records at a time.
3) put_in_all_authors_affil_IDB.asp: adds important data to the short citing records. The script adds all co-authors and their affiliations, the corresponding author, and author email addresses.
4) process_for_final_IDB.asp: creates a relational database table with author, title, and source journal information for each of the citing articles that can be copied as an Excel file for processing by the Qualtrics survey software. This was initially 4,626 citing articles over the 49 UIUC authored articles, but was reduced to 2,041 entries after checking for available email addresses and eliminating duplicates.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file collection is part of Data reuse in the Social Sciences and Humanities: Project report of the SWITCH Innovation Lab “Repositories & Data Quality” (doi 10.21256/zhaw-2404). This project ran from October 2020 until February 2021 as a collaboration between SWITCH and ZHAW Zurich University of Applied Sciences. The report gives an overview on the relevant data sources for researchers in the social sciences and humanities (SSH) in Switzerland and the criteria they apply when choosing suitable data sources.
Further information is given in the corresponding report:
Data reuse in the social sciences and humanities : project report of the SWITCH Innovation Lab “Repositories & Data Quality”. Winterthur : ZHAW Zurich University of Applied Sciences, 2021. Available at: https://doi.org/10.21256/zhaw-2404
Facebook
TwitterAttribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is related to our ICCAD work "Optimized Data Reuse via Reordering for Sparse Matrix-Vector Multiplication on FPGAs".
Facebook
TwitterThe aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. The respondents were professors of human sciences, social sciences and behavioural sciences in Finnish universities, and representatives of some research institutes. Opinions were also queried on the OECD guidelines and principles on open access to research data from public funding. First, the respondents were asked whether there were any guidelines or regulations concerning the depositing of digital research data in their departments, what happened to research data after the completion of the original research, and to what extent the data were reused. Further questions covered how often the data from completed research projects were reused in secondary research projects or for theses. The respondents also estimated what proportion of the data collected in their departments/institutes were reusable at the time of the survey, and why research data were not being reused in their own field of research. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology formed barriers to data reuse. Opinions on the OECD Open Access guidelines on research data were queried. The respondents were asked whether they had earlier knowledge of the guidelines, and to what extent its principles could be implemented in their own disciplines. Some questions pertained to the advantages and disadvantages of open access to research data. The advantages mentioned included reducing duplicate data collection and more effective use of data resources, whereas the disadvantages mentioned included, for example, risks connected to data protection and misuse of data. The respondents also suggested ways of implementing the Open Access guidelines and gave their opinions on how binding the recommendations should be, to what extent various bodies should be involved in formulating the guidelines, and how the archiving and dissemination of digital research data should be organised. Finally, the respondents estimated how the researchers in their field would react to enhancing open access to research data, and also gave their opinion on open access to the data they themselves have collected. Background variables included the respondent's gender, university, and research field.
Facebook
TwitterFunding agencies are reluctant to support data archiving, even though large research funders such as the National Science Foundation (NSF) and the National Institutes of Health acknowledge its importance for scientific progress. Our quantitative estimates of data reuse indicate that ongoing financial investment in data-archiving infrastructure provides a high scientific return.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data:
A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: Open ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
Received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret to values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 χ2 = 19.082, p = .014;2 χ2 = 29.320, p = .000.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was developed as part of a study that assessed data reuse. Through bibliometric analysis, corresponding authors of highly cited papers published in 2015 at the University of Illinois at Urbana-Champaign in nine STEM disciplines were identified and then surveyed to determine if data were generated for their article and their knowledge of reuse by other researchers. Second, the corresponding authors who cited those 2015 articles were identified and surveyed to ascertain whether they reused data from the original article and how that data was obtained. The project goal was to better understand data reuse in practice and to explore if research data from an initial publication was reused in subsequent publications.