CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The position of an author on the byline of a paper affects the inferences readers make about their contributions to the research. We examine gender differences in authorship in the ecology literature using two datasets: submissions to six journals between 2010 and 2015 (regardless of whether they were accepted), and manuscripts published by 151 journals between 2009 and 2015. Women were less likely to be last (i.e., 'senior') authors (averaging ~23% across journals, years and datasets) and sole authors (~24%), but more likely to be first author (~38%), relative to their overall frequency of authorship (~31%). However, the proportion of women in all authorship roles, except sole authorship, has increased year-on-year. Women were less likely to be authors on papers with male last authors, and all-male papers were more abundant than expected given the overall gender ratio. Women were equally-well represented on papers published in higher versus lower impact factor journals at all authorship positions. Female first authors were less likely to serve as corresponding author of their papers; this difference increased with the degree of gender inequality in the author's home country, but did not depend on the gender of the last author. First authors from non-English speaking countries were less likely to serve as corresponding author of their papers, especially if the last author was from an English-speaking country. That women more often delegate corresponding authorship to one of their coauthors may increase the likelihood that readers undervalue their role in the research by shifting credit for their contributions to coauthors. We suggest that author contribution statements be more universally adopted and that these statements declare how and/or why the corresponding author was selected for this role.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annual authorship trends for broad and narrow Scopus fields from the paper: Research Co-authorship 1900-2020: Continuous, universal, and ongoing expansion. The figures from the paper are also included, as well as a supplementary analysis of Journal for ImmunoTherapy of Cancer and additional figures.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data support the infographics found in our article, "Collaboration in Science Annual Report: New Data," found on the AJE Author Resource Center at http://www.aje.com/en/arc/research-collaboration-2016/. In particular, we focused on the top ten pairs of collaboration countries globally in 2016 (based on PubMed author affiliations), as well as specific lists for China, Japan, South Korea, Brazil, and Europe
This dataset contains data on authors, publications, and co-author percent contributions to publications of Polish habilitation degree applicants, collected from publicly available files from https://www.ck.gov.pl/promotion/type/l.html and https://radon.nauka.gov.pl/dane/pliki-postepowania-awansowe and various university websites. Data was extracted from PDF documents for scientific research into co-author name ordering and relative input contributions of co-authors in mathematics. The dataset is an Excel file with the following columns:
source: source website
domain: scientific domain according to source
discipline: scientific discipline according to source
family_name_on_list: applicant family name as listed in source
given_name_on_list: applicant given name as listed in source
full_name_on_langing_page: full name according to source, taken from the habilitation proceedings landing page
submission_year: year of submission of application
application_number: ID number of proceeding, either as given or extracted from landing page URL, depending on source
numerical_percent_contribution_statements_found: whether any numerical percent contribution statements could be found in the self-report PDF. values: YES/NO
num_works_listed: number of works listed for the habilitation publications cycle
num_works_coauthored: number of works of the above which were co-authored
internal_work_number: a counter to identify work within applications, can go beyond the works belonging to the habilitation cycle
reference_in_application: for works with percent contribution data, the bibliographic reference as given in the self-report
num_coauthors: number of co-authors of a publication
author_position: position of the name in the author list of a co-author with a numerical percent contribution, first author would have value 1, second author 2, etc.
contribution_percent: the indicated numerical percent contribution, a value up to 100, in percent
note: additional information about source of information when found on university website and percent contribution for works which were not part of habilitation works cycle
alphabetical_order: whether the author list was in alphabetical name order. values: YES/NO
contribution_for_applicant: whether the percent contribution value of a co-author is for the habilitation applicant or not. values: YES/NO
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Publication reference:
Donner, P. (2020). A validation of co-authorship credit models with empirical data from the contributions of PhD candidates. Quantitative Science Studies, v. 1, i. 2, p. 551-564. https://doi.org/10.1162/qss_a_00048.
The file contains one row per authorship contribution statement. Rows of publications and theses are grouped.
Description of columns:
dissertation_id - an integer identifying each dissertation thesis
university - university at which the dissertation thesis was written and PhD degree conferred
year - publication year of the dissertation thesis
author - dissertation thesis author name
title - dissertation thesis title
subject - the field of research
publication_id - an integer identifying each publication; publications associated with more than one thesis have the same id across theses
reference - bibliographic reference for the publication associated with the thesis
author_count - number of authors of the publication
author_position - position in the author byline of the credited author
credit - claimed credit of the author in percent
corresponding_author - flag for whether the publication author of this row is a corresponding author
Data for the paper "Recommending Scientific Datasets Using Author Networks in Ensemble Methods", accepted by Data Science Journal. The data contain: 1) the MAKG (Microsoft Academic Knowledge Graph) co-author network (HDT/RDF format), 2) the MAKG paper/dataset title collection (HDT/RDF format), and 3) the MAKG paper/dataset abstract collection (HDT/RDF format).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This individual-level data-set describes the most productive European Union (EU) researchers (in terms of articles), during 2007 - 2018, irrespective of their research field. Specifically, in the data-set file, i.e. "iconic_5000", we profile the most productive 4,588 EU researchers using the following variables: number of papers; number of citations; repeated collaborations; number of co-authors; number of co-authors from the same country (as the author), from the same city, from the same institution and from different countries; geographical dispersion (number of unique countries wherein co-authors are based in), star (the largest number of articles published by one of an author's collaborators), godfather (the largest number of citations received by one of an author's collaborators), co-authors' citations and co-authors' papers. Variables are yearly measured.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
John Ioannidis and co-authors [1] created a publicly available database of top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions used this as a yardstick to assess the quality of researchers. At the same time, some people look at this list with skepticism, citing problems with the methodology used. Two separate databases are created, based on career-long and single recent year impact. The database is created using Scopus data from Elsevier [1-3]. The scientists included in this database are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h-index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-cites are considered in all cases (indicated as ns). In the single-year case, citations during 2022 (nc2222) are considered instead of nc9622.
To evaluate the robustness of c-score-based ranking, I have done a detailed analysis of the metric parameters of the Nobel laureates in physics, chemistry, and medicine of the last 25 years (1998-2022), and compared them with the top 100 rank holders in the list. The latest career-long and single-year-based databases (2022) were used for this analysis. The details of the analysis are presented below:
Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204644 names [1]. The single-year database contains 210199 names. So, the published list contains roughly the top 4% of scientists. In the career-based rank list, for the person with the lowest rank of 4809825, the nc9622, h22, and c-score were 41, 3, and 1.3632, respectively, whereas for the person with the No. 1 rank in the list, the nc9622, h22, and c-score were 345061, 264, and 5.5927, respectively. Three people on the list had fewer than 100 citations during 1996-2022, 1155 people had an h22 below 10, and 6 people had a c-score below 2.
In the single-year-based rank list, for the person with the lowest rank (6547764), the nc2222, h22, and c-score were 1, 1, and 0.6, respectively, whereas for the person with the No. 1 rank, the nc9622, h22, and c-score were 34582, 68, and 5.3368, respectively. 4463 people on the list had fewer than 100 citations in 2022, 71512 people had an h22 below 10, and 313 people had a c-score below 2. The inclusion of many authors with a single-digit h-index and a very meager total number of citations indicates serious shortcomings of the c-score-based ranking methodology.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset from the VIRTA Publication Information Service consists of the metadata of 241,575 publications of Finnish universities (publication years 2016–2021) merged from yearly datasets downloaded from https://wiki.eduuni.fi/display/cscvirtajtp/Vuositasoiset+Excel-tiedostot.
The dataset contains following information:
An XLSX Excel file which provides data used by Gregorio González-Alcaide & collaborators in the article “Dominance and leadership in research activities: collaboration between countries of differing human development is reflected through authorship order and designation as corresponding authors in scientific publications”. The worksheet “labels” describes every variable.
The worksheets “Tropical Medicine-C1”, “Infectious Diseases-C1”, “Parasitology-C1” and “Pediatrics-C1” present the following data:
WOS: Identifier for indexed document in the Web of Science used for the study. Web of Science identifiers do not change over time and are never reused.
ORDER: Order of the authors who participated in the study.
C1: Authors’ countries who participated in the study.
ECO: Countries classification according to their Human Development Index.
GEO: Countries classification according to a macro geographic (continental) region.
The worksheets “Tropical Medicine-RP”, “Infectious Diseases-RP”, “Parasitology-RP” and “Pediatrics-RP” present the following data:
WOS: Identifiers for indexed documents in the Web of Science used for the study. Web of Science identifiers do not change over time and are never reused.
RP: Authors’ countries designated as corresponding authors.
ECO: Countries classification according to their Human Development Index.
GEO: Countries classification according to a macro geographic (continental) region.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset with results of the poll conducted in the study “Information Scientists’ Motivations for Research Data Sharing and Reuse”.
In terms of the Uses and Gratifications Theory (Questions 1 and 2), the most popular uses relate to the categories of research support and information. Researchers share, or would share, their research data in general for any reusability purposes and especially for combination of different datasets to produce new evidence. Also, the vast majority of study participants associate research data sharing with possibilities to accelerate scientific progress and to increase research efficiency. In the case of research data reuse, all the researchers indicated that they use, or would use, others’ data first of all for inspiration. Interestingly, study participants rank the category of recognition relatively high in the case of sharing, but at the same time they do not associate increased recognition among colleagues and other researchers with research data reuse. The remaining options, belonging to the categories of self-esteem and social interaction, i.e. increased citation level and visibility of the research as well as enhanced scientific reputation, possible cooperations and co-authorship, were selected by only a few respondents. Also remarkably, data reuse is more frequently linked to entertainment than data sharing.
In terms of the Self-Determination Theory (Questions 3 and 4), all but one of the interviewees indicated that they have shared or would share their research data because it can accelerate scientific progress, which they consider important and would like to contribute to (i.e., identified regulation). The second most popular motivation turned out to be the obligation by employer, project funder and/or journals (i.e., external regulation). The third most popular option was social influence, i.e. because many other researchers participate in data sharing and they feel obligated to do the same (i.e., external regulation). In this way, the participants demonstrate a mixture of identified motivation and external regulation, both material and social. In the case of data reuse, the participants demonstrate more homogeneous results, with identification and intrinsic motivation having most of the votes. The role of external regulation seems to be much less important than in the case of data sharing. So, researchers reuse, or would reuse, research data because it can accelerate scientific progress, which is important for them. Additionally, researchers enjoy exploring and using third-party research data. Thus, interviewees participate or would participate in data sharing because they consider it important, but also feel or are obliged to do so. At the same time, study participants do not feel pressure from outside when deciding whether to reuse data or not.
For more information about the study and its results, please read the article “Information Scientists’ Motivations for Research Data Sharing and Reuse” by Shutsko and Stock (2023).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes data related to 4220 articles on sustainable mining published from 1983 to 2018. The Scopus database was selected as a data source. Detailed data applies to co-authored articles. The number of authors and affiliations (country, institution, sector) were taken into account. Data has been cleaned in terms of names of institutions and countries. In given sets the following data were included:
- Distribution of articles in sustainable mining from 1983 to 2018
- Distribution of joint articles and the types of joint articles from 1983 to 2018
- Team size in terms of the number of authors of articles in sustainable mining from 1983 to 2018
- Team size in terms of the number of authors' institutions in articles in sustainable mining from 1983 to 2018
- Team size in terms of the number of authors' countries in articles in sustainable mining from 1983 to 2018
Please note, the field separator used in these files is a semicolon, while the decimal separator is a comma. Each of the files has two header lines.
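As a rough illustration of the format note above, the following sketch reads semicolon-separated text with decimal commas and two header lines using only the Python standard library. The function names `read_semicolon_table` and `to_number`, and the sample column names and values, are hypothetical and are not taken from the dataset files.

```python
import csv
import io


def to_number(cell):
    """Convert a decimal-comma cell such as '3,5' to 3.5; keep other cells as text."""
    cleaned = cell.strip()
    try:
        return float(cleaned.replace(",", "."))
    except ValueError:
        return cleaned


def read_semicolon_table(text, header_lines=2):
    """Split semicolon-separated text into (header rows, converted data rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    all_rows = [row for row in reader if row]  # drop blank lines
    headers = all_rows[:header_lines]
    rows = [[to_number(cell) for cell in row] for row in all_rows[header_lines:]]
    return headers, rows


# Hypothetical sample: the column names and values are invented for illustration.
sample = (
    "Year;Articles;Share\n"
    "(publication year);(count);(percent)\n"
    "1983;2;0,05\n"
    "2018;512;12,13\n"
)
headers, rows = read_semicolon_table(sample)
print(headers)
print(rows)
```

For a real file, the text could be obtained with `open(path, newline="", encoding=...)`; the correct encoding is not stated in the description and would need to be checked.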
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Forecast: Share of Corresponding/Leading Author in Scientific Publications in Developmental Biology in the US 2024 - 2028
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Forecast: Share of Corresponding/Leading Author in Scientific Publications in Pediatrics in the US 2024 - 2028
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data support the infographics found in our article, "Collaboration in Science, 2015," found on the AJE Author Resource Center at https://www.aje.com/en/arc/collaboration-2015. In particular, we analyzed PubMed data for 2015 to see which pairings of countries co-authored research publications, with specific focus on some established and emerging research centers: United States, China, Germany, Japan, South Korea, Brazil.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The top 10 most central authors in the co-authorship networks (Scopus data).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Forecast: Share of Corresponding/Leading Author in Scientific Publications in Chemical Engineering in the US 2024 - 2028
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data description
This data note describes the final citation network dataset analysed in the manuscript "What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives" [1].
The data collection strategy used to construct the following dataset can be found in the associated manuscript [1]. These data were originally downloaded from the Web of Science (WoS) Core Collection via the library subscription of the University of Edinburgh, using a systematic search methodology that sought to capture literature relevant to ‘knowledge co-production’. The dataset consists of 1,893 unique document reference strings (nodes) interlinked by 9,759 citation links (edges). The network dataset describes a directed citation network composed of papers relevant to 'knowledge co-production', and is split into two files: (i) ‘KnowCo_node_attribute_list.csv’ contains attributes of the 1,893 documents (nodes); and (ii) ‘KnowCo_edge_list.csv’ records the citation links (edges) between pairs of documents.
Id, the unique identifier. Fully retrieved documents are identified via a unique identifier that begins with ‘f’ followed by an integer (e.g. f1, f2, etc.). Non-retrieved documents are identified via a unique identifier beginning with ‘n’ followed by an integer (e.g. n1, n2, etc.).
Label, contains the unique reference string of the document for which the attribute data in that row corresponds. Reference strings contain the last name of the first author, publication year, journal, volume, start page, and DOI (if available).
authors, all author names. These are in the order that these names appear in the authorship list of the corresponding document. These data are only available for fully retrieved documents.
title, document title. These data are only available for fully retrieved documents.
journal, journal of publication. These data are only available for fully retrieved documents. For those interested in journal data for the remaining papers, this can be extracted from the reference string in the ‘Label’ column.
year, year of publication. These data are available for all nodes.
type, document type (e.g. article, review). Available only for fully retrieved documents.
wos_total_citations, total citation count as recorded by Web of Science Core Collection as of May 2020. Available only for fully retrieved documents.
wos_id, Web of Science accession number. Available only for fully retrieved documents; for non-retrieved documents, ‘CitedReference’ fills the cell.
cluster, provides the cluster membership number as discussed within the manuscript, established via modularity maximisation via the Leiden algorithm (Res 0.8; Q=0.53|5 clusters). Available for all nodes.
indegree, total count of within network citations to a given document. Due to the composition of the network, this figure tells us the total number of citations from 525 fully retrieved documents to each of the 1,893 documents within the network. Available for all nodes.
outdegree, total count of within network references from a given document. Due to the composition of the network, only fully retrieved documents can have a value >0 because only these documents have their associated reference list data. Available for all nodes.
Source, the citing document’s unique identifier.
Target, the cited document’s unique identifier.
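The relationship between the edge list and the indegree and outdegree attributes can be sketched as follows. This is a minimal illustration, assuming the edge list is comma-separated with a header row naming the `Source` and `Target` columns; the helper `degree_counts` and the three sample edges are hypothetical.

```python
import csv
import io
from collections import Counter


def degree_counts(edge_file):
    """Count within-network citations received (indegree) and made (outdegree)."""
    indegree, outdegree = Counter(), Counter()
    for row in csv.DictReader(edge_file):
        outdegree[row["Source"]] += 1  # the citing document gains a reference
        indegree[row["Target"]] += 1   # the cited document gains a citation
    return indegree, outdegree


# Hypothetical edges using the documented Id scheme
# ('f' = fully retrieved document, 'n' = non-retrieved document).
sample = io.StringIO("Source,Target\nf1,f2\nf1,n1\nf2,n1\n")
indegree, outdegree = degree_counts(sample)
print(indegree["n1"], outdegree["f1"], outdegree["n1"])
```

With the real data, the same function could be given `open("KnowCo_edge_list.csv", newline="")`. Non-retrieved documents never appear as a source, so their outdegree stays at zero, which matches the description of the outdegree column.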
Notes
[1] Bandola-Gill, J., Arthur, M., & Leng, R. I. (Under review). What is co-production? Conceptualising and understanding co-production of knowledge and policy across different theoretical perspectives. Evidence & Policy
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The primary aim of this research is to compare the quality of the data that is made publicly available associated with research outputs, with the quality of the data that comes by direct request. An additional factor in this research project is the age of the research output. For both sorts of data source (publicly shared vs requested), we will examine the quality, and the availability, as a function of publication time. Doing so will address claims in the literature for changes in data access over time (see Vines et al., https://doi.org/10.1016/j.cub.2013.11.014).
For this project, two publication sources will be considered: (1) British Journal of Psychology (2) Psychological Science
In each case, articles will be sampled randomly from the Journal across the publication window 2016-2020, so as to form 2 time windows (2016 - 2017 and 2018 - 2020).
Where articles are identified that indicate publicly shared data, these will be examined for functionality following Towse et al. (in press), https://doi.org/10.3758/s13428-020-01486-1
Where articles do not indicate publicly shared data, a sample of authors will be contacted by email to request the underlying data. (i) All corresponding authors will be contacted at the same time. (ii) A single follow up / reminder will be sent 2 weeks after the initial message to re-request the data.
We will record (a) the proportion of authors who reply to the initial request for data and (b) the proportion of authors who reply to the follow-up request. In each case, we will note whether the underlying dataset is provided (rather than, for example, an explanation for why data are not being supplied). In cases where requested data has been provided by authors, it will be examined in the same way as publicly shared data for functionality (see above).
We will provide a copy of the template record to be sent to corresponding authors on the OSF project.
An additional objective of the project is to find out why researchers decide to share (or not share) their data. Once the first phase of the study is completed as outlined above, the authors of all the papers used in the research will be invited to complete a survey on the reasons why they believe the research data should (or not) be shared. The survey will be sent by email. (i) All corresponding authors will be contacted at the same time - everyone will receive a personal link to the survey. (ii) A single follow up / reminder will be sent 2 weeks after the initial message to invite the authors who still have not completed the survey. The aim of this part of the study is to study the reasons underlying researcher decisions on data sharing. For previous research, see Roche et al. (2014), doi:10.1371/journal.pbio.1001779 Respondents will complete a set of questions from a Qualtrics survey, based on a questionnaire by Houtkoop et al. (2018), https://doi.org/10.1177%2F2515245917751886 The survey will be anonymous, meaning the answers will not be identifiable.
This dataset consists of the Surface Ocean CO2 Atlas Version 2022 (SOCATv2022) data product files. The ocean absorbs one quarter of the global CO2 emissions from human activity. The community-led Surface Ocean CO2 Atlas (www.socat.info) is key for the quantification of ocean CO2 uptake and its variation, now and in the future. SOCAT version 2022 has quality-controlled in situ surface ocean fCO2 (fugacity of CO2) measurements on ships, moorings, autonomous and drifting surface platforms for the global oceans and coastal seas from 1957 to 2021. The main synthesis and gridded products contain 33.7 million fCO2 values with an estimated accuracy of better than 5 μatm. A further 6.4 million fCO2 sensor data with an estimated accuracy of 5 to 10 μatm are separately available. During quality control, marine scientists assign a flag to each data set, as well as WOCE flags of 2 (good), 3 (questionable) or 4 (bad) to individual fCO2 values. Data sets are assigned flags of A and B for an estimated accuracy of better than 2 μatm, flags of C and D for an accuracy of better than 5 μatm and a flag of E for an accuracy of better than 10 μatm. Bakker et al. (2016) describe the quality control criteria used in SOCAT versions 3 to 2022. Quality control comments for individual data sets can be accessed via the SOCAT Data Set Viewer (www.socat.info). All data sets, where data quality has been deemed acceptable, have been made public. The main SOCAT synthesis files and the gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Access to data sets with an estimated accuracy of 5 to 10 (flag of E) and fCO2 values with flags of 3 and 4 is via additional data products and the Data Set Viewer (Table 8 in Bakker et al., 2016). SOCAT publishes a global gridded product with a 1° longitude by 1° latitude resolution. 
A second product with a higher resolution of 0.25° longitude by 0.25° latitude is available for the coastal seas. The gridded products contain all data sets with an estimated accuracy of better than 5 µatm (data set flags of A to D) and fCO2 values with a WOCE flag of 2. Gridded products are available monthly, per year and per decade. Two powerful, interactive, online viewers, the Data Set Viewer and the Gridded Data Viewer (www.socat.info), enable investigation of the SOCAT synthesis and gridded data products. SOCAT data products can be downloaded. Matlab code is available for reading these files. Ocean Data View also provides access to the SOCAT data products (www.socat.info). SOCAT data products are discoverable, accessible and citable. The SOCAT Data Use Statement (www.socat.info) asks users to generously acknowledge the contribution of SOCAT scientists by invitation to co-authorship, especially for data providers in regional studies, and/or reference to relevant scientific articles. The SOCAT website (www.socat.info) provides a single access point for online viewers, downloadable data sets, the Data Use Statement, a list of contributors and an overview of scientific publications on and using SOCAT. Automation of data upload and initial data checks allows annual releases of SOCAT from version 4 onwards. SOCAT is used for quantification of ocean CO2 uptake and ocean acidification and for evaluation of climate models and sensor data. SOCAT products inform the annual Global Carbon Budget since 2013. The annual SOCAT releases by the SOCAT scientific community are a Voluntary Commitment for United Nations Sustainable Development Goal 14.3 (Reduce Ocean Acidification) (#OceanAction20464). More broadly the SOCAT releases contribute to UN SDG 13 (Climate Action) and SDG 14 (Life Below Water), and to the UN Decade of Ocean Science for Sustainable Development. Hundreds of peer-reviewed scientific publications and high-impact reports cite SOCAT. 
The SOCAT community-led synthesis product is a key step in the value chain based on in situ inorganic carbon measurements of the oceans, which provides policy makers with critical information on ocean CO2 uptake in climate negotiations. The need for accurate knowledge of global ocean CO2 uptake and its (future) variation makes sustained funding of in situ surface ocean CO2 observations imperative.
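The flag rules described above can be summarised in a short sketch. It only encodes the thresholds stated in the text; the record layout, the field names `dataset_flag` and `woce_flag`, and the sample values are hypothetical and do not reflect the actual SOCAT file format.

```python
# Accuracy bound in µatm implied by each data set flag, as stated in the description.
DATASET_FLAG_ACCURACY_UATM = {"A": 2, "B": 2, "C": 5, "D": 5, "E": 10}

# WOCE flags for individual fCO2 values, as stated in the description.
WOCE_FLAG_MEANING = {2: "good", 3: "questionable", 4: "bad"}


def in_main_synthesis(dataset_flag, woce_flag):
    """True if a value qualifies for the main synthesis and gridded products:
    data set flag A to D (better than 5 µatm) and a WOCE flag of 2 (good)."""
    return dataset_flag in {"A", "B", "C", "D"} and woce_flag == 2


# Hypothetical records; the field names and values are invented for illustration.
records = [
    {"dataset_flag": "A", "woce_flag": 2, "fco2": 361.2},
    {"dataset_flag": "E", "woce_flag": 2, "fco2": 402.8},
    {"dataset_flag": "C", "woce_flag": 3, "fco2": 377.5},
]
main = [r for r in records if in_main_synthesis(r["dataset_flag"], r["woce_flag"])]
print(len(main))
```

Values that fail this check (flag E data sets, or WOCE flags of 3 and 4) are the ones the description says are available through the additional data products and the Data Set Viewer.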