Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Version: 5
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2023/09/05
General description: Publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:
Relationship between files: both files have the same information; two different formats are offered to improve reuse.
Type of version of the dataset: final processed version
Versions of the files: 5th version - Information updated: number of journals, URL, document types associated with a specific journal.
Version: 4
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/12/15
General description: Publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:
Relationship between files: both files have the same information; two different formats are offered to improve reuse.
Type of version of the dataset: final processed version
Versions of the files: 4th version - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.
Version: 3
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/10/28
General description: Publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:
Relationship between files: both files have the same information; two different formats are offered to improve reuse.
Type of version of the dataset: final processed version
Versions of the files: 3rd version - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).
Erratum - Data articles in journals Version 3:
Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a
Version: 2
Author: Francisco Rubio, Universitat Politècnica de València.
Date of data collection: 2020/06/23
General description: Publication of datasets according to the FAIR principles can be achieved by publishing a data paper (or software paper) in a data journal or in a standard academic journal. The Excel and CSV files contain a list of academic journals that publish data papers and software papers. File list:
Relationship between files: both files have the same information; two different formats are offered to improve reuse.
Type of version of the dataset: final processed version
Versions of the files: 2nd version - Information updated: number of journals, URL, document types associated with a specific journal, publisher normalization and simplification of document types - Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)
Total size: 32 KB
Version 1: Description
This dataset contains a list of journals that publish data articles, code, software articles and database articles.
The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in the titles of the journals. Acknowledgements: Xaquín Lores Torres for his invaluable help in preparing this dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This publication contains several datasets that have been used in the paper "Crowdsourcing open citations with CROCI – An analysis of the current status of open citations, and a proposal" submitted to the 17th International Conference on Scientometrics and Bibliometrics (ISSI 2019), available at https://opencitations.wordpress.com/2019/02/07/crowdsourcing-open-citations-with-croci/.
Additional information about the analyses described in the paper, including the code and the data we have used to compute all the figures, is available as a Jupyter notebook at https://github.com/sosgang/pushing-open-citations-issi2019/blob/master/script/croci_nb.ipynb. The datasets contain the following information.
non_open.zip: a zipped (~5 GB unzipped) CSV file containing the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, dated October 2018. All the entity types retrieved from Crossref were aligned to one of the following five categories: journal, book, proceedings, dataset, other. The open CC0 citation data we used came from the CSV dump of the most recent release of COCI, dated 12 November 2018. The number of closed citations was calculated by subtracting the number of open citations to each entity available within COCI from the "is-referenced-by-count" value available in the Crossref metadata for that cited entity, which reports all the DOI-to-DOI citation links that point to the cited entity from within the whole Crossref database (including those present in the Crossref 'closed' dataset). A sketch of this subtraction follows the column list below.
The columns of the CSV file are the following:
doi: the DOI of the publication in Crossref;
type: the type of the publication as indicated in Crossref;
cited_by: the number of open citations received by the publication according to COCI;
non_open: the number of closed citations received by the publication according to Crossref + COCI.
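As an illustration of the closed-citation computation just described, the following Python sketch derives non_open as is-referenced-by-count minus the COCI open-citation count. The input file names and their column layouts are hypothetical stand-ins for the actual COCI and Crossref dumps.

```python
import csv

# Hypothetical mapping DOI -> number of open citations counted in COCI.
coci_open = {}
with open("coci_citation_counts.csv", newline="") as f:  # hypothetical file
    for row in csv.DictReader(f):
        coci_open[row["cited_doi"]] = int(row["open_citations"])

# Hypothetical per-entity Crossref export carrying "is-referenced-by-count".
with open("crossref_entities.csv", newline="") as f, \
     open("non_open.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["doi", "type", "cited_by", "non_open"])
    for row in csv.DictReader(f):
        open_cits = coci_open.get(row["doi"], 0)
        # Closed citations = all Crossref DOI-to-DOI links minus the open ones in COCI.
        closed = int(row["is-referenced-by-count"]) - open_cits
        writer.writerow([row["doi"], row["type"], open_cits, closed])
```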
croci_types.csv: a CSV file that contains the numbers of open citations and closed citations received by the entities in the Crossref dump used in our computation, as collected in the previous CSV file, aligned in five classes depending on the entity types retrieved from Crossref: journal (Crossref types: journal-article, journal-issue, journal-volume, journal), book (Crossref types: book, book-chapter, book-section, monograph, book-track, book-part, book-set, reference-book, dissertation, book-series, edited-book), proceedings (Crossref types: proceedings-article, proceedings, proceedings-series), dataset (Crossref types: dataset), other (Crossref types: other, report, peer-review, reference-entry, component, report-series, standard, posted-content, standard-series). A sketch of this mapping follows the column list below.
The columns of the CSV file are the following:
type: the type of publication, one of "journal", "book", "proceedings", "dataset", "other";
label: the label assigned to the type for visualisation purposes;
coci_open_cit: the number of open citations received by the publication type according to COCI;
crossref_close_cit: the number of closed citations received by the publication type according to Crossref + COCI.
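The five-class alignment amounts to a lookup from Crossref types to categories. A minimal sketch, with the class membership taken from the description above and the dictionary structure an assumption:

```python
# Crossref entity types grouped into the five classes used in croci_types.csv.
TYPE_CLASSES = {
    "journal": {"journal-article", "journal-issue", "journal-volume", "journal"},
    "book": {"book", "book-chapter", "book-section", "monograph", "book-track",
             "book-part", "book-set", "reference-book", "dissertation",
             "book-series", "edited-book"},
    "proceedings": {"proceedings-article", "proceedings", "proceedings-series"},
    "dataset": {"dataset"},
}

def classify(crossref_type: str) -> str:
    """Return the aggregate class for a Crossref type; anything unmapped is 'other'."""
    for category, types in TYPE_CLASSES.items():
        if crossref_type in types:
            return category
    return "other"

assert classify("journal-article") == "journal"
assert classify("posted-content") == "other"
```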
publishers_cits.csv: a CSV file that contains the top twenty publishers that received the greatest number of open citations. The columns of the CSV file are the following:
publisher: the name of the publisher;
doi_prefix: the list of DOI prefixes used by the publisher;
coci_open_cit: the number of open citations received by the publications of the publisher according to COCI;
crossref_close_cit: the number of closed citations received by the publications of the publisher according to Crossref + COCI;
total_cit: the total number of citations received by the publications of the publisher (= coci_open_cit + crossref_close_cit).
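A hedged sketch of how such a top-twenty table could be derived with pandas, assuming a per-publication input CSV with publisher, coci_open_cit and crossref_close_cit columns (the input file name is hypothetical; the column names follow the text):

```python
import pandas as pd

# Hypothetical per-publication table with publisher and citation counts.
df = pd.read_csv("citations_by_publication.csv")

per_publisher = df.groupby("publisher")[["coci_open_cit", "crossref_close_cit"]].sum()
# total_cit = coci_open_cit + crossref_close_cit, as defined above.
per_publisher["total_cit"] = (per_publisher["coci_open_cit"]
                              + per_publisher["crossref_close_cit"])

# Top twenty publishers by number of open citations received.
top20 = per_publisher.sort_values("coci_open_cit", ascending=False).head(20)
top20.to_csv("publishers_cits.csv")
```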
20publishers_cr.csv: a CSV file that contains the numbers of the contributions to open citations made by the twenty publishers introduced in the previous CSV file as of 24 January 2018, according to the data available through the Crossref API. The counts listed in this file refer to the number of publications for which each publisher has submitted metadata to Crossref that include the publication's reference list. The categories 'closed', 'limited' and 'open' refer to publications for which the reference lists are not visible to anyone outside the Crossref Cited-by membership, are visible only to them and to Crossref Metadata Plus members, or are visible to all, respectively. In addition, the file also records the total number of publications for which the publisher has submitted metadata to Crossref, whether or not those metadata include the reference lists of those publications. A sketch of computing per-publisher visibility shares follows the column list below.
The columns of the CSV file are the following:
publisher: the name of the publisher;
open: the number of publications in Crossref with an 'open' visibility for their reference lists;
limited: the number of publications in Crossref with a 'limited' visibility for their reference lists;
closed: the number of publications in Crossref with a 'closed' visibility for their reference lists;
overall_deposited: the overall number of publications for which the publisher has submitted metadata to Crossref.
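For instance, the share of each publisher's deposited publications whose reference lists are open can be computed directly from these columns; a small sketch assuming the CSV layout above:

```python
import pandas as pd

pubs = pd.read_csv("20publishers_cr.csv")

# Fraction of deposited publications whose reference lists are visible to all.
pubs["open_share"] = pubs["open"] / pubs["overall_deposited"]
print(pubs.sort_values("open_share", ascending=False)[["publisher", "open_share"]])
```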
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background
An understanding of the resources which engineering students use to write their academic papers provides information about student behaviour as well as the effectiveness of information literacy programs designed for engineering students. One of the most informative sources of information which can be used to determine the nature of the material that students use is the bibliography at the end of the students' papers. While reference list analysis has been utilised in other disciplines, few studies have focussed on engineering students or used the results to improve the effectiveness of information literacy programs. Gadd, Baldwin and Norris (2010) found that civil engineering students undertaking a final-year research project cited journal articles more than other types of material, followed by books and reports, with web sites ranked fourth. Several studies, however, have shown that in their first year at least, most students prefer to use Internet search engines (Ellis & Salisbury, 2004; Wilkes & Gurney, 2009).

PURPOSE
The aim of this study was to find out exactly what resources undergraduate students studying civil engineering at La Trobe University were using, and in particular, the extent to which students were utilising the scholarly resources paid for by the library. A secondary purpose of the research was to ascertain whether information literacy sessions delivered to those students had any influence on the resources used, and to investigate ways in which the information literacy component of the unit can be improved to encourage students to make better use of the resources purchased by the Library to support their research.

DESIGN/METHOD
The study examined student bibliographies for three civil engineering group projects at the Bendigo Campus of La Trobe University over a two-year period, including two first-year units (CIV1EP – Engineering Practice) and one second-year unit (CIV2GR – Engineering Group Research). All units included a mandatory library session at the start of the project where student groups were required to meet with the relevant faculty librarian for guidance. In each case, the Faculty Librarian highlighted specific resources relevant to the topic, including books, e-books, video recordings, websites and internet documents. The students were also shown tips for searching the Library catalogue, Google Scholar, LibSearch (the LTU Library's research and discovery tool) and ProQuest Central. Subject-specific databases for civil engineering and science were also referred to. After the final reports for each project had been submitted and assessed, the Faculty Librarian contacted the lecturer responsible for the unit, requesting copies of the student bibliographies for each group. References for each bibliography were then entered into EndNote. The Faculty Librarian grouped them according to various facets, including the name of the unit and the group within the unit; the material type of the item being referenced; and whether the item required a Library subscription to access it. A total of 58 references were collated for the 2010 CIV1EP unit; 237 references for the 2010 CIV2GR unit; and 225 references for the 2011 CIV1EP unit.

INTERIM FINDINGS
The initial findings showed that student bibliographies for the three group projects were primarily made up of freely available internet resources which required no library subscription. For the 2010 CIV1EP unit, all 58 resources used were freely available on the Internet. For the 2011 CIV1EP unit, 28 of the 225 resources used (12.44%) required a Library subscription or purchase for access, while the second-year students (CIV2GR) used a greater variety of resources, with 71 of the 237 resources used (29.96%) requiring a Library subscription or purchase for access. The results suggest that the library sessions had little or no influence on the 2010 CIV1EP group, but the sessions may have assisted students in the 2011 CIV1EP and 2010 CIV2GR groups to find books, journal articles and conference papers, which were all represented in their bibliographies.

FURTHER RESEARCH
The next step in the research is to investigate ways to increase the representation of scholarly references (found by resources other than Google) in student bibliographies. It is anticipated that such a change would lead to an overall improvement in the quality of the student papers. One way of achieving this would be to make it mandatory for students to include a specified number of journal articles, conference papers, or scholarly books in their bibliographies. It is also anticipated that embedding La Trobe University's Inquiry/Research Quiz (IRQ) using a constructively aligned approach will further enhance the students' research skills and increase their ability to find suitable scholarly material which relates to their topic. This has already been done successfully (Salisbury, Yager, & Kirkman, 2012).

CONCLUSIONS & CHALLENGES
The study shows that most students rely heavily on the free Internet for information. Students don't naturally use Library databases or scholarly resources such as Google Scholar to find information without encouragement from their teachers, tutors and/or librarians. It is acknowledged that the use of scholarly resources doesn't automatically lead to a high-quality paper. Resources must be used appropriately, and students also need to have the skills to identify and synthesise key findings in the existing literature and relate these to their own paper. Ideally, students should be able to see the benefit of using scholarly resources in their papers, and continue to seek these out even when it's not a specific assessment requirement, though it can't be assumed that this will be the outcome.

REFERENCES
Ellis, J., & Salisbury, F. (2004). Information literacy milestones: building upon the prior knowledge of first-year students. Australian Library Journal, 53(4), 383-396.
Gadd, E., Baldwin, A., & Norris, M. (2010). The citation behaviour of civil engineering students. Journal of Information Literacy, 4(2), 37-49.
Salisbury, F., Yager, Z., & Kirkman, L. (2012). Embedding Inquiry/Research: Moving from a minimalist model to constructive alignment. Paper presented at the 15th International First Year in Higher Education Conference, Brisbane. Retrieved from http://www.fyhe.com.au/past_papers/papers12/Papers/11A.pdf
Wilkes, J., & Gurney, L. J. (2009). Perceptions and applications of information literacy by first year applied science students. Australian Academic & Research Libraries, 40(3), 159-171.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Here are the data for papers [1,2]. The 1st Excel sheet ("theory") has data for Figures 1 and 2 of [1]. The 2nd sheet ("JCR data") has data for Figures 3, 4, 5 and 6 of [1], and Figure 1 of [2].

A. Data in the "theory" sheet:
2nd row: citation count, c, of a single paper published by a journal of Impact Factor f1 = 10 and biennial size N1. We have chosen c to range from 0 to 1000 in our data.
2nd column: biennial size N1 of the journal. We have chosen N1 to range from 10 to 2500 in our data.
The data in the cells from C3 to EC283 in the sheet are calculations of the volatility, Δf(c), as defined in Eq. (4) of [1].

B. Data in the "JCR data" sheet:
The publication and citation data below are from each journal's individual Journal Citation Report for 2017.
Impact Factor: data from the 2017 Journal Citation Reports (JCR).
Journal biennial size, N2Y: the number of articles & reviews published in 2015-2016 by each journal.
Citation average, f: the average number of citations received in 2017 by the articles & reviews published in 2015-2016.
Volatility, Δf(c*): defined as f - f* (see below for f*).
Relative volatility, Δfr(c*): defined as (f - f*)/f*.
Top-cited paper, c*: the citation count of the top-cited paper in each journal, in the year 2017.
Citation average excluding the top-cited paper, f*: the average number of citations received in 2017 by the articles & reviews published in 2015-2016, once we exclude the top-cited paper (article or review).

Acknowledgment: This work uses data, accessed through Columbia University, from the Web of Science and Journal Citation Reports (2017), with explicit consent from Clarivate Analytics.

References:
[1] M. Antonoyiannakis, Impact Factor volatility to a single paper: A comprehensive analysis, Quantitative Science Studies (2020, accepted), https://arxiv.org/abs/1911.02533
[2] M. Antonoyiannakis, How a single paper affects the Impact Factor: Implications for scholarly publishing, Proceedings of the 17th Conference of the International Society on Scientometrics & Informetrics, vol. II, 2306-2313 (2019), https://arxiv.org/abs/1906.02660
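The volatility definitions above reduce to simple arithmetic on a journal's per-paper citation counts. A minimal sketch with synthetic numbers (not the JCR values):

```python
def volatility(citations):
    """Return f, f*, Δf(c*) = f - f*, and Δf_r(c*) = (f - f*)/f*
    for a list of per-paper citation counts."""
    f = sum(citations) / len(citations)   # citation average
    rest = list(citations)
    rest.remove(max(citations))           # drop the top-cited paper, c*
    f_star = sum(rest) / len(rest)        # citation average excluding c*
    return f, f_star, f - f_star, (f - f_star) / f_star

# Synthetic example: a small journal where one paper dominates the citations.
f, f_star, delta_f, delta_f_rel = volatility([0, 1, 2, 3, 120])
print(f, f_star, delta_f, delta_f_rel)  # 25.2 1.5 23.7 15.8
```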
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The vast majority of scientific articles published to-date have not been accompanied by concomitant publication of the underlying research data upon which they are based. This state of affairs precludes the routine re-use and re-analysis of research data, undermining the efficiency of the scientific enterprise, and compromising the credibility of claims that cannot be independently verified. It may be especially important to make data available for the most influential studies that have provided a foundation for subsequent research and theory development. Therefore, we launched an initiative—the Data Ark—to examine whether we could retrospectively enhance the preservation and accessibility of important scientific data. Here we report the outcome of our efforts to retrieve, preserve, and liberate data from 111 of the most highly-cited articles published in psychology and psychiatry between 2006–2011 (n = 48) and 2014–2016 (n = 63). Most data sets were not made available (76/111, 68%, 95% CI [60, 77]), some were only made available with restrictions (20/111, 18%, 95% CI [10, 27]), and few were made available in a completely unrestricted form (15/111, 14%, 95% CI [5, 22]). Where extant data sharing systems were in place, they usually (17/22, 77%, 95% CI [54, 91]) did not allow unrestricted access. Authors reported several barriers to data sharing, including issues related to data ownership and ethical concerns. The Data Ark initiative could help preserve and liberate important scientific data, surface barriers to data sharing, and advance community discussions on data stewardship.
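The reported 95% confidence intervals are proportion CIs and can be approximately reproduced with statsmodels; note that the exact interval method used by the authors is not stated here, so the choice of the Wilson interval below is an assumption:

```python
from statsmodels.stats.proportion import proportion_confint

# Proportion of articles whose data were not made available: 76 of 111.
low, high = proportion_confint(count=76, nobs=111, alpha=0.05, method="wilson")
print(f"{76/111:.0%} [{low:.0%}, {high:.0%}]")  # ~68%, Wilson CI roughly [59%, 76%]
```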
Journals list of the top 10 publishers in JCR 2014 to 2018
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
In the field of social sciences and particularly in economics, studies have frequently reported a lack of reproducibility of published research. Most often, this is due to the unavailability of data reproducing the findings of a study. However, over the past years, debates on open science practices and reproducible research have become stronger and louder among research funders, learned societies, and research organisations. Many of these have started to implement data policies to overcome these shortcomings. Against this background, the article asks if there have been changes in the way economics journals handle data and other materials that are crucial to reproduce the findings of empirical articles. For this purpose, all journals listed in the Clarivate Analytics Journal Citation Reports edition for economics have been evaluated for policies on the disclosure of research data. The article describes the characteristics of these data policies and explicates their requirements. Moreover, it compares the current findings with the situation some years ago. The results show significant changes in the way journals handle data in the publication process. Research libraries can use the findings of this study for their advisory activities to best support researchers in submitting and providing data as required by journals.
The article has been published in LIBER QUARTERLY. The data should also be published in LQ's Dataverse.
The correct citation of the article is Vlaeminck, S. (2021). Dawning of a new age? Economics journals’ data policies on the test bench. LIBER Quarterly: The Journal of the Association of European Research Libraries, 31(1), 1–29. https://doi.org/10.53377/lq.10940
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The data from 9,549 complete sets of annual financial statements are combined with the data from the corresponding audit reports, forming an unbalanced panel data set. The client companies included in the sample represent a supermajority of medium and large-sized companies registered in the Republic of Serbia. Information on the name of the auditing firm, the type of auditor, the date of audit, and the type of audit opinion is hand-collected from the audit reports issued by 77 audit firms (the Big 4 plus 73 other auditing firms), which, again, represents a supermajority of all the auditing firms registered in this country. In the total sample of audit opinions (6,343), the following frequencies of the four main types of audit opinions are observed: an adverse opinion (50), a disclaimer of opinion (344), a qualified opinion (1,278), and an unqualified opinion (4,671). Additionally, the most common financial indicators are calculated based on the collected financial statements. Feel free to use the data for research purposes or to reproduce the results presented in the article. For a detailed description of the variables and their descriptive statistics, please read the article: Empirical Data on Financial and Audit Reports of Serbian Business Entities. Proceedings of the 7th International Scientific Conference - FINIZ 2020, 193–198. https://doi.org/10.15308/finiz-2020-193-198. When referring to the data set in publications, please cite the article. These data are used in a research study and may not be redistributed or used for commercial purposes. If you have any questions, please feel free to contact me at nstanisic@singidunum.ac.rs.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset belonging to the report: Publication cultures and Dutch research output: a quantitative assessment
On the report:
Research into publication cultures commissioned by VSNU and carried out by Utrecht University Library has detailed university output beyond just journal articles, as well as the possibilities to assess open access levels of these other output types. For all four main fields reported on, the use of publication types other than journal articles is indeed substantial. For Social Sciences and Arts & Humanities in particular (with over 40% and over 60% of output respectively not being regular journal articles) looking at journal articles only ignores a significant share of their contribution to research and society. This is not only about books and book chapters, either: book reviews, conference papers, reports, case notes (in law) and all kinds of web publications are also significant parts of university output.
Analyzing all these publication forms, and especially determining to what extent they are open access, is currently not easy. Even combining some of the largest citation databases (Web of Science, Scopus and Dimensions) leaves out a lot of non-article content, and in some fields even journal articles are only partly covered. Lacking metadata like affiliations and DOIs (either in the original documents or in the scholarly search engines) makes it even harder to analyze open access levels by institution and field. Using repository-harvesting databases like BASE and NARCIS in addition to the main citation databases improves understanding of open access of non-article output, but these routes also have limitations. The report has recommendations for stakeholders, mostly to improve metadata and coverage and apply persistent identifiers.
Data Description
Managed turfgrass is a common component of urban landscapes that is expanding under current land use trends. Previous studies have reported high rates of soil carbon sequestration in turfgrass, but no systematic review has summarized these rates nor evaluated how they change as turfgrass ages. We conducted a meta-analysis of soil carbon sequestration rates from 63 studies. Those data, as well as the code used to analyze them and create figures, are shared here.

Dataset Development
We conducted a systematic review from Nov 2020 to Jan 2021 using Google Scholar, Web of Science, and the Michigan Turfgrass Information File Database. The search terms targeted were "soil carbon", "carbon sequestration", "carbon storage", or "carbon stock", with "turf", "turfgrass", "lawn", "urban ecosystem", "residential", "Fescue", "Zoysia", "Poa", "Cynodon", "Bouteloua", "Lolium", or "Agrostis". We included only peer-reviewed studies written in English that measured SOC change over one year or longer, and where grass was managed as turf (mowed or clipped regularly). We included studies that sampled to any soil depth, and included several methodologies: small-plot research conducted over a few years (22 datasets from 4 articles), chronosequences of golf courses or residential lawns (39 datasets from 16 articles), and one study that was a variation on a chronosequence method and compiled long-term soil test data provided by golf courses of various ages (3 datasets from Qian & Follett, 2002). In total, 63 datasets from 21 articles met the search criteria. We excluded 1) duplicate reports of the same data, 2) small-plot studies that did not report baseline SOC stocks, and 3) pure modeling studies. We included five papers that only measured changes in SOC concentrations, but not areal stocks (i.e., SOC in Mg ha-1). For these papers, we converted from concentrations to stocks using several approaches (a sketch of the standard conversion appears at the end of this record). For two papers (Law & Patton, 2017; Y. Qian & Follett, 2002) we used estimated bulk densities provided by the authors. For the chronosequences reported in Selhorst & Lal (2011), we used the average bulk density reported by the author. For the 13 chronosequences reported in Selhorst & Lal (2013), we estimated bulk density from the average relationship between percent C and bulk density reported by Selhorst (2011). For Wang et al. (2014), we used bulk density values from official soil survey descriptions.

Data provenance
In most cases we contacted authors of the studies to obtain the original data. If authors did not reply after two inquiries, or no longer had access to the data, we captured data from published figures using WebPlotDigitizer (Rohatgi, 2021). For three manuscripts the data were already available, or partially available, in public data repositories. Data provenance information is provided in the document "Dataset summaries and citations.docx".

Recommended Uses
We recommend the following to data users:
• Consult and cite the original manuscripts for each dataset, which often provide additional information about turfgrass management, experimental methods, and environmental context. Original citations are provided in the document "Dataset summaries and citations.docx".
• For datasets that were previously published in public repositories, consult and cite the original datasets, which may provide additional data on turfgrass management practices, soil nitrogen, and natural reference sites. Links to repositories are in the document "Dataset summaries and citations.docx".
• Consider contacting the dataset authors to notify them of your plans to use the data, and to offer co-authorship as appropriate.
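The concentration-to-stock conversion used for those five papers follows the standard relation SOC stock (Mg ha-1) = SOC concentration (%) × bulk density (g cm-3) × depth (cm). A small sketch with illustrative values (not data from the included studies):

```python
def soc_stock_mg_ha(soc_percent: float, bulk_density_g_cm3: float, depth_cm: float) -> float:
    """Convert a SOC concentration (%) to an areal stock (Mg ha^-1).

    One hectare to depth_cm holds bulk_density * depth * 100 Mg of soil, of which
    soc_percent/100 is organic carbon, so the two factors of 100 cancel.
    """
    return soc_percent * bulk_density_g_cm3 * depth_cm

# Illustrative values only: 1.2% SOC, bulk density 1.3 g cm^-3, 0-15 cm depth.
print(soc_stock_mg_ha(1.2, 1.3, 15.0))  # 23.4 Mg ha^-1
```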
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This note reports on a successful attempt to reproduce the estimation results reported in McCabe and Snyder (2015). The authors investigate the link between the online availability of articles published in top journals in economics and the number of citations these articles generate.
Repository to host data and tools associated with articles published by GigaScience & GigaByte journals. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study. Through their association with DataCite, each dataset will be assigned a DOI that can be used as a standard citation for future use of these data in other articles by the authors and other researchers. Datasets in GigaDB all require a title that is specific to the dataset, an author list, and an abstract that provides information specific to the data included within the dataset. Detailed information about the dataset is curated by dedicated biocurators in collaboration with the article authors at the time of publication of the associated manuscript to ensure full transparency and reproducibility of all journal articles published in GigaScience and GigaByte journals.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains impact metrics and indicators for a set of publications that are related to the COVID-19 infectious disease and the coronavirus that causes it. It is based on:
The CORD-19 dataset released by the team of Semantic Scholar [1], and
The curated data provided by the LitCovid hub [2].
These data have been cleaned and integrated with data from COVID-19-TweetIDs and from other sources (e.g., PMC). The result was a dataset of 628,506 unique articles along with relevant metadata (e.g., the underlying citation network). We utilized this dataset to produce, for each article, the values of the following impact measures:
Influence: Citation-based measure reflecting the total impact of an article. This is based on the PageRank [3] network analysis method. In the context of citation networks, it estimates the importance of each article based on its centrality in the whole network. This measure was calculated using the PaperRanking library (https://github.com/diwis/PaperRanking) [4].
Influence_alt: Citation-based measure reflecting the total impact of an article. This is the Citation Count of each article, calculated based on the citation network between the articles contained in the BIP4COVID19 dataset.
Popularity: Citation-based measure reflecting the current impact of an article. This is based on the AttRank [5] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). AttRank alleviates this problem by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to read papers which received a lot of attention recently. This is why it is more suitable to capture the current "hype" of an article.
Popularity alternative: An alternative citation-based measure reflecting the current impact of an article (this was the basic popularity measure provided by BIP4COVID19 until version 26). This is based on the RAM [6] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem using an approach known as "time-awareness". This is why it is more suitable to capture the current "hype" of an article. This measure was calculated using the PaperRanking library (https://github.com/diwis/PaperRanking) [4].
Social Media Attention: The number of tweets related to this article. Relevant data were collected from the COVID-19-TweetIDs dataset. In this version, tweets between 23/6/22 and 29/6/22 from that dataset have been considered.
We provide five CSV files, all containing the same information, each having its entries ordered by a different impact measure. All CSV files are tab-separated and share the same columns (PubMed_id, PMC_id, DOI, influence_score, popularity_alt_score, popularity_score, influence_alt_score, tweets_count).
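A minimal sketch of loading one of these tab-separated files with pandas; the file name is hypothetical, and the underscore-normalized column names follow the list above:

```python
import pandas as pd

# Hypothetical file name; all five files share the same columns and differ
# only in row order. A header row is assumed.
df = pd.read_csv("bip4covid19_by_popularity.csv", sep="\t")

# The ten currently most "hyped" articles according to the AttRank-based measure.
print(df.nlargest(10, "popularity_score")[["DOI", "popularity_score"]])
```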
The work is based on the following publications:
[1] COVID-19 Open Research Dataset (CORD-19). 2020. Version 2023-01-10. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed 2023-01-10. doi:10.5281/zenodo.3715506
[2] Chen Q, Allot A, & Lu Z. (2020). Keep up with the latest coronavirus research. Nature 579:193 (version 2023-01-10)
[3] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[4] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019
[5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)
[6] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373-380
A Web user interface that uses these data to facilitate COVID-19 literature exploration can be found here. More details are available in our peer-reviewed publication here (an outdated preprint version is also available here).
Funding: We acknowledge support of this work by the project "Moving from Big Data Management to Data Science" (MIS 5002437/3) which is implemented under the Action "Reinforcement of the Research and Innovation Infrastructure", funded by the Operational Programme "Competitiveness, Entrepreneurship and Innovation" (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).
Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.
SciTech Connect is a portal to free, publicly available DOE-sponsored R&D results, including technical reports, bibliographic citations, journal articles, conference papers, books, multimedia and data information. Also provided are a data service for bibliographic records for historical and current research (1948-present) and a data service for bibliographic records in MARC format for use in library catalogs.
An official Digital Object Identifier (DOI) Registration Agency of the International DOI Foundation, launched as a cooperative effort among publishers to enable persistent cross-publisher citation linking in online academic journals. The citation-linking network today covers over 65 million journal articles and other content items (book chapters, data, theses, technical reports) from thousands of scholarly and professional publishers around the globe. CrossRef does not aggregate full-text content; rather, it uses a system of distributed aggregation whereby full-text content is linked through a database consisting of minimal publisher metadata. Each record in the database is essentially a triplet: (metadata + URL + DOI). In addition to assigning DOIs to scholarly content, CrossRef offers additional services:
* Cited-By Linking
* CrossRef Metadata Services
* CrossCheck plagiarism screening (powered by iThenticate)
* CrossMark update identification service
* FundRef funder identification service
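The (metadata + URL + DOI) triplet behind each record can be retrieved through CrossRef's public REST API; a minimal sketch resolving a single DOI (the requests dependency and the example DOI are assumptions):

```python
import requests

# Resolve a DOI to its publisher-deposited metadata via the Crossref REST API.
doi = "10.1093/gigascience/giaa139"  # arbitrary example DOI
resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]
print(work["title"], work.get("URL"))
```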
This bibliography includes 4715 citations of foreign and domestic research reports, journal articles, patents, conference proceedings, and books. These citations, those printed in the two previous publications of Oil Shales and Tar Sands: A Bibliography, and those being added to the Energy Data Base on a continuing basis can be recalled and searched using the online computer retrieval system.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The average value of ‘Spearman correlation between JIF and article citedness’ for individual authors in different groups based on article citedness [data from Clarivate Analytics’ National Citation Report for Norway].
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The presented cross-sectional dataset can be employed to analyze the governmental, trade, and competitiveness relationships of official COVID-19 reports. It contains 18 COVID-19 variables generated based on the official reports of 138 countries, as well as an additional 2203 governance, trade, and competitiveness indicators from the World Bank Group GovData360 and TCdata360 platforms in a preprocessed form. The current version was compiled on May 25, 2020.
Please cite as: • (Data in Brief article)
Data generation:
• Data generation (data_generation.Rmd): datasets were generated with this R Notebook. It can be used to update datasets and customize the data generation process.
Datasets:
• Country data (country_data.txt): country data.
• Metadata (metadata.txt): the metadata of selected GovData360 and TCdata360 indicators.
• Joint dataset (joint_dataset.txt): the joint dataset of COVID-19 variables and preprocessed GovData360 and TCdata360 indicators.
• Correlation matrix (correlation_matrix.txt): the Kendall rank correlation matrix of the joint dataset (a sketch of this computation follows the file list).
Raw data of figures and tables:
• Raw data of Fig. 2 (raw_data_fig2.txt): the raw data of Fig. 2.
• Raw data of Fig. 3 (raw_data_fig3.txt): the raw data of Fig. 3.
• Raw data of Table 1 (raw_data_table1.txt): the raw data of Table 1.
• Raw data of Table 2 (raw_data_table2.txt): the raw data of Table 2.
• Raw data of Table 3 (raw_data_table3.txt): the raw data of Table 3.
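The Kendall rank correlation matrix in correlation_matrix.txt can be reproduced in a few lines with pandas; a sketch assuming the .txt files are tab-separated:

```python
import pandas as pd

# Load the joint dataset of COVID-19 variables and WBG indicators.
joint = pd.read_csv("joint_dataset.txt", sep="\t")

# Pairwise Kendall rank correlations over all numeric columns.
corr = joint.corr(method="kendall", numeric_only=True)
corr.to_csv("correlation_matrix.txt", sep="\t")
```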
This compilation includes 1067 citations of foreign and domestic reports, journal articles, patents, conference papers and proceedings, monographs, and books on fuel cells. The citations were taken from the DOE Energy Data Base covering the period June 1977 through June 1980. The citations are arranged in subject categories. Within each category the report citations are arranged alphanumerically by report number, and nonreport literature citations are arranged in inverse chronological order. Corporate Author, Personal Author, Subject, Contract Number, and Report Number indexes are provided.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This dataset contains impact measures (metrics/indicators) for 106,788,227 scientific articles. In particular, for each article we have calculated the following measures:
Citation count: This is the total number of citations, reflecting the "influence" (i.e., the total impact) of an article.
Incubation Citation Count (3-year CC): essentially a time-restricted version of the citation count, where the time window is distinct for each paper, i.e., only citations received within 3 years of its publication are counted. This measure can be seen as an indicator of a paper's "impulse", i.e., its initial momentum directly after its publication.
PageRank score: This is a citation-based measure reflecting the "influence" (i.e., the total impact) of an article. It is based on the PageRank [1] network analysis method. In the context of citation networks, PageRank estimates the importance of each article based on its centrality in the whole network.
RAM score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the RAM [2] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem using an approach known as "time-awareness". This is why it is more suitable to capture the current "hype" of an article.
AttRank score: This is a citation-based measure reflecting the "popularity" (i.e., the current impact) of an article. It is based on the AttRank [3] citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). AttRank alleviates this problem by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture a researcher's preference to read papers which received a lot of attention recently. This is why it is more suitable to capture the current "hype" of an article.
We provide five compressed CSV files (one for each measure/score provided), each having lines of the form "DOI \t score"; a sketch of reading such a file follows the next paragraph. The configuration of each measure is captured in the corresponding filename. More intuition about the different measures/scores can be found in a previous extensive experimental study [4].
The data of the citation network used to produce this dataset have been gathered from (a) the OpenCitations COCI dataset (Sep-2020 version), (b) a MAG [5,6] snapshot from Aug-2020, and (c) a Crossref snapshot from Mar-2020. The union of all distinct DOI-to-DOI citations that could be found in these sources has been considered (entries without a DOI were omitted).
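A sketch of consuming one of the compressed score files, assuming each line literally holds a DOI and a score separated by a tab (the file name is hypothetical):

```python
import gzip

scores = {}
# Each line of the compressed file holds "DOI<TAB>score".
with gzip.open("pagerank_scores.csv.gz", "rt", encoding="utf-8") as f:
    for line in f:
        doi, score = line.rstrip("\n").split("\t")
        scores[doi] = float(score)

print(len(scores), "articles scored")
```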
The work is based on the following publications:
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[2] Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373-380
[3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951 (2020)
[4] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019 (early access)
[5] Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839
[6] K. Wang et al., "A Review of Microsoft Academic Services for Science of Science Studies", Frontiers in Big Data, 2019, doi: 10.3389/fdata.2019.00045
A Web user interface that uses these data to facilitate literature exploration can be found here. Moreover, the exact same scores can be gathered through BIP! Finder's API.
Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.