Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version: 6
Date of data collection: May 2025
General description: Datasets can be published according to the FAIR principles by publishing a data paper (and/or a software paper) in data journals as well as in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v6.xlsx: full list of 177 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v6.csv: full list of 177 academic journals in which data papers or/and software papers could be published
- readme_v6.txt: detailed description of the dataset and its variables
Relationship between files: both data files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 6th version
- Information updated: number of journals (17 were added and 4 were deleted), URL, document types associated to a specific journal.
- Information added: diamond journals were identified.
Version: 5
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2023/09/05
General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v5.xlsx: full list of 162 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v5.csv: full list of 162 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 5th version
- Information updated: number of journals, URL, document types associated to a specific journal.
163 journals (Excel and CSV)
Version: 4
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/12/15
General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 4th version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.
Version: 3
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/10/28
General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 3rd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).
Erratum - Data articles in journals Version 3:
Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a
Version: 2
Author: Francisco Rubio, Universitat Politècnica de València.
Date of data collection: 2020/06/23
General description: Datasets can be published according to the FAIR principles by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 2nd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)
Total size: 32 KB
Version 1: Description
This dataset contains a list of journals that publish data articles, code, software articles and database articles.
The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in journal titles.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of annual Article Processing Charges (APCs) for 6,252 journals from 2015 to 2018. The dataset contains annual APCs for journals indexed in the Web of Science (WoS) and published by the oligopoly of academic publishers (Elsevier, Sage, Springer-Nature, Taylor & Francis, Wiley). It also includes an estimate of the total APCs paid by the academic community based on the number of gold and hybrid articles published between 2015 and 2018. The dataset was created using publication data from WoS, OA status from Unpaywall and annual APC prices from open datasets (Matthias, 2020; Morrison, 2021) and historical fees retrieved via the Internet Archive Wayback Machine.
Detailed methods and findings are reported in the following journal article:
Butler, L.-A., Matthias, L., Simard, M.-A., Mongeon, P., & Haustein, S. (2023). The Oligopoly's Shift to Open Access. How the Big Five Academic Publishers Profit from Article Processing Charges. Quantitative Science Studies. Preprint: https://doi.org/10.5281/zenodo.8322555
Description of included files (v1):
APCs.csv: contains the annual APCs for gold and hybrid OA journals indexed in Web of Science published by the oligopoly of academic publishers (Elsevier, Sage, Springer-Nature, Taylor & Francis, Wiley) between 2015 and 2018 including the total estimate of APCs paid per journal per year. It contains APC data for 18,846 journal-year-OA status combinations.
countries.csv: contains the fractionalized number of annual gold and hybrid OA articles by oligopoly publishers between 2015 and 2018 and the total estimate of fractionalized APCs paid per country per journal per year.
oecd.csv: contains the fractionalized number of annual gold and hybrid OA articles by oligopoly publishers between 2015 and 2018 and the total estimate of fractionalized APCs per discipline per journal per year.
ReadMe.csv: contains a description of the variables used in APCs.csv, countries.csv and oecd.csv.
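As a rough illustration of what a "fractionalized" APC estimate can look like, the sketch below splits one article's APC across author countries in proportion to the number of authors from each country. The counting rule, the function name and the example values are assumptions made for illustration only; the variable definitions in ReadMe.csv and the related article remain the authoritative description of how the dataset was built.

```python
from collections import Counter
from fractions import Fraction


def fractional_apc_by_country(apc, author_countries):
    """Split one article's APC across countries by author share.

    Assumption: each author counts equally, so a country with 2 of 4
    authors is credited with half of the APC. This is a hypothetical
    rule, not necessarily the one used for countries.csv.
    """
    counts = Counter(author_countries)
    total_authors = sum(counts.values())
    return {
        country: apc * Fraction(n, total_authors)
        for country, n in counts.items()
    }


# Hypothetical article: APC of 2000, four authors from three countries.
shares = fractional_apc_by_country(2000, ["CA", "CA", "DE", "US"])
for country, amount in sorted(shares.items()):
    print(country, float(amount))
```

Because the shares of one article always sum to its full APC, summing them over all articles of a journal and year gives back the journal-level total.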
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This document describes a dataset that aggregates information about 135 data journals. Data journals focus on the publication of data papers -- a specialized publication type describing datasets, their collection and reuse potential that is peer-reviewed, citable and indexed. This dataset includes a comprehensive list of data journals that was compiled by aggregating existing sources, as well as an overview of these sources.
The list is continually updated on GitHub, where additional information on data journals (URLs of data journal homepages) is provided: https://github.com/MaxiKi/data-journals
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains five sets of datasets: 1) Publication counts from two multidisciplinary humanities data journals: the Journal of Open Humanities Data and Research Data in the Humanities and Social Sciences (RDJ_JOHD_Publications.csv); 2) A large dataset about the performance of research articles in HSS exported from dimensions.ai (allhumss_dims_res_papers_PUB_ID.csv); 3) A large dataset about the performance of datasets in HSS harvested from the Zenodo REST API (Zenodo.zip); 4) Impact and usage metrics from the papers published in the two journals above (final_outputs.zip); 5) Data from Twitter analytics on tweets from the @up_johd account, with paper DOI and engagement rate (twitter-data.zip).
Please note that, as requested by the Dimensions team, for 2 and 4, we only included the Publication IDs from Dimensions rather than the full data. Interested parties only need the Dimensions publications IDs to retrieve the data; even if they have no Dimensions subscription, they can easily get a no-cost agreement with Dimensions, for research purposes, in order to retrieve the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For this dataset, peer-reviewed scientific articles by Tampere University researchers from the years 2020 and 2021 were extracted from TUNICRIS. A random sample of 40 percent was taken from the 4,922 listed publications by faculty and year. In total, 2,085 articles were analyzed, i.e. more than 42 percent of the total number.
To find Data Availability Statements (DAS), articles were opened one by one and searched for mentions of research data and its availability. For each article, it was recorded whether a DAS existed and where in the article it was located. From the contents of each DAS, information about data availability, location, openness and possible restrictions on use was recorded.
The dataset also includes information about the journals and publications taken from TUNICRIS.
The prevalence of DAS and data openness were examined in relation to different variables. Tampere University faculty information has been removed from the dataset.
Related slides: https://doi.org/10.5281/zenodo.7655892
Related article (in Finnish): Toikko, T., & Kylmälä, K. (2023). Tutkimusdatan saatavuustiedot tieteellisissä artikkeleissa: Raportti Data Availability Statementien käytöstä Tampereen yliopistossa. Informaatiotutkimus, 42(1-2), 31–50. https://doi.org/10.23978/inf.126098
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Answers to a survey on gold Open Access run from July to October 2016. The dataset contains 15,235 unique responses from Web of Science published authors. This survey is part of a PhD thesis from the University of Granada in Spain. More details about the study can be found in the full text document, also available in Zenodo.
The questions related to the WoS 2016 dataset are listed below. Please note that countries with fewer than 40 answers are listed as "Other" in order to preserve anonymity.
Fewer than 5 years
5-14 years
15-24 years
25 years or longer
Many of the questions that follow concern Open Access publishing. For the purposes of this survey, an article is Open Access if its final, peer-reviewed, version is published online by a journal and is free of charge to all users without restrictions on access or use.
Yes
No
I do not know
Yes
No
I have no opinion
I do not care
1-5
6-10
11-20
21-50
More than 50
[Each factor may be rated "Extremely important", "Important", "Less important" or "Irrelevant". The factors are presented in random order.]
Importance of the journal for academic promotion, tenure or assessment
Recommendation of the journal by my colleagues
Positive experience with publisher/editor(s) of the journal
The journal is an Open Access journal
Relevance of the journal for my community
The journal fits the policy of my organisation
Prestige/perceived quality of the journal
Likelihood of article acceptance in the journal
Absence of journal publication fees (e.g. submission charges, page charges, colour charges)
Copyright policy of the journal
Journal Impact Factor
Speed of publication of the journal
The decision is my own
A collective decision is made with my fellow authors
I am advised where to publish by a senior colleague
The organisation that finances my research advises me where to publish
Other (please specify) [Text box follows]
0
1-5
6-10
More than 10
I do not know
[If the answer is "0", the survey jumps to Q10.]
No charge
Up to €250 ($275)
€251-€500 ($275-$550)
€501-€1000 ($551-$1100)
€1001-€3000 ($1101-$3300)
More than €3000 ($3300)
I do not know
[If the answer is "No charge" or "I do not know", the survey jumps to Q20.]
My research funding includes money for paying such fees
I used part of my research funding not specifically intended for paying such fees
My institution paid the fees
I paid the costs myself
Other (please specify) [Text box follows]
Easy
Difficult
I have not used these sources
[Each statement may be rated "Strongly agree", "Agree", "Neither agree nor disagree", "Disagree" or "Strongly disagree". The statements are presented in random order.]
Researchers should retain the rights to their published work and allow it to be used by others
Open Access publishing undermines the system of peer review
Open Access publishing leads to an increase in the publication of poor quality research
If authors pay publication fees to make their articles Open Access, there will be less money available for research
It is not beneficial for the general public to have access to published scientific and medical articles
Open Access unfairly penalises research-intensive institutions with large publication output by making them pay high costs for publication
Publicly-funded research should be made available to be read and used without access barrier
Open Access publishing is more cost-effective than subscription-based publishing and so will benefit public investment in research
Articles that are available by Open Access are likely to be read and cited more often than those not Open Access
This study and its questionnaire are based on the SOAP Project (http://project-soap.eu). An article describing the highlights of the SOAP Survey is available at: https://arxiv.org/abs/1101.5260. The dataset of the SOAP survey is available at http://bit.ly/gSmm71. A manual describing the SOAP dataset is available at http://bit.ly/gI8nc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the report 'Scenario Modelling for Open Research Europe', prepared for the European Commission's Directorate-General for Research and Innovation by Rob Johnson of Research Consulting, acting in the capacity of an independent expert. The data was extracted from www.lens.org in May 2023 to assess historic patterns of publication growth for a sample of open access journals and platforms.
The Softcite dataset is a gold-standard dataset of software mentions in research publications, a free resource primarily for software entity recognition in scholarly text. This is the first release of this dataset.
What's in the dataset
To facilitate software entity recognition at scale, and ultimately to make research software more visible so that software contributions to scholarly research receive due credit, a team of trained annotators from the Howison Lab at the University of Texas at Austin annotated 4,093 software mentions in 4,971 open access research publications in biomedicine (from the PubMed Central Open Access collection) and economics (from Unpaywall open access services). The released dataset is a TEI/XML corpus file that contains the annotated software mentions, along with their publisher, version, and access URL when mentioned in the text, as well as the publications annotated as containing no software mentions.
For understanding the schema of the Softcite corpus, its design considerations, and provenance, please refer to our paper included in this release (preprint version).
Use scenarios
The release of the Softcite dataset is intended to encourage researchers and stakeholders to make research software more visible in science, especially to academic databases and systems of information retrieval; and facilitate interoperability and collaboration among similar and relevant efforts in software entity recognition and building utilities for software information retrieval. This dataset can also be useful for researchers investigating software use in academic research.
Current release content
softcite-dataset v1.0 release includes:
The Softcite dataset corpus file: softcite_corpus-full.tei.xml
Softcite Dataset: A Dataset of Software Mentions in Biomedical and Economic Research Publications, our paper that describes the design consideration and creation process of the dataset: Softcite_Dataset_Description_RC.pdf. (This is a preprint version of our forthcoming publication in the Journal of the Association for Information Science and Technology.)
The Softcite dataset is licensed under a Creative Commons Attribution 4.0 International License.
If you have questions, please start a discussion or issue in the howisonlab/softcite-dataset GitHub repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for publication: 'Measuring the excellence contribution at the journal level: An alternative to Garfield's Impact Factor'.
Overview. Overview of the number of journals, publications, excellent publications and multidisciplinarity for each category considered.
ALL. Journal indicators for all the document types by JCR category.
ALL_JCR. Journal indicators for all the document types by JCR category (only journals indexed in the JCR category are taken into account).
AR. Journal indicators for only articles and reviews by JCR category.
AR_JCR. Journal indicators for only articles and reviews by JCR category (only journals indexed in the JCR category are taken into account).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for journal recommendation. It includes title, abstract, keywords, and journal.
We extracted the journals and additional information from:
Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.
Dataset Components:
data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.
data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022—the latest date in the dataset. Additionally, we have exclusively retained journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.
data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets—training, validation, and test—using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.
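A per-journal 60/20/20 split of the kind described above can be sketched as follows. This is a minimal illustration, not the code used to build the released partitions: the function name, the fixed seed, the rounding rule and the toy records are assumptions; only the 'journal' column name comes from the dataset description.

```python
import random
from collections import defaultdict


def split_per_journal(articles, train_ratio=0.6, val_ratio=0.2, seed=0):
    """Split articles into train/val/test separately for each journal.

    Each journal's articles are shuffled and cut at roughly 60% and 80%,
    so every journal is proportionally represented in all three sets.
    Whatever remains after the train and validation cuts goes to test.
    """
    rng = random.Random(seed)
    by_journal = defaultdict(list)
    for article in articles:
        by_journal[article["journal"]].append(article)

    train, val, test = [], [], []
    for journal in sorted(by_journal):
        items = by_journal[journal]
        rng.shuffle(items)
        n_train = round(len(items) * train_ratio)
        n_val = round(len(items) * val_ratio)
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])
    return train, val, test


# Toy data: 10 articles in hypothetical journal "A" and 5 in journal "B".
toy = [{"pubmed_id": i, "journal": "A"} for i in range(10)]
toy += [{"pubmed_id": 100 + i, "journal": "B"} for i in range(5)]
train, val, test = split_per_journal(toy)
print(len(train), len(val), len(test))
```

Per-journal rounding explains why the published partition sizes (157,540, 52,571 and 52,759) are close to, but not exactly, 60/20/20 of the 262,870 articles.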
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains information about the published issues of newspapers digitised and made available through Trove. The data was harvested from the Trove API, using this notebook in the GLAM Workbench.
There are two data files:
newspaper_issues_totals_by_year.csv – the total number of newspaper issues per year for each digitised newspaper
newspaper_issues.csv – a complete list of newspaper issues available from Trove
newspaper_issues_totals_by_year.csv
The dataset contains the following columns:
Column    Contents
title     newspaper title
title_id  newspaper id
state     place of publication
year      year published
issues    number of issues
newspaper_issues.csv
The dataset contains the following columns:
Column      Contents
title       newspaper title
title_id    newspaper id
state       place of publication
issue_id    issue identifier
issue_date  date of publication (YYYY-MM-DD)
To keep the file size down, I haven't included an issue_url in this dataset, but these are easily generated from the issue_id. Just add the issue_id to the end of http://nla.gov.au/nla.news-issue. For example: http://nla.gov.au/nla.news-issue495426. Note that when you follow an issue url, you actually get redirected to the url of the first page in the issue.
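Generating the issue URLs as described takes only a few lines. The sketch below is one way to do it; the function and constant names are illustrative and not part of the dataset.

```python
ISSUE_URL_PREFIX = "http://nla.gov.au/nla.news-issue"


def issue_url(issue_id):
    """Build a Trove issue URL by appending issue_id to the fixed prefix."""
    return f"{ISSUE_URL_PREFIX}{issue_id}"


print(issue_url(495426))  # http://nla.gov.au/nla.news-issue495426
```

The same function works whether issue_id is read from newspaper_issues.csv as a string or handled as an integer.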
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel file contains the titles, DOIs, references, and EIDs of those Neuroscience publications (journal articles, books/book chapters, conference papers, notes, etc.) from 2004 to 2022 that have at least one reference to a preprint. For example, if a Neuroscience journal article has 40 references and one of these references is a preprint, then it's included in this Excel file. These records are retrieved from Scopus through the following query:
REFSRCTITLE ( "OSF Preprints" OR "open science foundation Preprints" OR africarxiv OR agrixiv OR arabixiv OR arxiv OR biohackrxiv OR biorxiv OR bodoarxiv OR cogprints OR eartharxiv OR ecoevorxiv OR ecsarxiv OR edarxiv OR engrxiv OR frenxiv OR "INA-Rxiv" OR indiarxiv OR lawarxiv OR "LIS Scholarship Archive" OR marxiv OR mediarxiv OR metaarxiv OR mindrxiv OR nutrixiv OR paleorxiv OR "Preprints.org" OR psyarxiv OR repec OR socarxiv OR sportrxiv OR "Thesis Commons" OR "CoP preprint" OR "FocUS Archive preprint" OR "PeerJ preprint" OR "Law Archive preprint" OR medrxiv ) AND SUBJAREA ( neur ) AND PUBYEAR < 2023
The references of each publication are split by a Python script (SplitReferences.py) and written to a text file, one reference per line. For example, a publication with 40 references yields 40 separate lines. After splitting, the lines containing any of these terms ("OSF Preprints" OR "open science foundation preprints" OR africarxiv OR agrixiv OR arabixiv OR arxiv OR biohackrxiv OR biorxiv OR bodoarxiv OR cogprints OR eartharxiv OR ecoevorxiv OR ecsarxiv OR edarxiv OR engrxiv OR frenxiv OR "INA-Rxiv" OR indiarxiv OR lawarxiv OR "LIS Scholarship Archive" OR marxiv OR mediarxiv OR metaarxiv OR mindrxiv OR nutrixiv OR paleorxiv OR "Preprints.org" OR psyarxiv OR repec OR socarxiv OR sportrxiv OR "Thesis Commons" OR "CoP preprint" OR "FocUS Archive preprint" OR "PeerJ preprint" OR "Law Archive preprint" OR medrxiv) are selected (by RetrieveLinesContainingSpeceficString.py) and written to this text file (ReferencesToPreprints.V3.txt). Each reference carries an EID (separated by ";") that identifies the publication containing it.
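The splitting and selection steps can be sketched roughly as below. This is not the content of SplitReferences.py or RetrieveLinesContainingSpeceficString.py, which are not reproduced here. It assumes that the references of one publication arrive as a single string delimited by "; ", that matching is a case-insensitive substring test, and that the EID is appended after a ";". The term list is shortened and the EID and references are made up for the example.

```python
# Shortened, illustrative subset of the search terms listed above.
PREPRINT_TERMS = ["osf preprints", "arxiv", "biorxiv", "medrxiv", "psyarxiv"]


def split_references(eid, references, sep="; "):
    """Return one line per reference, each tagged with the publication EID."""
    return [
        f"{ref.strip()};{eid}"
        for ref in references.split(sep)
        if ref.strip()
    ]


def select_preprint_lines(lines, terms=PREPRINT_TERMS):
    """Keep only the lines that mention at least one preprint term."""
    return [
        line for line in lines
        if any(term in line.lower() for term in terms)
    ]


# Hypothetical record: a placeholder EID and a two-item reference string.
eid = "2-s2.0-0000000001"
refs = "Smith J., A study, Nature (2019); Doe A., A preprint, bioRxiv (2020)"
lines = split_references(eid, refs)
print(select_preprint_lines(lines))
```

A plain substring test can over-match (a term may occur inside an unrelated title), so selected lines may still need manual checking.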
After this step, a Python script (AddPreprintServerToEndOfLines.py) appended the name of the matching preprint server to the end of each line. For example, if a line (a reference) contains "biorxiv", the word "biorxiv" is added to the end of that line after an "@" sign.
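A minimal sketch of this tagging step is shown below. It is not the content of AddPreprintServerToEndOfLines.py. The function name is hypothetical, matching is assumed to be case-insensitive, and trying longer names first (so that "psyarxiv" wins over "arxiv") is a design choice of this sketch rather than documented behaviour of the original script.

```python
def append_preprint_server(line, servers):
    """Append '@<server>' to a line for the first server name it contains.

    Longer names are tried first so that a specific server such as
    'psyarxiv' is preferred over the shorter substring 'arxiv'.
    Lines that mention no known server are returned unchanged.
    """
    lowered = line.lower()
    for server in sorted(servers, key=len, reverse=True):
        if server.lower() in lowered:
            return f"{line}@{server}"
    return line


servers = ["arxiv", "biorxiv", "psyarxiv"]
print(append_preprint_server("Doe A., A preprint, bioRxiv (2020);EID1", servers))
```

With the tag at the end of each line, counting references per preprint server reduces to splitting each line on "@" and tallying the last field.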
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the data presented in our paper "Three-dimensional Doppler, polarization-gradient, and magneto-optical forces for atoms and molecules with dark states" which has been accepted for publication in the New Journal of Physics (as of 07 November 2016).
Attribution 1.0 (CC BY 1.0) https://creativecommons.org/licenses/by/1.0/
License information was derived automatically
The dataset contains all the data produced by running the research software for the study: "Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta".
Disclaimer: these results are not considered to be representative, because we have found that Mega Journals significantly skewed some of the data. The result datasets without Mega Journals are published here.
Description of datasets:
SSH_Publications_in_OC_Meta_and_Open_Access_status.csv: contains information about the OpenCitations Meta coverage of ERIH PLUS journals as well as their Open Access availability. In this dataset, every row holds data for an ERIH PLUS journal that is also covered by the OpenCitations Meta database. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value indicating whether the journal is OA or not, either "True" or "Unknown".
SSH_Publications_by_Discipline.csv: contains the number of publications per discipline (the number of journals per discipline is also included). The dataset has three columns: the first, labeled "Discipline", contains the individual disciplines of the ERIH classification; the second and third, labeled "Journal_count" and "Publication_count", contain the number of journals and the number of publications counted for each discipline.
SSH_Publications_and_Journals_by_Country: contains the number of publications and journals per country. The dataset has three columns: the first, labeled "Country", contains the individual countries of the ERIH classification; the second and third, labeled "Journal_count" and "Publication_count", contain the number of journals and the number of publications counted for each country.
result_disciplines.json: the dictionary containing all disciplines as key and a list of related ERIH PLUS venue identifiers as value.
result_countries.json: the dictionary containing all countries as key and a list of related ERIH PLUS venue identifiers as value.
duplicate_omids.csv: a dataset containing the duplicated Journal entries in OpenCitations Meta, structured with two columns: "OC_omid", the internal OC Meta identifier; "issn", the issn values associated to that identifier
eu_data.csv: contains the data specific for European countries' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the numbers of Publications counted in each venue; "Original_Title", "Country_of_Publication","ERIH_PLUS_Disciplines", "disc_count", the number of disciplines per Journal.
eu_disciplines_count.csv: contains the number of publications per discipline and the number of journals per discipline for European countries. The dataset has three columns: the first, labeled "Discipline", contains the individual disciplines of the ERIH classification; the second and third, labeled "Journal_count" and "Publication_count", contain the number of journals and the number of publications counted for each discipline.
meta_coverage_eu.csv: contains the data specific for European countries' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value indicating whether the journal is OA or not, either "True" or "Unknown".
us_data.csv: contains the data specific for the United States' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the numbers of Publications counted in each venue; "Original_Title", "Country_of_Publication","ERIH_PLUS_Disciplines", "disc_count", the number of disciplines per Journal.
us_disciplines_count.csv: contains the number of publications per discipline and the number of journals per discipline for the United States. The dataset has three columns: the first, labeled "Discipline", contains the individual disciplines of the ERIH classification; the second and third, labeled "Journal_count" and "Publication_count", contain the number of journals and the number of publications counted for each discipline.
meta_coverage_us.csv: contains the data specific for the United States' SSH Journals covered in OCMeta. It is structured with the following columns: "EP_id", the internal ERIH PLUS identifier; "Publications_in_venue", the number of publications counted in each venue; "OC_omid", the internal OpenCitations Meta identifier for the venue; "issn", the ISSN values associated with the venue; "Open Access", a value indicating whether the journal is OA or not, either "True" or "Unknown".
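As an illustration of how the coverage files can be used, the sketch below computes the share of journals flagged as Open Access from a file with the columns described above. The column names follow the description; the function name and the four sample rows are invented placeholders, not values from the dataset.

```python
import csv
import io

# Made-up rows that only mimic the documented column layout.
SAMPLE = """EP_id,Publications_in_venue,OC_omid,issn,Open Access
1,120,omid-a,0000-0001,True
2,45,omid-b,0000-0002,Unknown
3,300,omid-c,0000-0003,True
4,10,omid-d,0000-0004,Unknown
"""


def open_access_share(csv_file):
    """Return the fraction of rows whose "Open Access" value is "True"."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        return 0.0
    oa_rows = sum(1 for row in rows if row["Open Access"] == "True")
    return oa_rows / len(rows)


share = open_access_share(io.StringIO(SAMPLE))
print(f"{share:.0%}")  # 50% for the made-up sample above
```

To run it on a real file, pass an open file handle, for example `open("meta_coverage_eu.csv", newline="", encoding="utf-8")`, instead of the in-memory sample.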
Abstract of the research:
Purpose: This study aims to investigate the representation and distribution of Social Science and Humanities (SSH) journals within the OpenCitations Meta database, with a particular emphasis on their Open Access (OA) status, as well as their spread across different disciplines and countries. The underlying premise is that open infrastructures play a pivotal role in promoting transparency, reproducibility, and trust in scientific research.
Study Design and Methodology: The study is grounded on the premise that open infrastructures are crucial for ensuring transparency, reproducibility, and fostering trust in scientific research. The research methodology involved the use of secondary data sources, namely the OpenCitations Meta database, the ERIH PLUS bibliographic index, and the DOAJ index. Custom research software was developed in Python to facilitate the processing and analysis of the data.
Findings: The results reveal that 78.1% of SSH journals listed in the European Reference Index for the Humanities (ERIH-PLUS) are included in the OpenCitations Meta database. The discipline of Psychology has the highest number of publications. The United States and the United Kingdom are the leading contributors in terms of the number of publications. However, the study also uncovers that only 38% of the SSH journals in the OpenCitations Meta database are OA.
Originality: This research adds to the existing body of knowledge by providing insights into the representation of SSH in open bibliographic databases and the role of open access in this domain. The study highlights the necessity for advocating OA practices within SSH and the significance of open data for bibliometric studies. It further encourages additional research into the impact of OA on various facets of citation patterns and the factors leading to disparity across disciplinary representation.
Related resources:
Ghasempouri, S., Ghiotto, M., & Giacomini, S. (2023). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - RESEARCH ARTICLE. https://doi.org/10.5281/zenodo.8263908
Ghasempouri, S., Ghiotto, M., & Giacomini, S. (2023). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - DATA MANAGEMENT PLAN (Version 4). Zenodo. https://doi.org/10.5281/zenodo.8174644
Ghasempouri, S., Ghiotto, M., & Giacomini, S. (2023). Open Science for Social Sciences and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta - PROTOCOL (Version 5). https://dx.doi.org/10.17504/protocols.io.5jyl8jo1rg2w/v5
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
"Rings in Clinical Trials and Drugs: Present and Future" - Datasets from publication in Journal of Medicinal Chemistry
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This dataset lists Brazilian authors with publications in open access journals. For each author it gives the place of professional activity, highest degree, broad subject area, and area of expertise.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This dataset provides data on 152 scholarly journals that have been identified as having "reverse flipped", i.e. at some point changed their publishing model from open access to toll access (including hybrid open access). Brief documentation describing the various data points is appended. A description of the data collection method and analysis is expected to be made available in a journal article during 2019.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Dataset to original publication:
Hertwig, Alexandra (2020): Higher Education Research. A Compilation of Journals and Abstracts 2019. Kassel: INCHER-Kassel. DOI: 10.17170/kobra-202010292027.
The Research Information Service (RIS) of INCHER-Kassel, Germany, has provided an annual compilation of academic journals since 2013. As a "side effect", this information tool also gives researchers an overview of current topics in higher education research. The datasets allow further evaluation of single or multiple volumes. For more information on the original publications and available datasets, please visit INCHER's RIS websites.
http://www.uni-kassel.de/einrichtungen/en/incher/risspecial-research-library/ris-documents.html
This dataset contains records of research articles extracted from the Web of Science (WoS) from 1980 to 2019: in total, 15,642 journals, 28,241,100 articles and 111,980,858 authorships across 153 research areas.
The main dataset (author_address_article_gend_v3.parquet), in Parquet format, contains all the authorships, where an authorship is defined as the tuple article-author. The 14 variables listed per authorship (row) are:
ut: unique article identifier.
daisng_id: unique author identifier.
author_no: author number, as listed in the article.
country: author country (two-letter ISO code).
date: publication date.
gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
probability: probability of the gender attribute, as provided by the Genderize.io API.
count: number of entries for the author first name, as provided by the Genderize.io API.
jsc: journal subject category.
field: field of research.
research_area: area of research.
n_aut: number of authors in this publication.
journal: journal name.
alphabetical: whether the author list for this article is in alphabetical order.
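The variables above lend themselves to simple per-country summaries. The following Python sketch, which uses only the standard library, computes the share of female authorships per country while discarding low-confidence gender assignments. The sample rows, the 0.9 probability threshold and the function name female_share_by_country are assumptions made for illustration; they are not part of the dataset or of the authors' method.

```python
from collections import defaultdict

# Invented rows carrying a subset of the variables listed above.
authorships = [
    {"ut": "A1", "daisng_id": 1, "country": "ES", "gender": "female", "probability": 0.98},
    {"ut": "A1", "daisng_id": 2, "country": "ES", "gender": "male", "probability": 0.99},
    {"ut": "A2", "daisng_id": 3, "country": "ES", "gender": "female", "probability": 0.55},
    {"ut": "A2", "daisng_id": 4, "country": "BR", "gender": "female", "probability": 0.91},
]


def female_share_by_country(rows, min_probability=0.9):
    """Share of female authorships per country among confident assignments.

    min_probability is an assumed cut-off on the Genderize.io probability;
    rows below it are skipped rather than counted as either gender.
    """
    counts = defaultdict(lambda: [0, 0])  # country -> [female, total]
    for row in rows:
        if row["probability"] < min_probability:
            continue
        counts[row["country"]][1] += 1
        if row["gender"] == "female":
            counts[row["country"]][0] += 1
    return {country: female / total for country, (female, total) in counts.items()}


shares = female_share_by_country(authorships)
print(shares)
```

With the full Parquet file, the same logic would normally be applied through a dataframe library rather than a Python loop, given the roughly 112 million rows.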
A resampler was applied to the main dataset to generate null homophily values for each year. The results are stored in 4 datasets in R Data Serialization (RDS) format:
null_field.rds: null homophily values per country, year and field of research.
null_field_comp.rds: null homophily values per year and field of research (only for complete authorships).
null_research.rds: null homophily values per year and area of research.
null_research_comp.rds: null homophily values per year and area of research (only for complete authorships).
All these datasets have the same structure:
country: country (two-letter ISO code).
year: year.
variable: either field or research area name.
m: average homophily.
s: standard error of the homophily.
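One common way to use a null mean and its standard error is to express an observed value as a standardised deviation from the null. The Python sketch below does this for a single row with the structure described above. It is an illustration under stated assumptions, not the authors' procedure: the observed homophily value, the sample numbers and the function name z_score are all hypothetical.

```python
def z_score(observed, null_mean, null_se):
    """Standardised deviation of an observed value from a null distribution."""
    if null_se <= 0:
        raise ValueError("null standard error must be positive")
    return (observed - null_mean) / null_se


# Invented row following the structure country / year / variable / m / s.
null_row = {"country": "ES", "year": 2019, "variable": "Physics", "m": 0.25, "s": 0.125}

# Hypothetical observed homophily for the same country, year and field.
observed_homophily = 0.5

z = z_score(observed_homophily, null_row["m"], null_row["s"])
print(z)
```

A large positive value would suggest more homophily than the resampled null for that cell; how large counts as notable is a modelling choice left to the reader.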
Finally, some supplementary files used in the descriptive analysis and methods:
File null_research_l2019.rds is an example of the output from the resampling algorithm for year 2019.
File wos_category_to_field.csv is a mapping from WoS categories to more general fields.
File jcr_if_2020.csv contains the percentiles of the journal impact factor for the JCR 2020.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
This dataset contains Zenodo's published open access records and communities metadata, including entries marked by the Zenodo staff as spam and deleted.
The datasets are gzip-compressed JSON Lines files, where each line is a JSON object representing a Zenodo record or community.
Records dataset
Filename: zenodo_open_metadata_{ date of export }.jsonl.gz
Each object contains the terms: part_of, thesis, description, doi, meeting, imprint, references, recid, alternate_identifiers, resource_type, journal, related_identifiers, title, subjects, notes, creators, communities, access_right, keywords, contributors, publication_date
which correspond to the fields with the same name available in Zenodo's record JSON Schema at https://zenodo.org/schemas/records/record-v1.0.0.json.
In addition, some terms have been altered:
Communities dataset
Filename: zenodo_community_metadata_{ date of export }.jsonl.gz
Each object contains the terms: id, title, description, curation_policy, page
which correspond to the fields with the same name available in Zenodo's community creation form.
Notes for all datasets
For each object the term spam contains a boolean value, determining whether a given record/community was marked as spam content by Zenodo staff.
Top-level terms that were missing in the metadata may contain a null value.
A smaller uncompressed random sample of 200 JSON lines is also included for each dataset to test and get familiar with the format without having to download the entire dataset.
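A gzip-compressed JSON Lines file of this kind can be read line by line with the Python standard library, without loading the whole export into memory. The sketch below filters out entries whose spam term is true. The sample objects and the function name iter_records are invented for illustration; only the terms recid, title, doi and spam are taken from the description above.

```python
import gzip
import io
import json


def iter_records(fileobj):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


# Invented sample objects; a real export has many more terms per record.
sample = [
    {"recid": 1, "title": "Example dataset", "doi": None, "spam": False},
    {"recid": 2, "title": "Unwanted entry", "doi": None, "spam": True},
    {"recid": 3, "title": "Another record", "doi": "10.0000/example", "spam": False},
]
payload = "\n".join(json.dumps(obj) for obj in sample) + "\n"
buffer = io.BytesIO(gzip.compress(payload.encode("utf-8")))

# Keep only the entries not marked as spam; missing terms arrive as None.
kept = [record for record in iter_records(buffer) if not record["spam"]]
print([record["recid"] for record in kept])
```

For a downloaded export, pass the file path to iter_records instead of the in-memory buffer; gzip.open accepts either.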
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
Version: 6
Date of data collection: May 2025
General description: Publishing datasets in line with the FAIR principles can be achieved by publishing a data paper (and/or a software paper) in data journals as well as in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v6.xlsx: full list of 177 academic journals in which data papers and/or software papers could be published
- data_articles_journal_list_v6.csv: full list of 177 academic journals in which data papers and/or software papers could be published
- readme_v6.txt: a detailed description of the dataset and its variables
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 6th version
- Information updated: number of journals (17 were added and 4 were deleted), URL, document types associated to a specific journal.
- Information added: diamond journals were identified.
Version: 5
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2023/09/05
General description: Publishing datasets in line with the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v5.xlsx: full list of 162 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v5.csv: full list of 162 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 5th version
- Information updated: number of journals, URL, document types associated to a specific journal.
163 journals (Excel and CSV)
Version: 4
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/12/15
General description: Publishing datasets in line with the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 4th version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.
Version: 3
Authors: Carlota Balsa-Sánchez, Vanesa Loureiro
Date of data collection: 2022/10/28
General description: Publishing datasets in line with the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 3rd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).
Erratum - Data articles in journals Version 3:
Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a
Version: 2
Author: Francisco Rubio, Universitat Politècnica de València.
Date of data collection: 2020/06/23
General description: Publishing datasets in line with the FAIR principles can be achieved by publishing a data paper (or software paper) in data journals or in standard academic journals. The Excel and CSV files contain a list of academic journals that publish data papers and software papers.
File list:
- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published
Relationship between files: both files have the same information. Two different formats are offered to improve reuse
Type of version of the dataset: final processed version
Versions of the files: 2nd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added: listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)
Total size: 32 KB
Version 1: Description
This dataset contains a list of journals that publish data articles, code, software articles and database articles.
The search strategy in DOAJ and Ulrichsweb was to search for the word "data" in journal titles.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.