17 datasets found
  1. Number of total publications and percentage of open access publications for Dimensions and WoS, by country, 2015-2019

    • figshare.com
    txt
    Updated Jan 31, 2022
    Cite
    Isabel Basson; Marc-André Simard; Vincent Larivière (2022). Number of total publications and percentage of open access publications for Dimensions and WoS, by country, 2015-2019 [Dataset]. http://doi.org/10.6084/m9.figshare.18319238.v1
    Available download formats: txt
    Dataset updated
    Jan 31, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Isabel Basson; Marc-André Simard; Vincent Larivière
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the underlying dataset used for the country analysis regarding the percentage of papers in Dimensions and Web of Science (WoS), published between 2015 and 2019, that are open access (OA), regardless of the mode of OA. A paper was assigned a country affiliation based on the affiliation of its first author, so each paper is counted only once, regardless of whether it had multiple coauthors. Each row represents the data for one country, and each country appears only once (i.e., each row is unique).

    Column headings:
      • iso_alpha_2 = the ISO alpha-2 country code of the country
      • country = the name of the country as stated in either Dimensions or WoS
      • world_bank_region_2021 =
      • pub_wos = total number of papers (document types article and review) indexed in WoS, published from 2015 to 2019
      • oa_pers_wos = percentage of pub_wos that are OA
      • pub_dim = total number of papers (document type journal article) indexed in Dimensions, published from 2015 to 2019
      • oa_pers_dim = percentage of pub_dim that are OA
      • relative_diff = the relative difference between oa_pers_dim and oa_pers_wos, computed as (x - y) / (x + y), where x is the percentage of the country's papers in Dimensions that are OA, and y is the percentage of the country's papers in WoS that are OA. An "N/A" cell indicates that a division by 0 occurred.

    Data availability: Restrictions apply to both datasets used to generate the aggregate data. The Web of Science data is owned by Clarivate Analytics. To obtain the bibliometric data in the same manner as the authors (i.e., by purchasing them), readers can contact Clarivate Analytics at https://clarivate.com/webofsciencegroup/solutions/web-of-science/contact-us/. The Dimensions data is owned by Digital Science, which has a programme that provides no-cost access to its data: https://dimensions.ai/data_access.
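    The relative_diff definition above maps directly to code. A minimal sketch (not the authors' code), assuming a hypothetical CSV export with the documented column names:

    import pandas as pd

    def relative_diff(x, y):
        # (x - y) / (x + y); None stands in for "N/A" when the denominator is 0.
        return (x - y) / (x + y) if (x + y) != 0 else None

    df = pd.read_csv("oa_by_country.csv")  # hypothetical filename
    df["relative_diff"] = [
        relative_diff(x, y) for x, y in zip(df["oa_pers_dim"], df["oa_pers_wos"])
    ]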

  2. Public Availability of Published Research Data in High-Impact Journals

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Alawi A. Alsheikh-Ali; Waqas Qureshi; Mouaz H. Al-Mallah; John P. A. Ioannidis (2023). Public Availability of Published Research Data in High-Impact Journals [Dataset]. http://doi.org/10.1371/journal.pone.0024357
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alawi A. Alsheikh-Ali; Waqas Qureshi; Mouaz H. Al-Mallah; John P. A. Ioannidis
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: There is increasing interest in making primary data from published research publicly available. We aimed to assess the current status of making research data available in highly-cited journals across the scientific literature.

    Methods and Results: We reviewed the first 10 original research papers of 2009 published in the 50 original research journals with the highest impact factor. For each journal we documented the policies related to public availability and sharing of data. Of the 50 journals, 44 (88%) had a statement in their instructions to authors related to public availability and sharing of data. However, there was wide variation in journal requirements, ranging from requiring the sharing of all primary data related to the research to just including a statement in the published manuscript that data can be available on request. Of the 500 assessed papers, 149 (30%) were not subject to any data availability policy. Of the remaining 351 papers that were covered by some data availability policy, 208 papers (59%) did not fully adhere to the data availability instructions of the journals they were published in, most commonly (73%) by not publicly depositing microarray data. The other 143 papers that adhered to the data availability instructions did so by publicly depositing only the specific data type as required, making a statement of willingness to share, or actually sharing all the primary data. Overall, only 47 papers (9%) deposited full primary raw data online. None of the 149 papers not subject to data availability policies made their full primary data publicly available.

    Conclusion: A substantial proportion of original research papers published in high-impact journals are either not subject to any data availability policies, or do not adhere to the data availability instructions in their respective journals. This empirical evaluation highlights opportunities for improvement.

  3. Data Availability Statements in the 2020 and 2021 scientific publications of Tampere University

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Kylmälä, Kaisa (2024). Data Availability Statements in the 2020 and 2021 scientific publications of Tampere University [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7564440
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Kylmälä, Kaisa
    Toikko, Tomi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Tampere
    Description

    For this dataset, scientific peer-reviewed articles by Tampere University researchers from the years 2020 and 2021 were extracted from TUNICRIS. A random sample of 40 percent was taken from the listed 4,922 publications, stratified by faculty and year. This yielded 2,085 analyzed articles, i.e., more than 42 percent of the total.
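    A hedged sketch of this stratified sampling step (the export filename and the faculty/year column names are assumptions):

    import pandas as pd

    pubs = pd.read_csv("tunicris_publications_2020_2021.csv")  # hypothetical export

    # 40% random sample within each faculty-year stratum.
    sample = (
        pubs.groupby(["faculty", "year"], group_keys=False)
            .apply(lambda g: g.sample(frac=0.40, random_state=1))
    )
    print(f"{len(sample)} articles selected for DAS screening")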

    To find Data Availability Statements (DAS), articles were opened one by one and searched for mentions of research data and its availability. For each article, it was recorded whether a DAS existed and where in the article it was located. From the contents of each DAS, information about data availability, location, openness, and possible restrictions on use was recorded.

    The dataset also includes information about the journals and publications, taken from TUNICRIS.

    The prevalence of DAS and data openness were examined in relation to different variables. Tampere University faculty information has been removed from the dataset.

    Related slides: https://doi.org/10.5281/zenodo.7655892

    Related article (in Finnish): Toikko, T., & Kylmälä, K. (2023). Tutkimusdatan saatavuustiedot tieteellisissä artikkeleissa: Raportti Data Availability Statementien käytöstä Tampereen yliopistossa. Informaatiotutkimus, 42(1-2), 31–50. https://doi.org/10.23978/inf.126098

  4. PLOS Open Science Indicators

    • plos.figshare.com
    zip
    Updated Jul 10, 2025
    Cite
    Public Library of Science (2025). PLOS Open Science Indicators [Dataset]. http://doi.org/10.6084/m9.figshare.21687686.v10
    Available download formats: zip
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Public Library of Science
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains article metadata and information about Open Science Indicators for approximately 139,000 research articles published in PLOS journals from 1 January 2018 to 30 March 2025, and a set of approximately 28,000 comparator articles published in non-PLOS journals. This is the tenth release of this dataset, which will be updated with new versions on an annual basis.

    This version of the Open Science Indicators dataset shares the indicators seen in previous versions, as well as fully operationalised protocol and study registration indicators, which were previously only shared in preliminary forms. The v10 dataset focuses on detection of five Open Science practices by analysing the XML of published research articles:
      • Sharing of research data, in particular data shared in data repositories
      • Sharing of code
      • Posting of preprints
      • Sharing of protocols
      • Sharing of study registrations

    The dataset provides data and code generation and sharing rates, and the location of shared data and code (whether in Supporting Information or in an online repository). It also provides preprint, protocol, and study registration sharing rates, as well as details of the shared output, such as publication date, URL/DOI/registration identifier, and platform used. Additional data fields are also provided for each article analysed. This release has been run using an updated preprint detection method (see OSI-Methods-Statement_v10_Jul25.pdf for details). Further information on the methods used to collect and analyse the data can be found in the Documentation folder. Further information on the principles and requirements for developing Open Science Indicators is available at https://doi.org/10.6084/m9.figshare.21640889.

    Data folders/files:

    Data Files folder: contains the main OSI dataset files, PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv, which contain descriptive metadata (e.g., article title, publication date, author countries) taken from the article .xml files, plus additional information around the Open Science Indicators derived algorithmically. The OSI-Summary-statistics_v10_Jul25.xlsx file contains the summary data for both PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv.

    Documentation folder: contains documentation related to the main data files. OSI-Methods-Statement_v10_Jul25.pdf describes the methods underlying the data collection and analysis. OSI-Column-Descriptions_v10_Jul25.pdf describes the fields used in PLOS-Dataset_v10_Jul25.csv and Comparator-Dataset_v10_Jul25.csv. OSI-Repository-List_v1_Dec22.xlsx lists the repositories, and their characteristics, used to identify specific repositories in the repository fields of the two dataset files. The folder also contains documentation originally shared alongside the preliminary versions of the protocol and study registration indicators, to give fuller details of their detection methods.

    Contact details for further information:
      • Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS, ihrynaszkiewicz@plos.org / plos@plos.org
      • Lauren Cadwallader, Open Research Manager, PLOS, lcadwallader@plos.org / plos@plos.org

    Acknowledgements: Thanks to Allegra Pearce, Tim Vines, Asura Enkhbayar, Scott Kerr and parth sarin of DataSeer for contributing to data acquisition and supporting information.
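    For orientation, a hedged sketch of loading the main data file; the filename comes from the description above, while the indicator column name is an assumption to check against OSI-Column-Descriptions_v10_Jul25.pdf:

    import pandas as pd

    osi = pd.read_csv("PLOS-Dataset_v10_Jul25.csv")

    # Assumed 0/1 indicator for data shared in a repository; verify the real
    # field name in the column descriptions document.
    rate = osi["data_repository_sharing"].mean()
    print(f"Repository data-sharing rate, PLOS articles: {rate:.1%}")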

  5. World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness

    • data.mendeley.com
    Updated Nov 17, 2023
    Cite
    JOHN Philip (2023). World’s Top 2% of Scientists list by Stanford University: An Analysis of its Robustness [Dataset]. http://doi.org/10.17632/td6tdp4m6t.1
    Dataset updated
    Nov 17, 2023
    Authors
    JOHN Philip
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    John Ioannidis and co-authors [1] created a publicly available database of top-cited scientists in the world. This database, intended to address the misuse of citation metrics, has generated a lot of interest among the scientific community, institutions, and media. Many institutions have used it as a yardstick to assess the quality of researchers, while some view the list with skepticism, citing problems with the methodology. Two separate databases are provided, based on career-long impact and on single recent-year impact. The databases are created using Scopus data from Elsevier [1-3]. The scientists included are classified into 22 scientific fields and 174 sub-fields. The parameters considered for this analysis are total citations from 1996 to 2022 (nc9622), h-index in 2022 (h22), c-score, and world rank based on c-score (Rank ns). Citations without self-citations are considered in all cases (indicated as ns). For the single-year database, citations during 2022 (nc2222) are considered instead of nc9622.

    To evaluate the robustness of c-score-based ranking, I carried out a detailed analysis of the metric parameters of the last 25 years (1998-2022) of Nobel laureates in physics, chemistry, and medicine, and compared them with the top 100 rank holders in the list. The latest career-long and single-year databases (2022) were used for this analysis. The details of the analysis are presented below. Though the article says the selection is based on the top 100,000 scientists by c-score (with and without self-citations) or a percentile rank of 2% or above in the sub-field, the actual career-based ranking list has 204,644 names [1], and the single-year database contains 210,199 names; the published list therefore covers roughly the top 4% of scientists. In the career-based rank list, the person with the lowest rank (4,809,825) had nc9622 = 41, h22 = 3, and c-score = 1.3632, whereas the person ranked No. 1 had nc9622 = 345,061, h22 = 264, and c-score = 5.5927. Three people on the list had fewer than 100 citations during 1996-2022, 1,155 people had an h22 below 10, and 6 people had a c-score below 2.

    In the single-year rank list, the person with the lowest rank (6,547,764) had nc2222 = 1, h22 = 1, and c-score = 0.6, whereas the person ranked No. 1 had nc2222 = 34,582, h22 = 68, and c-score = 5.3368. In this list, 4,463 people had fewer than 100 citations in 2022, 71,512 had an h22 below 10, and 313 had a c-score below 2. The entry of many authors with a single-digit h-index and a very meagre total number of citations indicates serious shortcomings of the c-score-based ranking methodology.
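    A hedged sketch of the threshold counts above, assuming the career-long database is available as a CSV with the documented parameters as columns (the filename and column names are assumptions):

    import pandas as pd

    career = pd.read_csv("career_2022.csv")  # hypothetical export

    print((career["nc9622"] < 100).sum(), "authors with fewer than 100 citations, 1996-2022")
    print((career["h22"] < 10).sum(), "authors with h22 below 10")
    print((career["c_score"] < 2).sum(), "authors with c-score below 2")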

  6. Data for: Integrating open education practices with data analysis of open science in an undergraduate course

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Jul 27, 2024
    + more versions
    Cite
    Marja Bakermans (2024). Data for: Integrating open education practices with data analysis of open science in an undergraduate course [Dataset]. http://doi.org/10.5061/dryad.37pvmcvst
    Dataset updated
    Jul 27, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Marja Bakermans
    Description

    The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed fairness practices, but this increased with the publication date. There was a...

    Article and dataset fairness: To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each pair to a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), where each category was scored ‘1’ or ‘0’ based on whether it met that criterion, with a total possible score of ten.

    Open grading policies: Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined if assessment categories (independent variable) were weighted (dependent variable) differently by students using an analysis of ...

    # Data for: Integrating open education practices with data analysis of open science in an undergraduate course

    Author: Marja H Bakermans
    Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
    ORCID: https://orcid.org/0000-0002-4879-7771
    Institutional IRB approval: IRB-24–0314

    Data and file overview

    The full dataset file called OEPandOSdata (.xlsx extension) contains 8 files. Below are descriptions of the name and contents of each file. NA = not applicable or no data available

    1. BestPracticesData.csv
      • Description: Data to assess the adherence of articles and datasets to open science best practices.
      • Column headers and descriptions:
        • Article: articles used in the study, numbered randomly
        • F1: Findable, Data are assigned a unique and persistent DOI
        • F2: Findable, Metadata includes an identifier of data
        • F3: Findable, Data are registered in a searchable database
        • A1: ...
  7. Dataset for MONITORING IMPLEMENTATION OF THE SCIENTIFIC INFORMATION POLICY FOR THE NASA SCIENCE MISSION DIRECTORATE (SPD-41A) METRICS REPORT

    • zenodo.org
    bin, csv
    Updated Jun 6, 2025
    Cite
    Steven Crawford; Rachel Paseka; Rebecca Lynn Michelson (2025). Dataset for MONITORING IMPLEMENTATION OF THE SCIENTIFIC INFORMATION POLICY FOR THE NASA SCIENCE MISSION DIRECTORATE (SPD-41A) METRICS REPORT [Dataset]. http://doi.org/10.5281/zenodo.15588135
    Available download formats: csv, bin
    Dataset updated
    Jun 6, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Steven Crawford; Rachel Paseka; Rebecca Lynn Michelson
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The NASA Science Information Policy for the Science Mission Directorate (SPD-41a) provides requirements for how scientific information produced from SMD-funded scientific activities must be shared. SPD-41a requirements for research awards were incorporated into SMD’s Research Opportunities in Space and Earth Science (ROSES) starting with the ROSES-2023 solicitation. This dataset and the accompanying notebooks include analysis of publications from 2023. This includes: 1) analysis of the percentage of publications that are open access, and 2) sampling of publications that make the data and software openly available.

    This data is a companion to the full report that presents the results of the analysis.

  8. Data of "Data sharing of computer scientists: an analysis of current research information system data"

    • data.niaid.nih.gov
    Updated Mar 22, 2022
    Cite
    (Under review) (2022). Data of "Data sharing of computer scientists: an analysis of current research information system data" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4736881
    Dataset updated
    Mar 22, 2022
    Dataset authored and provided by
    (Under review)
    Description

    Without sufficient information about researchers’ data sharing, there is a risk of mismatching FAIR data service efforts with the needs of researchers. This study describes a methodology where departmental academic publications are used to analyse the ways in which computer scientists share research data. The advancement of FAIR data would benefit from novel methodologies that reliably examine data sharing at the level of multidisciplinary research organisations. Studies that use CRIS publication data to elicit insight into researchers’ data sharing may therefore be a valuable addition to the current interview and questionnaire methodologies.

    Data was collected from the following sources:

    All journal articles published by researchers in the computer science department of the case study’s university during 2019 were extracted for scrutiny from the current research information system. For these 193 articles, a coding framework was developed to capture the key elements of acquiring and sharing research data. Article DOIs are included in the research data.

    The scientific journal articles and their DOIs are used in this study for the purpose of academic expression.

    The raw data is compiled into a single CSV file. Rows represent specific articles and columns are the values of the data points described below. Author names and affiliations were not collected and are not included in the data set. Please, contact the author for access to the data.

    The following data points were used in the analysis (a minimal loading sketch follows the list):

    Main study types (yes/no for each):
      • Literature-based study (e.g., literature reviews, archive studies, studies of social media)
      • Novel computational methods (e.g., algorithms, simulations, software)
      • Interaction studies (e.g., interviews, surveys, tasks, ethnography)
      • Intervention studies (e.g., EEG, MRI, clinical trials)
      • Measurement studies (e.g., astronomy, weather, acoustics, chemistry)
      • Life sciences (e.g., “omics”, ecology)

    Data acquisition (yes/no for each):
      • Article presents a data availability statement
      • Article does not utilise data
      • Original data was collected
      • Open data from prior studies were used
      • Open data from public authorities, companies, universities and associations

    Data sharing (yes/no for each):
      • Article does not use original data
      • Data of the article is not available for reuse
      • Article used openly available data
      • Authors agree to share their data with interested readers
      • Article shared data (or part of it) as supplementary material
      • Article shared data (or part of it) via open deposition
      • Article deposited code or used open code
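    A minimal loading sketch for the coding framework above (the CSV filename, a doi column, and string "yes"/"no" values are all assumptions; the data itself is available from the author on request):

    import pandas as pd

    coded = pd.read_csv("cs_articles_2019_coded.csv")  # hypothetical filename

    # Share of the 193 articles coded "yes" for each variable.
    shares = coded.drop(columns=["doi"]).eq("yes").mean().sort_values(ascending=False)
    print(shares.round(2))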

  9. Percentage of articles and reviews by countries in sources discontinued by Scopus

    • data.mendeley.com
    • narcis.nl
    Updated Sep 21, 2020
    Cite
    Bulat Kenessov (2020). Percentage of articles and reviews by countries in sources discontinued by Scopus [Dataset]. http://doi.org/10.17632/k8pvz45gp2.1
    Dataset updated
    Sep 21, 2020
    Authors
    Bulat Kenessov
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is important for estimating problems with academic policy and quality control in some countries, which result in an excessive (compared to other countries) percentage of publications in questionable journals that do not provide proper peer review and/or violate research ethics. Such journals are regularly discontinued from Scopus.

  10. Results from a Systematic Literature Review concerning drivers and inhibitors of researchers to openly share research data and to use open research data

    • data.4tu.nl
    zip
    Cite
    Anneke Zuiderwijk; Rhythima Shinde; Wei Jeng, Results from a Systematic Literature Review concerning drivers and inhibitors of researchers to openly share research data and to use open research data [Dataset]. http://doi.org/10.4121/12820631.v1
    Available download formats: zip
    Dataset provided by
    4TU.ResearchData
    Authors
    Anneke Zuiderwijk; Rhythima Shinde; Wei Jeng
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 2004 - Jul 2020
    Description

    This is the dataset underlying the research article, “What drives and inhibits researchers to share and use open research data? A systematic literature review to analyze factors influencing open research data adoption”. It provides the main information concerning the articles identified through the systematic literature review applied in this study, as well as detailed information concerning the 32 studies selected for the review. Furthermore, the file provides information derived from the description and analysis of the selected studies. More information can be found in the README file.

  11. Replication Data for: Data policies of highly-ranked social science journals

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Crosas, Mercè; Gautier, Julian; Karcher, Sebastian; Kirilova, Dessi; Otalora, Gerard; Schwartz, Abby (2023). Replication Data for: Data policies of highly-ranked social science journals [Dataset]. http://doi.org/10.7910/DVN/CZYY1N
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Crosas, Mercè; Gautier, Julian; Karcher, Sebastian; Kirilova, Dessi; Otalora, Gerard; Schwartz, Abby
    Time period covered
    Jan 1, 2003 - Dec 12, 2017
    Description

    By encouraging and requiring that authors share their data in order to publish articles, scholarly journals have become an important actor in the movement to improve the openness of data and the reproducibility of research. But how many social science journals encourage or mandate that authors share the data supporting their research findings? How does the share of journal data policies vary by discipline? What influences these journals’ decisions to adopt such policies and instructions? And what do those policies and instructions look like? We discuss the results of our analysis of the instructions and policies of 291 highly-ranked journals publishing social science research, where we studied the contents of journal data policies and instructions across 14 variables, such as when and how authors are asked to share their data, and what role journal ranking and age play in the existence and quality of data policies and instructions. We also attempt to compare our results to the results of other studies that have analyzed the policies of social science journals, although differences in the journals chosen and how each study defines what constitutes a data policy limit this comparison. We conclude that a little more than half of the journals in our study have data policies. A greater share of the economics journals have data policies and mandate sharing, followed by political science/international relations and psychology journals. Finally, we use our findings to make several recommendations: Policies should include the terms “data”, “dataset” or more specific terms that make it clear what to make available; policies should include the benefits of data sharing; journals, publishers, and associations need to collaborate more to clarify data policies; and policies should explicitly ask for qualitative data.

  12. Philosophy Journal Survey data

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Sep 12, 2022
    Cite
    Toby Handfield (2022). Philosophy Journal Survey data [Dataset]. http://doi.org/10.26180/20499582.v1
    Dataset updated
    Sep 12, 2022
    Dataset provided by
    Monash University
    Authors
    Toby Handfield
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author survey of experiences with philosophy journals; obtained from the APA website, as at 17 August 2022. Summary data only.


    "the Blog of the APA is initiating an ongoing project to survey the experiences scholars have had with academic journals. The data collected reflects the quality of the peer review process, and includes information on average review time, time to publication, acceptance rates, comments per submission, and overall experience with a wide variety of academic journals."


    Source: https://archive.ph/wihtZ

  13. Science Education Research Topic Modeling Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 9, 2024
    Cite
    Rudolph, John L. (2024). Science Education Research Topic Modeling Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4094973
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Rudolph, John L.
    Marin, Alessandro
    Odden, Tor Ole B.
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This dataset contains scraped and processed text from roughly 100 years of articles published in the Wiley journal Science Education (formerly General Science Quarterly). This text has been cleaned and filtered in preparation for analysis using natural language processing techniques, particularly topic modeling with latent Dirichlet allocation (LDA). We also include a Jupyter Notebook illustrating how one can use LDA to analyze this dataset and extract latent topics from it, as well as analyze the rise and fall of those topics over the history of the journal.

    The articles were downloaded and scraped in December of 2019. Only non-duplicate articles with a listed author (according to the CrossRef metadata database) were included, and due to missing data and text recognition issues we excluded all articles published prior to 1922. This resulted in 5577 articles in total being included in the dataset. The text of these articles was then cleaned in the following way:

    We removed duplicated text from each article: prior to 1969, articles in the journal were published in a magazine format in which the end of one article and the beginning of the next would share the same page, so we developed an automated detection of article beginnings and endings that was able to remove any duplicate text.

    We removed the reference sections of the articles, as well as headings (in all caps) such as “ABSTRACT”.

    We reunited any partial words that were separated due to line breaks, text recognition issues, or British vs. American spellings (for example, converting “per cent” to “percent”).

    We removed all numbers, symbols, special characters, and punctuation, and lowercased all words.

    We removed all stop words, which are words without any semantic meaning on their own—“the”, “in,” “if”, “and”, “but”, etc.—and all single-letter words.

    We lemmatized all words, with the added step of including a part-of-speech tagger so our algorithm would only aggregate and lemmatize words from the same part of speech (e.g., nouns vs. verbs).

    We detected and created bi-grams, sets of words that frequently co-occur and carry additional meaning together. These words were combined with an underscore: for example, “problem_solving” and “high_school”.

    After filtering, each document was then turned into a list of individual words (or tokens) which were then collected and saved (using the python pickle format) into the file scied_words_bigrams_V5.pkl.

    In addition to this file, we have also included the following files:

    SciEd_paper_names_weights.pkl: A file containing limited metadata (title, author, year published, and DOI) for each of the papers, in the same order as they appear within the main datafile. This file also includes the weights assigned by an LDA model used to analyze the data.

    Science Education LDA Notebook.ipynb: A notebook file that replicates our LDA analysis, with a written explanation of all of the steps and suggestions on how to explore the results.

    Supporting files for the notebook. These include the requirements, the README, a helper script with functions for plotting that were too long to include in the notebook, and two HTML graphs that are embedded into the notebook.

    This dataset is shared under the terms of the Wiley Text and Data Mining Agreement, which allows users to share text and data mining output for non-commercial research purposes. Any questions or comments can be directed to Tor Ole Odden, t.o.odden@fys.uio.no.
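    For orientation, a hedged sketch (not the authors' notebook) of loading the tokenised corpus and fitting an LDA model with gensim; the filename comes from the description above, while the topic count is an arbitrary assumption:

    import pickle
    from gensim import corpora, models

    with open("scied_words_bigrams_V5.pkl", "rb") as f:
        docs = pickle.load(f)  # one list of tokens per article

    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(doc) for doc in docs]

    lda = models.LdaModel(bow, num_topics=20, id2word=dictionary, passes=10)
    for topic_id, words in lda.print_topics(num_topics=5, num_words=8):
        print(topic_id, words)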

  14. Dataset for article "Unveiling Openness in Energy Research: A Bibliometric Analysis Focusing on Open Access and Data Sharing Practices"

    • zenodo.org
    csv
    Updated Mar 14, 2025
    Cite
    Linna Lu; Linna Lu; Amanda Wein; Amanda Wein (2025). Dataset for article "Unveiling Openness in Energy Research: A Bibliometric Analysis Focusing on Open Access and Data Sharing Practices" [Dataset]. http://doi.org/10.5281/zenodo.15023865
    Available download formats: csv
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Linna Lu; Linna Lu; Amanda Wein; Amanda Wein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 6, 2024
    Description

    This dataset was used as a data corpus for a bibliometric analysis with the title "Unveiling Openness in Energy Research: A Bibliometric Analysis Focusing on Open Access and Data Sharing Practices".

    The CSV file (2024-12-06_OpenAlex_API_download_works_Energy_Germany_(2013-2023)) was collected on December 6th, 2024, by using the OpenAlex API and search criteria: OpenAlex field "Energy", continent “Europe”, country “Germany”, and publication years 2013 – 2023. Based on this file, two sample files were extracted - one by subfield (2024-12-06_OpenAlex_API_dwonload_works_Energy_Germany_(2013-2023)_sampled_by_subfield) and another by year group (2024-12-06_OpenAlex_API_download_works_Energy_Germany_(2013-2023)_sampled_by_year_group).
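    A hedged sketch of this kind of OpenAlex API download (the filter names and the field id for "Energy" are assumptions; verify against https://docs.openalex.org):

    import requests

    params = {
        "filter": ",".join([
            "primary_topic.field.id:fields/2100",   # assumed id for the "Energy" field
            "institutions.country_code:DE",
            "from_publication_date:2013-01-01",
            "to_publication_date:2023-12-31",
        ]),
        "per-page": 200,
        "cursor": "*",  # cursor paging retrieves the full result set
    }
    works = []
    while True:
        page = requests.get("https://api.openalex.org/works", params=params).json()
        works.extend(page["results"])
        cursor = page["meta"].get("next_cursor")
        if not cursor:
            break
        params["cursor"] = cursor
    print(len(works), "works downloaded")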

    This dataset was collected and used to answer the following research questions:

    - What percentage of energy research publications are OA? How do the types (gold, green, etc.) of these publications differ?

    - Are there notable differences in OA and data sharing practices in different subfields of energy research?

    - How commonly are datasets for energy studies shared? What are the primary repositories used?

    - What kind of data sharing or publication practices are widespread? How has this evolved over the last decade?

  15. Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level

    • plos.figshare.com
    • figshare.com
    tiff
    Updated May 30, 2023
    Cite
    B. Ian Hutchins; Xin Yuan; James M. Anderson; George M. Santangelo (2023). Relative Citation Ratio (RCR): A New Metric That Uses Citation Rates to Measure Influence at the Article Level [Dataset]. http://doi.org/10.1371/journal.pbio.1002541
    Available download formats: tiff
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    B. Ian Hutchins; Xin Yuan; James M. Anderson; George M. Santangelo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Despite their recognized limitations, bibliometric assessments of scientific productivity have been widely adopted. We describe here an improved method to quantify the influence of a research article by making novel use of its co-citation network to field-normalize the number of citations it has received. Article citation rates are divided by an expected citation rate that is derived from performance of articles in the same field and benchmarked to a peer comparison group. The resulting Relative Citation Ratio is article-level and field-independent and provides an alternative to the invalid practice of using journal impact factors to identify influential papers. To illustrate one application of our method, we analyzed 88,835 articles published between 2003 and 2010 and found that the National Institutes of Health awardees who authored those papers occupy relatively stable positions of influence across all disciplines. We demonstrate that the values generated by this method strongly correlate with the opinions of subject matter experts in biomedical research and suggest that the same approach should be generally applicable to articles published in all areas of science. A beta version of iCite, our web tool for calculating Relative Citation Ratios of articles listed in PubMed, is available at https://icite.od.nih.gov.
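    As a simplified illustration (not the paper's full regression-based procedure), the metric's core is a ratio of an article's citation rate to a field-normalized expected citation rate:

    # Invented numbers; in the paper the expected rate is derived from the
    # article's co-citation network and benchmarked to a peer comparison group.
    article_citation_rate = 12.0   # observed citations per year
    expected_citation_rate = 8.0   # field-normalized expectation

    rcr = article_citation_rate / expected_citation_rate
    print(f"RCR = {rcr:.2f}")  # values > 1 indicate above-expected influence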

  16. A study of the impact of data sharing on article citations using journal policies as a natural experiment

    • plos.figshare.com
    • dataverse.harvard.edu
    • +1 more
    docx
    Updated Jun 1, 2023
    Cite
    Garret Christensen; Allan Dafoe; Edward Miguel; Don A. Moore; Andrew K. Rose (2023). A study of the impact of data sharing on article citations using journal policies as a natural experiment [Dataset]. http://doi.org/10.1371/journal.pone.0225883
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Garret Christensen; Allan Dafoe; Edward Miguel; Don A. Moore; Andrew K. Rose
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study estimates the effect of data sharing on the citations of academic articles, using journal policies as a natural experiment. We begin by examining 17 high-impact journals that have adopted the requirement that data from published articles be publicly posted. We match these 17 journals to 13 journals without policy changes and find that empirical articles published just before their change in editorial policy have citation rates with no statistically significant difference from those published shortly after the shift. We then ask whether this null result stems from poor compliance with data sharing policies, and use the data sharing policy changes as instrumental variables to examine more closely two leading journals in economics and political science with relatively strong enforcement of new data policies. We find that articles that make their data available receive 97 additional citations (estimated standard error of 34). We conclude that: a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.

  17. S1 Data -

    • plos.figshare.com
    xlsx
    Updated May 10, 2024
    + more versions
    Cite
    Elina Late; Michael Ochsner (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0303190.s001
    Available download formats: xlsx
    Dataset updated
    May 10, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Elina Late; Michael Ochsner
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aim of this paper is to investigate the re-use of research data deposited in a digital data archive in the social sciences. The study examines the quantity, type, and purpose of data downloads by analyzing enriched user log data collected from a Swiss data archive. The findings show that quantitative datasets are increasingly downloaded from the digital archive and that downloads focus heavily on a small share of the datasets. The most frequently downloaded datasets are survey datasets collected by research organizations offering possibilities for longitudinal studies. Users typically download only one dataset, but a group of heavy downloaders accounts for a remarkable share of all downloads. The main user group downloading data from the archive is students, who use the data in their studies. Furthermore, datasets downloaded for research purposes often, but not always, serve to be used in scholarly publications. Enriched log data from data archives offer an interesting macro-level perspective on the use and users of these services and help in understanding the increasing role of repositories in the social sciences. The study provides insights into the potential of collecting and using log data for studying and evaluating data archive use.
