100+ datasets found

Softcite Dataset: A dataset of software mentions in research publications
data.niaid.nih.gov
zenodo.org
Updated Jan 17, 2021
Cite
Caifan Du (2021). Softcite Dataset: A dataset of software mentions in research publications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4444074
Explore at:
Dataset updated
Jan 17, 2021
Dataset provided by
Caifan Du
James Howison
Hannah Cohoon
Patrice Lopez
Description
The Softcite dataset is a gold-standard dataset of software mentions in research publications, a free resource primarily for software entity recognition in scholarly text. This is the first release of this dataset.

What's in the dataset

With the aim of facilitating software entity recognition efforts at scale and eventually increased visibility of research software for the due credit of software contributions to scholarly research, a team of trained annotators from Howison Lab at the University of Texas at Austin annotated 4,093 software mentions in 4,971 open access research publications in biomedicine (from PubMed Central Open Access collection) and economics (from Unpaywall open access services). The annotated software mentions, along with their publisher, version, and access URL, if mentioned in the text, as well as those publications annotated as containing no software mentions, are all included in the released dataset as a TEI/XML corpus file.

For understanding the schema of the Softcite corpus, its design considerations, and provenance, please refer to our paper included in this release (preprint version).

Use scenarios

The release of the Softcite dataset is intended to encourage researchers and stakeholders to make research software more visible in science, especially to academic databases and systems of information retrieval; and facilitate interoperability and collaboration among similar and relevant efforts in software entity recognition and building utilities for software information retrieval. This dataset can also be useful for researchers investigating software use in academic research.

Current release content

softcite-dataset v1.0 release includes:

The Softcite dataset corpus file: softcite_corpus-full.tei.xml

Softcite Dataset: A Dataset of Software Mentions in Biomedical and Economic Research Publications, our paper that describes the design consideration and creation process of the dataset: Softcite_Dataset_Description_RC.pdf. (This is a preprint version of our forthcoming publication in the Journal of the Association for Information Science and Technology.)

The Softcite dataset is licensed under a Creative Commons Attribution 4.0 International License.

If you have questions, please start a discussion or open an issue in the howisonlab/softcite-dataset GitHub repository.
Catalogue of natural resource scientific and technical publications
data.ontario.ca
csv
Updated Apr 30, 2025
Cite
(2025). Catalogue of natural resource scientific and technical publications [Dataset]. https://data.ontario.ca/dataset/catalogue-of-natural-resource-scientific-and-technical-publications
Explore at:
Available download formats: csv (572080)
Dataset updated
Apr 30, 2025
License
https://www.ontario.ca/page/copyright-information
Time period covered
Apr 30, 2025
Area covered
Ontario
Description
Search a list of the scientific and technical publications issued since 2004. Email us to request a publication.

This catalogue is designed for users of natural resources scientific and technical information, such as researchers with universities and other governments, foresters and biologists, and conservation authority staff. It provides a complete list of Ontario’s scientific and technical publications on natural resources and forestry issued since 2004, including:

Scientific and technical publications commissioned by the Province

Journal articles written by ministry staff
Early Indicator for Data Sharing and Reuse - Supplementary Tables.xlsx
figshare.com
xlsx
Updated Apr 28, 2023
Cite
Agata Piekniewska; Laurel Haak; Darla Henderson; Katherine McNeill; Anita Bandrowski; Yvette Seger (2023). Early Indicator for Data Sharing and Reuse - Supplementary Tables.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.22720399.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.22720399.v1
Dataset updated
Apr 28, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Agata Piekniewska; Laurel Haak; Darla Henderson; Katherine McNeill; Anita Bandrowski; Yvette Seger
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data were generated for an investigation of research data repository (RDR) mentions in biomedical research articles.

Supplementary Table 1 is a discrete subset of SciCrunch RDRs used to study RDR mentions in biomedical literature. We generated this list by starting with the top 1000 entries in the SciCrunch database, measured by citations, then removing entries for organizations (such as universities without a corresponding RDR) or non-relevant tools (such as reference managers), updating links, and consolidating duplicates resulting from RDR mergers and name variations. The resulting list of 737 RDRs is based on a source list of RDRs in the SciCrunch database. The file includes the Research Resource Identifier (RRID), the RDR name, and a link to the RDR record in the SciCrunch database.

Supplementary Table 2 shows the RDRs, associated journals, and article-mention pairs (records) with text snippets extracted from mined Methods text in 2020 PubMed articles. The dataset has 4 components. The first shows the list of repositories with RDR mentions, and includes the Research Resource Identifier (RRID), the RDR name, the number of articles that mention the RDR, and a link to the record in the SciCrunch database. The second shows the list of journals in the study set with at least 1 RDR mention, and includes the Journal ID, name, ESSN/ISSN, the total count of publications in 2020, the number of articles that had text available to mine, the number of article-mention pairs (records), the number of articles with RDR mentions, the number of unique RDRs mentioned, and the percentage of articles with minable text. The third shows the top 200 journals by RDR mention, normalized by the proportion of articles with available text to mine, with the same metadata as the second table. The fourth shows text snippets for each RDR mention, and includes the RRID, RDR name, PubMedID (PMID), DOI, article publication date, journal name, journal ID, ESSN/ISSN, article title, and snippet.
Data from: A comprehensive dataset of the Spanish research output and its...
data.niaid.nih.gov
produccioncientifica.ugr.es
+1 more
Updated Dec 2, 2022
Cite
Arroyo-Machado, Wenceslao (2022). A comprehensive dataset of the Spanish research output and its associated social media and altmetric mentions (2016-2020) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6184380
Explore at:
Dataset updated
Dec 2, 2022
Dataset provided by
Arroyo-Machado, Wenceslao
Torres-Salinas, Daniel
Robinson-Garcia, Nicolas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Spain
Description
Data on research publications authored by Spanish institutions between 2016 and 2020 with their associated social media and altmetric mentions, and on researchers affiliated to Spanish institutions whose work is highly mentioned in social media and non-academic outlets.

Variables of the publications dataset:

id - Unique publication identifier

title - Full title of the publication

year - Year of publication

type - Document type

journal - Name of the journal

esi - ESI category of the publication

influratio - AAS value on March 3, 2021

news - Number of mentions in news media

blogs - Number of mentions in blogs

policy - Number of mentions in policy reports

patent - Number of mentions in patents

twitter - Number of mentions in Twitter

post_peer - Number of mentions in PubPeer and Publons

weibo - Number of mentions in Weibo

facebook - Number of mentions in Facebook

wikipedia - Number of mentions in Wikipedia

google - Number of mentions in Google+

linkedin - Number of mentions in LinkedIn

reddit - Number of mentions in Reddit

pinterest - Number of mentions in Pinterest

f1000 - Number of mentions in F1000

stack_overflow - Number of mentions in Stack Overflow

youtube - Number of mentions in YouTube

syllabus - Number of mentions in Open Syllabus Project

Variables of the top authors dataset:

name - Full name of the researcher

orcid - ORCID record

organization - Name of the institution of affiliation

publications - List of publication identifiers (id) connecting with the publications dataset
Data from: Inventory of online public databases and repositories holding...
agdatacommons.nal.usda.gov
datadiscoverystudio.org
+2 more
txt
Updated Feb 8, 2024
Cite
Erin Antognoli; Jonathan Sears; Cynthia Parr (2024). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. http://doi.org/10.15482/USDA.ADC/1389839
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.15482/USDA.ADC/1389839
Dataset updated
Feb 8, 2024
Dataset provided by
Ag Data Commons
Authors
Erin Antognoli; Jonathan Sears; Cynthia Parr
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals

compare how much data is in institutional vs. domain-specific vs. federal platforms

determine which repositories are recommended by top journals that require or recommend the publication of supporting data

ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.

There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.

Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

Resource Title: Journals. File Name: Journals.csv

Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv

Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx

Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv

Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv

Resource Title: General repositories containing ag data. File Name: general_repos_1.csv

Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
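
The evaluation thresholds described in this record (a search term comprising at least 1% or 5% of a repository's collection, or returning more than 100 or 500 results) can be illustrated with a minimal Python sketch. The function name, the dictionary keys, and the sample counts are hypothetical and are not taken from the dataset files.

```python
def flag_search_term(hits: int, total_datasets: int) -> dict:
    """Apply the share and count thresholds to one search term in one repository."""
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    # Percentage of the whole collection matched by this search term.
    share = 100.0 * hits / total_datasets
    return {
        "percent_of_collection": round(share, 2),
        "at_least_1_percent": share >= 1.0,
        "at_least_5_percent": share >= 5.0,
        "over_100_results": hits > 100,
        "over_500_results": hits > 500,
    }


# Hypothetical example: 620 hits in a repository holding 10,000 datasets.
example = flag_search_term(hits=620, total_datasets=10_000)
print(example)
```

A repository would meet the "large and significant" criterion from the findings only when both the 500-result flag and the 5% flag are true.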
Dataset of books about Research-Computer network resources
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books about Research-Computer network resources [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_subject&fop0=%3D&fval0=Research-Computer+network+resources&j=1&j0=book_subjects
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 11 rows and is filtered to rows where the book subject is Research-Computer network resources. It features 9 columns including author, publication date, language, and book publisher.
Data from: Linking full-text grey literature to underlying research and...
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Apr 11, 2023
Cite
D. Farace; C. Stock; J. Frantzen; L. Sesink; D.L. Rabina; GreyNet - Grey Literature Network Service (2023). Linking full-text grey literature to underlying research and post-publication data: An Enhanced Publications Project 2011-2012 [Dataset]. http://doi.org/10.17026/dans-zca-t9k5
Explore at:
Unique identifier
https://doi.org/10.17026/dans-zca-t9k5
Dataset updated
Apr 11, 2023
Dataset provided by
INIST-CNRS
DANS - Data Archiving and Networked Services
Pratt Institute
Authors
D. Farace; C. Stock; J. Frantzen; L. Sesink; D.L. Rabina; GreyNet - Grey Literature Network Service
Description
1. The Project
This project seeks to circumvent the data vs. documents camp in the grey literature community by way of a middle ground provided through enhanced publications. Enhanced publications allow for a fuller understanding of the process in which data and information are used and applied in the generation of knowledge. The enhanced publication of grey literature precludes the idea of a random selection of data and information, and instead focuses on the human intervention in data-rich environments. The definition of an enhanced publication is borrowed from the DRIVER-II project, “a publication that is enhanced with three categories of information: research data, extra materials, and post-publication data”. Enhanced publications combine textual resources i.e. documents intended to be read by human beings, which contain an interpretation or analysis of primary data. Enhanced publications inherently contribute to the review process of grey literature as well as the replication of research and improved visibility of research results in the scholarly communication chain.
2. Design of the Questionnaire and Author Survey
The survey population was selected from among the 286 authors and co-authors in the International Conference Series on Grey Literature. It was decided that only first authors would receive the questionnaire, which narrowed the potential population to 162 authors, of whom only 95 were actually sent the online questionnaire. The other 67 first authors were not included in the final survey population for reasons such as no current email address, retirement, or death. The 95 authors were sent a personalized email with a standardized text inviting them to participate in the survey by completing the online questionnaire. The survey was carried out using the freeware ‘Survey Monkey’, and the questionnaire contained 10 items, three of which were open-ended. Subheadings, set off by quotation marks, were also inserted in the questionnaire. These subheadings preceded each odd-numbered question and were deemed relevant in achieving informed responses. The final results are based on the responses of 50 of the 95 survey recipients, which amounts to roughly a 53% response rate.
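
The survey figures in this description can be checked with a short Python calculation. This is only an arithmetic sketch; the variable names are illustrative, and the counts are the ones stated in the text.

```python
authors_total = 286        # authors and co-authors in the conference series
first_authors = 162        # potential survey population (first authors only)
questionnaires_sent = 95   # first authors who were sent the questionnaire
responses = 50             # completed questionnaires

# First authors left out of the final survey population.
excluded = first_authors - questionnaires_sent

# Response rate among those who actually received the questionnaire.
response_rate = 100 * responses / questionnaires_sent

print(f"Excluded first authors: {excluded}")
print(f"Response rate: {response_rate:.1f}%")
```

The result, about 52.6%, agrees with the "roughly 53%" reported in the description.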
Data from: Where do engineering students really get their information? :...
opal.latrobe.edu.au
researchdata.edu.au
pdf
Updated Mar 13, 2025
Cite
Clayton Bolitho (2025). Where do engineering students really get their information? : using reference list analysis to improve information literacy programs [Dataset]. http://doi.org/10.4225/22/59d45f4b696e4
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.4225/22/59d45f4b696e4
Dataset updated
Mar 13, 2025
Dataset provided by
La Trobe
Authors
Clayton Bolitho
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
BACKGROUND

An understanding of the resources which engineering students use to write their academic papers provides information about student behaviour as well as the effectiveness of information literacy programs designed for engineering students. One of the most informative sources of information which can be used to determine the nature of the material that students use is the bibliography at the end of the students’ papers. While reference list analysis has been utilised in other disciplines, few studies have focussed on engineering students or used the results to improve the effectiveness of information literacy programs. Gadd, Baldwin and Norris (2010) found that civil engineering students undertaking a final-year research project cited journal articles more than other types of material, followed by books and reports, with web sites ranked fourth. Several studies, however, have shown that in their first year at least, most students prefer to use Internet search engines (Ellis & Salisbury, 2004; Wilkes & Gurney, 2009).

PURPOSE

The aim of this study was to find out exactly what resources undergraduate students studying civil engineering at La Trobe University were using, and in particular, the extent to which students were utilising the scholarly resources paid for by the library. A secondary purpose of the research was to ascertain whether information literacy sessions delivered to those students had any influence on the resources used, and to investigate ways in which the information literacy component of the unit can be improved to encourage students to make better use of the resources purchased by the Library to support their research.

DESIGN/METHOD

The study examined student bibliographies for three civil engineering group projects at the Bendigo Campus of La Trobe University over a two-year period, including two first-year units (CIV1EP – Engineering Practice) and one second-year unit (CIV2GR – Engineering Group Research). All units included a mandatory library session at the start of the project where student groups were required to meet with the relevant faculty librarian for guidance. In each case, the Faculty Librarian highlighted specific resources relevant to the topic, including books, e-books, video recordings, websites and internet documents. The students were also shown tips for searching the Library catalogue, Google Scholar, LibSearch (the LTU Library’s research and discovery tool) and ProQuest Central. Subject-specific databases for civil engineering and science were also referred to. After the final reports for each project had been submitted and assessed, the Faculty Librarian contacted the lecturer responsible for the unit, requesting copies of the student bibliographies for each group. References for each bibliography were then entered into EndNote. The Faculty Librarian grouped them according to various facets, including the name of the unit and the group within the unit; the material type of the item being referenced; and whether the item required a Library subscription to access it. A total of 58 references were collated for the 2010 CIV1EP unit; 237 references for the 2010 CIV2GR unit; and 225 references for the 2011 CIV1EP unit.

INTERIM FINDINGS

The initial findings showed that student bibliographies for the three group projects were primarily made up of freely available internet resources which required no library subscription. For the 2010 CIV1EP unit, all 58 resources used were freely available on the Internet. For the 2011 CIV1EP unit, 28 of the 225 resources used (12.44%) required a Library subscription or purchase for access, while the second-year students (CIV2GR) used a greater variety of resources, with 71 of the 237 resources used (29.96%) requiring a Library subscription or purchase for access. The results suggest that the library sessions had little or no influence on the 2010 CIV1EP group, but the sessions may have assisted students in the 2011 CIV1EP and 2010 CIV2GR groups to find books, journal articles and conference papers, which were all represented in their bibliographies.

FURTHER RESEARCH

The next step in the research is to investigate ways to increase the representation of scholarly references (found by resources other than Google) in student bibliographies. It is anticipated that such a change would lead to an overall improvement in the quality of the student papers. One way of achieving this would be to make it mandatory for students to include a specified number of journal articles, conference papers, or scholarly books in their bibliographies. It is also anticipated that embedding La Trobe University’s Inquiry/Research Quiz (IRQ) using a constructively aligned approach will further enhance the students’ research skills and increase their ability to find suitable scholarly material which relates to their topic. This has already been done successfully (Salisbury, Yager, & Kirkman, 2012).

CONCLUSIONS & CHALLENGES

The study shows that most students rely heavily on the free Internet for information. Students don’t naturally use Library databases or scholarly resources such as Google Scholar to find information, without encouragement from their teachers, tutors and/or librarians. It is acknowledged that the use of scholarly resources doesn’t automatically lead to a high quality paper. Resources must be used appropriately and students also need to have the skills to identify and synthesise key findings in the existing literature and relate these to their own paper. Ideally, students should be able to see the benefit of using scholarly resources in their papers, and continue to seek these out even when it’s not a specific assessment requirement, though it can’t be assumed that this will be the outcome.

REFERENCES

Ellis, J., & Salisbury, F. (2004). Information literacy milestones: building upon the prior knowledge of first-year students. Australian Library Journal, 53(4), 383-396.

Gadd, E., Baldwin, A., & Norris, M. (2010). The citation behaviour of civil engineering students. Journal of Information Literacy, 4(2), 37-49.

Salisbury, F., Yager, Z., & Kirkman, L. (2012). Embedding Inquiry/Research: Moving from a minimalist model to constructive alignment. Paper presented at the 15th International First Year in Higher Education Conference, Brisbane. Retrieved from http://www.fyhe.com.au/past_papers/papers12/Papers/11A.pdf

Wilkes, J., & Gurney, L. J. (2009). Perceptions and applications of information literacy by first year applied science students. Australian Academic & Research Libraries, 40(3), 159-171.
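
The subscription shares reported in the interim findings of this record can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the tuple layout and function name are illustrative, and the counts come from the description.

```python
# (unit, total references, references requiring a Library subscription or purchase)
units = [
    ("2010 CIV1EP", 58, 0),     # all 58 resources were freely available
    ("2011 CIV1EP", 225, 28),
    ("2010 CIV2GR", 237, 71),
]


def subscription_share(total: int, subscribed: int) -> float:
    """Percentage of references that needed a subscription, to two decimals."""
    return round(100 * subscribed / total, 2)


shares = {unit: subscription_share(total, sub) for unit, total, sub in units}

for unit, share in shares.items():
    print(f"{unit}: {share}% subscription-only references")
```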
Disambiguated researchers publication data
figshare.com
txt
Updated Jun 1, 2023
Cite
Amaral Lab (2023). Disambiguated researchers publication data [Dataset]. http://doi.org/10.6084/m9.figshare.1591864.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1591864.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Amaral Lab
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Companion dataset to "The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact" by Duch J, Zeng XHT, Sales-Pardo M, Radicchi F, Otis S, Woodruff TK, Amaral LAN (PLoS ONE 7, e51332, 2012), doi: 10.1371/journal.pone.0051332. This dataset lists the total number of publications by 4,394 faculty members from 7 distinct research fields working at top U.S. institutions. The dataset also contains bibliographic information manually gathered from the CVs of those faculty members. The publications data was collected from Thomson Reuters' Web of Science according to the procedures described in the published paper.

The data is a single csv file with the following fields:

author_name - researcher name as: Last name, Initials

gender - researcher gender as: M (male) or F (female)

univ_name - Institution of current employment

field - scientific discipline

phd_year - year of phd completion

nationality - Country of origin

background - List of degrees

affiliations - List of honours and past appointments

total_pubs - Total number of publications

Some fields are not available for some researchers. Current employment is accurate as of June 2010. The total_pubs field shows the total number of publications published by the end of 2010.
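
As an illustration of how a file with these fields might be read, here is a minimal Python sketch using only the standard library. It assumes a comma-delimited file with a header row naming the nine fields above, which the description does not confirm. The two sample rows are hypothetical and are not taken from the dataset, and empty fields are mapped to None because some fields are not available for some researchers.

```python
import csv
import io

FIELDS = ["author_name", "gender", "univ_name", "field", "phd_year",
          "nationality", "background", "affiliations", "total_pubs"]

# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE = '''author_name,gender,univ_name,field,phd_year,nationality,background,affiliations,total_pubs
"Doe, J",F,Example University,Chemistry,1995,,,,42
"Roe, R",M,Example University,Ecology,,,,,17
'''


def to_int(value):
    """Return an int, or None when the field is empty."""
    value = value.strip()
    return int(value) if value else None


def load_rows(handle):
    """Read researcher rows, checking that every expected column is present."""
    reader = csv.DictReader(handle)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["phd_year"] = to_int(row["phd_year"])
        row["total_pubs"] = to_int(row["total_pubs"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["author_name"], rows[0]["total_pubs"])
```

To read the real file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, adjusting the delimiter if the actual file differs from this assumption.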
Data from: List of data journals
data.niaid.nih.gov
zenodo.org
Updated Jul 16, 2024
Cite
Strecker, Dorothea (2024). List of data journals [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7082125
Explore at:
Dataset updated
Jul 16, 2024
Dataset provided by
Kindling, Maxi
Strecker, Dorothea
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This document describes a dataset that aggregates information about 135 data journals. Data journals focus on the publication of data papers -- a specialized publication type describing datasets, their collection and reuse potential that is peer-reviewed, citable and indexed. This dataset includes a comprehensive list of data journals that was compiled by aggregating existing sources, as well as an overview of these sources.

The list is continually updated on GitHub, where additional information on data journals (URLs of data journal homepages) is provided: https://github.com/MaxiKi/data-journals
Dataset of books in the Essential resources for social research series
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books in the Essential resources for social research series [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=j0-book_series&fop0=%3D&fval0=Essential+resources+for+social+research&j=1&j0=book_series
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book series is Essential resources for social research. It features 9 columns including author, publication date, language, and book publisher.
Data from: The assessment of science: the relative merits of...
zenodo.org
data.niaid.nih.gov
+1more
Updated May 28, 2022
Cite
Adam Eyre-Walker; Nina Stoletzki (2022). Data from: The assessment of science: the relative merits of post-publication review, the impact factor and the number of citations [Dataset]. http://doi.org/10.5061/dryad.2h4j5
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.2h4j5
Dataset updated
May 28, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Adam Eyre-Walker; Nina Stoletzki
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Background: The assessment of scientific publications is an integral part of the scientific process. Here we investigate three methods of assessing the merit of a scientific paper: subjective post-publication peer review, the number of citations gained by a paper and the impact factor of the journal in which the article was published.

Methodology/principal findings: We investigate these methods using two datasets in which subjective post-publication assessments of scientific publications have been made by experts. We find that there are moderate, but statistically significant, correlations between assessor scores, when two assessors have rated the same paper, and between assessor score and the number of citations a paper accrues. However, we show that assessor score depends strongly on the journal in which the paper is published, and that assessors tend to over-rate papers published in journals with high impact factors. If we control for this bias, we find that the correlation between assessor scores and between assessor score and the number of citations is weak, suggesting that scientists have little ability to judge either the intrinsic merit of a paper or its likely impact. We also show that the number of citations a paper receives is an extremely error-prone measure of scientific merit. Finally, we argue that the impact factor is likely to be a poor measure of merit, since it depends on subjective assessment.

Conclusions: We conclude that the three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased and expensive method by which to assess merit. We argue that the impact factor may be the most satisfactory of the methods we have considered, since it is a form of pre-publication review. However, we emphasise that it is likely to be a very error-prone measure of merit that is qualitative, not quantitative.
COKI Open Access Dataset
data.niaid.nih.gov
explore.openaire.eu
+1more
Updated Apr 7, 2025
Cite
Neylon, Cameron (2025). COKI Open Access Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6399462
Explore at:
Dataset updated
Apr 7, 2025
Dataset provided by
Montgomery, Lucy
Roelofs, Aniek
Neylon, Cameron
Hosking, Richard
Chien, Tuan-Yow
Diprose, James P.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COKI Open Access Dataset measures open access performance for 225 countries and 50,000 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.

The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.

Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, OpenAlex, Unpaywall and the Research Organization Registry.

After fetching the datasets, they are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate timeseries statistics include publication count, open access status and citation count.

See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
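
Since the dataset is distributed in JSON Lines format (one JSON object per line), a minimal reading sketch may help. The field names used here (`id`, `year`, `n_outputs`, `n_outputs_open`) and the sample records are hypothetical placeholders chosen for illustration; the actual field names should be taken from the schema page linked above.

```python
import io
import json

# Made-up records for illustration only. The field names are hypothetical
# and are NOT the real COKI schema; consult the published schema instead.
SAMPLE_JSONL = """{"id": "AAA", "year": 2020, "n_outputs": 200, "n_outputs_open": 90}
{"id": "AAA", "year": 2021, "n_outputs": 250, "n_outputs_open": 150}

{"id": "BBB", "year": 2021, "n_outputs": 40, "n_outputs_open": 10}
"""


def read_jsonl(handle):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def open_access_share(records):
    """Map (id, year) to the fraction of outputs that are open access."""
    shares = {}
    for rec in records:
        if rec["n_outputs"] > 0:  # avoid dividing by zero for empty years
            key = (rec["id"], rec["year"])
            shares[key] = rec["n_outputs_open"] / rec["n_outputs"]
    return shares


shares = open_access_share(read_jsonl(io.StringIO(SAMPLE_JSONL)))
for key, share in sorted(shares.items()):
    print(key, f"{share:.0%}")
```

Reading line by line keeps memory use flat, which matters for a file that covers many countries and institutions over a long time series.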

Code

The COKI Academic Observatory data collection pipeline is used to create the dataset.

The COKI OA Website Github project contains the code for the web app that visualises the dataset at open.coki.ac. It can be found on Zenodo here.

License

COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

Attributions

This work contains information from:

OpenAlex which is made available under the CC0 license.

Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details.

Unpaywall. The Unpaywall Data Feed is used under license. Data is freely available from Unpaywall via the API, data dumps and as a data feed.

Research Organization Registry which is made available under a CC0 licence.
Data from: Scientific production about Open Educational Resources
scielo.figshare.com
png
Updated Jun 1, 2023
Cite
Jimena de Mello HEREDIA; Rosângela Schwarz RODRIGUES; Eleonora Milano Falcão VIEIRA (2023). Scientific production about Open Educational Resources [Dataset]. http://doi.org/10.6084/m9.figshare.5885641.v1
Explore at:
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.5885641.v1
Dataset updated
Jun 1, 2023
Dataset provided by
SciELO journals
Authors
Jimena de Mello HEREDIA; Rosângela Schwarz RODRIGUES; Eleonora Milano Falcão VIEIRA
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This research identifies articles published in journals indexed in the Web of Science in order to characterize the scientific production on Open Educational Resources in higher education. The study is descriptive and exploratory and uses mixed methods, combining quantitative and qualitative approaches: the research corpus was built through a survey strategy, and the data were analyzed descriptively and through content analysis. The survey identified 115 articles by 243 researchers, published in 43 journals between 2008 and 2014. Of the journals with publications on Open Educational Resources, 67% are paid access, and they concentrate 56% of the articles under restricted access. The institutions in the United Kingdom, Spain and Canada whose researchers have published on Open Educational Resources are all specialized in Distance Education. Authors working in Education (48%), Computing (22%) and Engineering (11%) predominate over other areas. In the qualitative stage, six articles were discarded, so the content analysis covered 99 articles in English, eight in Spanish and two in Portuguese, for a total of 109 articles analyzed in full. The articles were divided into seven categories: recovery and repositories (21%), challenges (19%), technologies (16%), production (14%), incentive policies and sustainability (13%), adaptation and reuse (10%) and open courseware (4%). The study concludes that the core of the publications is concentrated in a Canadian journal and 26 journals about education.
SciCrunch
dknet.org
Updated Oct 18, 2019
Cite
(2019). SciCrunch [Dataset]. http://identifiers.org/RRID:SCR_003115
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_003115 https://identifiers.org/RRID:SCR_003115/resolver/mentions
Dataset updated
Oct 18, 2019
Description
Community portal for researchers and content management system for data and databases. Intended to provide common source of data to research community and data about Research Resource Identifiers (RRIDs), which can be used in scientific publications. Central service where RRIDs can be searched and created. Designed to help communities of researchers create their own portals to provide access to resources, databases and tools of relevance to their research areas. Adds value to existing scientific resources by increasing their discoverability, accessibility, visibility, utility and interoperability, regardless of their current design or capabilities and without need for extensive redesign of their components or information models. Resources can be searched and discovered at multiple levels of integration, from superficial discovery based on limited description of resource at SciCrunch Registry, to deep content query at SciCrunch Data Federation.
Research applications of primary biodiversity databases in the digital age
plos.figshare.com
xlsx
Updated May 31, 2023
Cite
Joan E. Ball-Damerow; Laura Brenskelle; Narayani Barve; Pamela S. Soltis; Petra Sierwald; Rüdiger Bieler; Raphael LaFrance; Arturo H. Ariño; Robert P. Guralnick (2023). Research applications of primary biodiversity databases in the digital age [Dataset]. http://doi.org/10.1371/journal.pone.0215794
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0215794
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Joan E. Ball-Damerow; Laura Brenskelle; Narayani Barve; Pamela S. Soltis; Petra Sierwald; Rüdiger Bieler; Raphael LaFrance; Arturo H. Ariño; Robert P. Guralnick
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Our world is in the midst of unprecedented change—climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources to contribute to online databases of species occurrences. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records. 
Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing environment.
Forecast: Share of Corresponding/Leading Author in Scientific Publications...
reportlinker.com
Updated Apr 7, 2024
Cite
ReportLinker (2024). Forecast: Share of Corresponding/Leading Author in Scientific Publications in Organizational Behavior and Human Resource Management in the US 2024 - 2028 [Dataset]. https://www.reportlinker.com/dataset/a657b9f74ae080c85e8ad55de585156c58ea0322
Explore at:
Dataset updated
Apr 7, 2024
Dataset authored and provided by
ReportLinker
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
United States
Description
Forecast: Share of Corresponding/Leading Author in Scientific Publications in Organizational Behavior and Human Resource Management in the US 2024 - 2028
Forecast: Share of Scientific Publications Among the World's 10% Top-Cited...
reportlinker.com
Updated Apr 8, 2024
Cite
ReportLinker (2024). Forecast: Share of Scientific Publications Among the World's 10% Top-Cited Publications in Organizational Behavior and Human Resource Management in the US 2024 - 2028 [Dataset]. https://www.reportlinker.com/dataset/54660357d85849e010a2d8b634f3023aa16aa247
Explore at:
Dataset updated
Apr 8, 2024
Dataset authored and provided by
ReportLinker
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
United States
Description
Forecast: Share of Scientific Publications Among the World's 10% Top-Cited Publications in Organizational Behavior and Human Resource Management in the US 2024 - 2028
Data from: Sharing detailed research data is associated with increased...
dataone.org
data.niaid.nih.gov
+1more
Updated Apr 2, 2025
Cite
Heather A. Piwowar; Roger S. Day; Douglas B. Fridsma (2025). Sharing detailed research data is associated with increased citation rate [Dataset]. http://doi.org/10.5061/dryad.j2c4g
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.j2c4g
Dataset updated
Apr 2, 2025
Dataset provided by
Dryad Digital Repository
Authors
Heather A. Piwowar; Roger S. Day; Douglas B. Fridsma
Time period covered
Jan 1, 2011
Description
Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
Academic Research Databases Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Oct 16, 2024
Cite
Dataintelo (2024). Academic Research Databases Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/academic-research-databases-market
Explore at:
Available download formats: pptx, pdf, csv
Dataset updated
Oct 16, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Academic Research Databases Market Outlook

The global academic research databases market size was valued at approximately USD 3.5 billion in 2023 and is projected to reach around USD 6.2 billion by 2032, growing at a CAGR of 6.5% during the forecast period. The increasing demand for digital resources in academic and research institutions, along with the growing emphasis on online learning and resource accessibility, are key factors driving market growth.
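
The figures in this paragraph can be cross-checked with the standard compound annual growth rate formula, end = start × (1 + rate)^years. The sketch below is only a consistency check on the quoted numbers; it assumes nine compounding periods (2023 to 2032), which the report does not state explicitly.

```python
def project_value(start_value, annual_rate, years):
    """Compound start_value forward at annual_rate for the given years."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value, end_value, years):
    """Return the constant annual rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


START_USD_BN = 3.5    # 2023 market size quoted in the report
END_USD_BN = 6.2      # 2032 projection quoted in the report
QUOTED_CAGR = 0.065   # growth rate quoted in the report
YEARS = 2032 - 2023   # assumption: nine annual compounding periods

projected = project_value(START_USD_BN, QUOTED_CAGR, YEARS)
rate = implied_cagr(START_USD_BN, END_USD_BN, YEARS)

print(f"Projected 2032 size at the quoted CAGR: USD {projected:.2f} billion")
print(f"CAGR implied by the two endpoints: {rate:.2%}")
```

Compounding USD 3.5 billion at 6.5% for nine years gives roughly USD 6.17 billion, so the quoted endpoint of around USD 6.2 billion is consistent with the stated rate after rounding.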

One significant growth factor for the academic research databases market is the exponential increase in academic research activity worldwide. With the surge in the number of higher education institutions and research facilities, the demand for comprehensive and easily accessible databases has skyrocketed. These databases provide a centralized platform for researchers to access a wide array of scholarly articles, data sets, and other pertinent information, streamlining the research process and enhancing the quality of scholarly work.

Another driving force behind the market's expansion is the continuous technological advancements in database management and search functionalities. Modern academic research databases are equipped with sophisticated search algorithms, artificial intelligence, and machine learning capabilities that enable users to efficiently locate relevant information. These advancements not only improve user experience but also significantly reduce the time and effort required to conduct comprehensive literature reviews and gather data.

The increasing prevalence of interdisciplinary research is also contributing to the growth of the academic research databases market. Researchers today often work at the intersection of multiple disciplines, necessitating access to a diverse range of subject-specific databases. The availability of comprehensive databases that cover various fields such as science, technology, medicine, social sciences, and humanities supports this trend by providing researchers with the resources they need to explore and integrate knowledge from different domains.

From a regional perspective, North America holds the largest share of the academic research databases market, driven by the high concentration of leading academic and research institutions and substantial investments in research and development. Europe follows closely, with significant contributions from countries like the UK, Germany, and France. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by the rapid expansion of higher education infrastructure and increasing government support for research activities. Latin America and the Middle East & Africa, though smaller in market size, are also projected to experience steady growth due to rising academic and research initiatives in these regions.

Database Type Analysis

The academic research databases market is segmented by database type into bibliographic, full-text, numeric, multimedia, and others. Bibliographic databases, which include indexes and abstracts of research articles, play a crucial role in helping researchers locate relevant literature. These databases have been foundational in academic research, providing essential references and citation tracking that are pivotal for scholarly work. Their significance remains high due to the increasing volume of academic publications and the need for comprehensive literature searches.

Full-text databases provide complete access to research articles, journals, and other scholarly materials, making them indispensable for researchers who require in-depth study materials. The convenience of accessing entire articles, rather than just abstracts or summaries, significantly enhances the research process. Full-text databases are particularly valuable in fields such as medicine, where access to full clinical study reports, reviews, and case studies is critical for evidence-based practice.

Numeric databases, which offer access to statistical and numerical data, are essential for researchers in fields like economics, social sciences, and the natural sciences. These databases provide valuable data sets that can be used for quantitative analysis, modeling, and empirical research. The increasing emphasis on data-driven research and the availability of large data sets are propelling the demand for numeric databases.

Multimedia databases, which include audio, video, and other multimedia content, are gaining traction in academic research. These databases are particularly useful in disciplines such a

Cite

Caifan Du (2021). Softcite Dataset: A dataset of software mentions in research publications [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4444074

Softcite Dataset: A dataset of software mentions in research publications

Explore at:

Dataset updated

Jan 17, 2021

Dataset provided by

Caifan Du
James Howison
Hannah Cohoon
Patrice Lopez

Description

The Softcite dataset is a gold-standard dataset of software mentions in research publications, a free resource primarily for software entity recognition in scholarly text. This is the first release of this dataset.

What's in the dataset

With the aim of facilitating software entity recognition efforts at scale and eventually increased visibility of research software for the due credit of software contributions to scholarly research, a team of trained annotators from Howison Lab at the University of Texas at Austin annotated 4,093 software mentions in 4,971 open access research publications in biomedicine (from PubMed Central Open Access collection) and economics (from Unpaywall open access services). The annotated software mentions, along with their publisher, version, and access URL, if mentioned in the text, as well as those publications annotated as containing no software mentions, are all included in the released dataset as a TEI/XML corpus file.

For understanding the schema of the Softcite corpus, its design considerations, and provenance, please refer to our paper included in this release (preprint version).

Use scenarios

The release of the Softcite dataset is intended to encourage researchers and stakeholders to make research software more visible in science, especially to academic databases and systems of information retrieval; and facilitate interoperability and collaboration among similar and relevant efforts in software entity recognition and building utilities for software information retrieval. This dataset can also be useful for researchers investigating software use in academic research.

Current release content

softcite-dataset v1.0 release includes:

The Softcite dataset corpus file: softcite_corpus-full.tei.xml

Softcite Dataset: A Dataset of Software Mentions in Biomedical and Economic Research Publications, our paper that describes the design consideration and creation process of the dataset: Softcite_Dataset_Description_RC.pdf. (This is a preprint version of our forthcoming publication in the Journal of the Association for Information Science and Technology.)

The Softcite dataset is licensed under a Creative Commons Attribution 4.0 International License.

If you have questions, please start a discussion or issue in the howisonlab/softcite-dataset Github repository.
