100+ datasets found

Pubmed Dataset
paperswithcode.com
Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad, Pubmed Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed
Authors
Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad
Description
The PubMed dataset consists of 19,717 scientific publications from the PubMed database pertaining to diabetes, each classified into one of three classes. The citation network consists of 44,338 links. Each publication in the dataset is described by a TF/IDF weighted word vector drawn from a dictionary of 500 unique words.
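To make the "TF/IDF weighted word vector" idea concrete, here is a minimal sketch using only the Python standard library. The description does not say which TF-IDF variant was used to build the dataset, so the formula below (term frequency times the log of inverse document frequency) is an assumption, and the three toy documents and the three-word vocabulary are invented for illustration. The real dataset uses a 500-word dictionary.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF weighted vector per document, restricted to `vocabulary`."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each vocabulary word.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vec = []
        for w in vocabulary:
            tf = counts[w] / total                       # term frequency
            idf = math.log(n_docs / df[w]) if df[w] else 0.0  # assumed IDF variant
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

# Toy corpus (invented); the real dictionary has 500 words.
docs = [
    "insulin therapy in type 1 diabetes",
    "type 2 diabetes and insulin resistance",
    "experimental diabetes in mice",
]
vocab = ["insulin", "mice", "diabetes"]
vectors = tfidf_vectors(docs, vocab)
```

Under this variant, a word that occurs in every document (here "diabetes") gets weight zero, while a word confined to one document (here "mice") gets the largest weight.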
PubMed total records by publication year
catalog.data.gov
healthdata.gov
Updated Feb 3, 2025
National Library of Medicine (2025). PubMed total records by publication year [Dataset]. https://catalog.data.gov/dataset/pubmed-total-records-by-publication-year-fcf4a
Dataset updated
Feb 3, 2025
Dataset provided by
National Library of Medicine
Description
Yearly citation totals from each year of the MEDLINE/PubMed Baseline referencing citations back to year 1781. These totals may increase over time for a particular year as new citations are added. For example, 25 citations were listed for the year 1800 in the 2018 MEDLINE/PubMed Baseline, while the 2019 Baseline includes 387 citations for that year.
MEDLINE/PubMed Citations
catalog.data.gov
healthdata.gov
Updated Feb 3, 2025
National Library of Medicine (2025). MEDLINE/PubMed Citations [Dataset]. https://catalog.data.gov/dataset/medline-pubmed-citations-d2ed0
Dataset updated
Feb 3, 2025
Dataset provided by
National Library of Medicine
Description
PubMed is a free resource supporting the search and retrieval of biomedical and life sciences literature with the aim of improving health, both globally and personally. The PubMed database contains citations and abstracts of biomedical literature. It does not include full-text journal articles; however, links to the full text are often present when available from other sources, such as the publisher's website or PubMed Central (PMC). See the PubMed User Guide for more information: https://pubmed.ncbi.nlm.nih.gov/help/
Hype - PubMed dataset
databank.illinois.edu
Updated Jan 31, 2025
Apratim Mishra; Jana Diesner; Vetle I. Torvik (2025). Hype - PubMed dataset [Dataset]. http://doi.org/10.13012/B2IDB-0651259_V2
Unique identifier
https://doi.org/10.13012/B2IDB-0651259_V2
Dataset updated
Jan 31, 2025
Authors
Apratim Mishra; Jana Diesner; Vetle I. Torvik
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hype - PubMed dataset
Prepared by Apratim Mishra

This dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.

The candidate hype words are 35 in count: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.

This is version 2 of the dataset. Changes include:
Added “Year” variable.
Removed “Abstract length” variable.
Modified variable information due to updated probabilistic model of hype.
Number of hype words - 35 (updated from 36 based on revised findings).

File 1: hype_dataset_final.tsv
Primary dataset. It has the following columns:
1. PMID: represents unique article ID in PubMed
2. Year: Year of publication
3. Hype_word: Candidate hype word, such as ‘novel.’
4. Sentence: Sentence in abstract containing the hype word.
5. Hype_percentile: Abstract relative position of hype word.
6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.
7. Introduction: The ‘I’ component of the hype word based on IMRaD
8. Methods: The ‘M’ component of the hype word based on IMRaD
9. Results: The ‘R’ component of the hype word based on IMRaD
10. Discussion: The ‘D’ component of the hype word based on IMRaD

File 2: hype_removed_phrases_final.tsv
Secondary dataset with same columns as File 1. Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately.

Removed phrases:
1. Major: histocompatibility, component, protein, metabolite, complex, surgery
2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid
3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment
4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values
5. Essential: medium, features, properties, opportunities, oil
6. Unique: model, amino
7. Robust: regression
8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information
9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains
10. Remarkable: properties
11. Definite: radiotherapy, surgery
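Because one PMID can appear on several rows (one per hype word and sentence), a natural first step is to group rows by article. The sketch below assumes the TSV has a header row using the column names listed above; that is an assumption, since the description does not show the file itself. The sample rows are invented for illustration, and the four IMRaD columns are left out of the sample for brevity.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows, for illustration only. The real file is
# hype_dataset_final.tsv; its header is assumed to match the column list.
SAMPLE_TSV = (
    "PMID\tYear\tHype_word\tSentence\tHype_percentile\tHype_value\n"
    "111\t2001\tnovel\tWe present a novel approach.\t0.10\t0.80\n"
    "111\t2001\trobust\tResults were robust.\t0.90\t0.40\n"
    "222\t2015\tpromising\tThis is a promising lead.\t0.95\t0.70\n"
)

def group_by_pmid(handle):
    """Map each PMID to the list of its hype-word rows."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        row["Year"] = int(row["Year"])
        row["Hype_value"] = float(row["Hype_value"])
        grouped[row["PMID"]].append(row)
    return dict(grouped)

by_pmid = group_by_pmid(io.StringIO(SAMPLE_TSV))
words_for_111 = [r["Hype_word"] for r in by_pmid["111"]]
```

To run the same function on the downloaded file, pass `open("hype_dataset_final.tsv", newline="", encoding="utf-8")` in place of the `StringIO` object; the encoding is likewise an assumption.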
PubMed
healthdata.gov
datadiscovery.nlm.nih.gov
Updated Mar 31, 2021
datadiscovery.nlm.nih.gov (2021). PubMed [Dataset]. https://healthdata.gov/dataset/PubMed/h5mw-dwr6
Available download formats: csv, application/rdfxml, json, application/rssxml, xml, tsv
Dataset updated
Mar 31, 2021
Dataset provided by
datadiscovery.nlm.nih.gov
Description
PubMed comprises more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
PubMed Central Open Access Subset (PMC OA)
catalog.data.gov
healthdata.gov
Updated Feb 3, 2025
National Library of Medicine (2025). PubMed Central Open Access Subset (PMC OA) [Dataset]. https://catalog.data.gov/dataset/pubmed-central-open-access-subset-pmc-oa
Dataset updated
Feb 3, 2025
Dataset provided by
National Library of Medicine
Description
Not all articles in PMC are available for text mining and other reuse; many are protected by copyright. However, articles in the PMC Open Access Subset are made available for download under a Creative Commons or similar license that generally allows more liberal redistribution and reuse than a traditional copyrighted work.
pubmed
huggingface.co
Updated Aug 15, 2024
Huihao JING (2024). pubmed [Dataset]. https://huggingface.co/datasets/AnonymousNodeGAE/pubmed
Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Aug 15, 2024
Authors
Huihao JING
Description
AnonymousNodeGAE/pubmed dataset hosted on Hugging Face and contributed by the HF Datasets community
PubMed Term, Abstract, Conclusion, Title Dataset Dataset
paperswithcode.com
Updated May 19, 2019
Qingyun Wang; Lifu Huang; Zhiying Jiang; Kevin Knight; Heng Ji; Mohit Bansal; Yi Luan (2019). PubMed Term, Abstract, Conclusion, Title Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed-term-abstract-conclusion-title-dataset
Dataset updated
May 19, 2019
Authors
Qingyun Wang; Lifu Huang; Zhiying Jiang; Kevin Knight; Heng Ji; Mohit Bansal; Yi Luan
Description
This dataset gathers three types of pairs from PubMed: Title-to-Abstract (Training: 22,811/Development: 2,095/Test: 2,095), Abstract-to-Conclusion and Future work (Training: 22,811/Development: 2,095/Test: 2,095), and Conclusion and Future work-to-Title (Training: 15,902/Development: 2,095/Test: 2,095). Each pair consists of an input and an output, together with the corresponding terms (from the original KB and link prediction results).
MEDLINE/PubMed Baseline Repository (MBR)
datadiscovery.nlm.nih.gov
data.virginia.gov
Updated Jun 30, 2021
(2021). MEDLINE/PubMed Baseline Repository (MBR) [Dataset]. https://datadiscovery.nlm.nih.gov/Terminology/MEDLINE-PubMed-Baseline-Repository-MBR-/exav-tdkk
Available download formats: xml, csv, application/rdfxml, json, application/rssxml, tsv
Dataset updated
Jun 30, 2021
Description
The MEDLINE/PubMed Baseline Repository (MBR) provides access to each MEDLINE/PubMed Baseline snapshot starting with the 2002 MEDLINE Baseline. Each baseline contains a snapshot of MEDLINE citations in the state they were at a given moment in time without the MeSH vocabulary updates and other revisions that occur during the year. The baseline snapshot is created at the beginning of each new MeSH Indexing Year. The records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.
pubmed-abstracts-noised-with-kaggle-dist
huggingface.co
Updated Feb 23, 2024
pubmed-abstracts-noised-with-kaggle-dist [Dataset]. https://huggingface.co/datasets/gayanin/pubmed-abstracts-noised-with-kaggle-dist
Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Feb 23, 2024
Authors
Gayani Nanayakkara
Description
gayanin/pubmed-abstracts-noised-with-kaggle-dist dataset hosted on Hugging Face and contributed by the HF Datasets community
nbs-pubmed-qa
huggingface.co
Updated Mar 21, 2025
Ded (2025). nbs-pubmed-qa [Dataset]. https://huggingface.co/datasets/deddyext/nbs-pubmed-qa
Dataset updated
Mar 21, 2025
Authors
Ded
Description
deddyext/nbs-pubmed-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
pubmed-abstract
huggingface.co
Updated Mar 27, 2025
Uiyun Kim (2025). pubmed-abstract [Dataset]. https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract
Dataset updated
Mar 27, 2025
Authors
Uiyun Kim
Description
Dataset Summary

A daily-updated dataset of PubMed abstracts, collected via PubMed’s API and published on Hugging Face Datasets. Each snapshot is versioned by date (e.g., 2025-03-27) so users can track historical changes or use a consistent snapshot for reproducibility.

Updated daily
Each version tagged by date
Abstract-only dataset (no full text)

Dataset Structure

Column      Type      Description
pmid        string    Unique PubMed identifier
abstract    string    Abstract text…

See the full description on the dataset page: https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract.
MEDLINE/PubMed Baseline Statistics: Min/Max Report
data.virginia.gov
datadiscovery.nlm.nih.gov
Updated Sep 6, 2024
National Library of Medicine (2024). MEDLINE/PubMed Baseline Statistics: Min/Max Report [Dataset]. https://data.virginia.gov/dataset/medline-pubmed-baseline-statistics-min-max-report
Available download formats: rdf, json, csv, xsl
Dataset updated
Sep 6, 2024
Dataset provided by
National Library of Medicine
Description
A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records.

MEDLINE/PubMed annual statistical reports, based upon the data elements in the baseline versions of MEDLINE®/PubMed, are available. For each year covered, the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
PubMed
bioregistry.io
Updated Dec 28, 2021
(2021). PubMed [Dataset]. http://identifiers.org/wikidata:P698
Unique identifier
https://identifiers.org/wikidata:P698
Dataset updated
Dec 28, 2021
Description
PubMed is a service of the U.S. National Library of Medicine that includes citations from MEDLINE and other life science journals for biomedical articles back to the 1950s.
PKG21
figshare.com
Updated Nov 25, 2021
Jian Xu (2021). PKG21 [Dataset]. http://doi.org/10.6084/m9.figshare.17072960.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.17072960.v1
Dataset updated
Nov 25, 2021
Dataset provided by
Figshare, http://figshare.com/
Authors
Jian Xu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PKG21 CSV files.
pubmed-10k
huggingface.co
Updated Jul 21, 2023
Ronit Mandal (2023). pubmed-10k [Dataset]. https://huggingface.co/datasets/ronitHF/pubmed-10k
Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Jul 21, 2023
Authors
Ronit Mandal
Description
Dataset Summary

First 10k rows of the scientific_papers["pubmed"] dataset. 10:1:1 split.

Usage

from datasets import load_dataset

train_dataset = load_dataset("ronitHF/pubmed-10k", split="train")
val_dataset = load_dataset("ronitHF/pubmed-10k", split="validation")
test_dataset = load_dataset("ronitHF/pubmed-10k", split="test")
MEDLINE/PubMed Baseline Statistics: Misc Report
healthdata.gov
data.virginia.gov
Updated Mar 24, 2023
datadiscovery.nlm.nih.gov (2023). MEDLINE/PubMed Baseline Statistics: Misc Report [Dataset]. https://healthdata.gov/w/8v7d-ywxm/default?cur=eSI7VgiHdpl
Available download formats: json, csv, application/rssxml, application/rdfxml, xml, tsv
Dataset updated
Mar 24, 2023
Dataset provided by
datadiscovery.nlm.nih.gov
Description
A file containing all Misc Baseline Reports for 2018-2023 in their original format is available in the Attachments section below.

MEDLINE/PubMed annual statistical reports, based upon the data elements in the baseline versions of MEDLINE®/PubMed, are available. For each year covered, the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
Pubmed Journal Recommendation System dataset
zenodo.org
data.niaid.nih.gov
Updated Dec 18, 2023
Jiayun Liu; Manuel Castillo Cara; Raúl García Castro (2023). Pubmed Journal Recommendation System dataset [Dataset]. http://doi.org/10.5281/zenodo.8386011
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.8386011
Dataset updated
Dec 18, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Jiayun Liu; Manuel Castillo Cara; Raúl García Castro
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset for journal recommendation; it includes title, abstract, keywords, and journal.

We extracted the journals and additional information from:

Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.

Dataset Components:

data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.

data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022—the latest date in the dataset. Additionally, we have exclusively retained journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.

data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets—training, validation, and test—using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.
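The filtering and per-journal partitioning described above can be sketched in a few lines of standard-library Python. This is an illustration of the general technique (a stratified split by journal), not the authors' actual code. The function name `filter_and_split`, the rounding rule, the seed, and the toy records are all assumptions; only the `journal` and `pubmed_id` column names and the 200-article and 60/20/20 figures come from the description.

```python
import random
from collections import defaultdict

def filter_and_split(articles, min_articles=200, ratios=(0.6, 0.2, 0.2), seed=0):
    """Keep journals with more than `min_articles` articles, then split each
    journal's articles into train/val/test so every journal is represented
    proportionally in all three sets."""
    by_journal = defaultdict(list)
    for art in articles:
        by_journal[art["journal"]].append(art)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for journal in sorted(by_journal):
        items = by_journal[journal]
        if len(items) <= min_articles:
            continue  # journal too small, dropped entirely
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder goes to test
    return train, val, test

# Toy data: journal "A" has 10 articles, journal "B" only 3.
articles = [{"pubmed_id": i, "journal": "A"} for i in range(10)]
articles += [{"pubmed_id": 100 + i, "journal": "B"} for i in range(3)]
train, val, test = filter_and_split(articles, min_articles=5)
```

Because the counts are rounded separately for each journal, the overall set sizes only approximate 60/20/20. Small deviations like those in the reported totals are what one would expect from a per-journal split, though the exact rounding the authors used is not stated.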
PostgreSQL query to select the ten journals with the highest number of...
plos.figshare.com
Updated May 31, 2023
PostgreSQL query to select the ten journals with the highest number of publications containing the MeSH term “Leukemia” [20] on the complete PubMed data set. [Dataset]. https://plos.figshare.com/articles/dataset/PostgreSQL_query_to_select_the_ten_journals_with_the_highest_number_of_publications_containing_the_MeSH_term_Leukemia_20_on_the_complete_PubMed_data_set_/3993120
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0163794.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS, http://plos.org/
Authors
Kersten Döring; Björn A. Grüning; Kiran K. Telukunta; Philippe Thomas; Stefan Günther
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PostgreSQL query to select the ten journals with the highest number of publications containing the MeSH term “Leukemia” [20] on the complete PubMed data set.
JoinPoint regression analysis of different APC trends.
plos.figshare.com
Updated Jun 1, 2023
Bui The Hung; Nguyen Phuoc Long; Le Phi Hung; Nguyen Thien Luan; Nguyen Hoang Anh; Tran Diem Nghi; Mai Van Hieu; Nguyen Thi Huyen Trang; Herizo Fabien Rafidinarivo; Nguyen Ky Anh; David Hawkes; Nguyen Tien Huy; Kenji Hirayama (2023). JoinPoint regression analysis of different APC trends. [Dataset]. http://doi.org/10.1371/journal.pone.0121054.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0121054.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Bui The Hung; Nguyen Phuoc Long; Le Phi Hung; Nguyen Thien Luan; Nguyen Hoang Anh; Tran Diem Nghi; Mai Van Hieu; Nguyen Thi Huyen Trang; Herizo Fabien Rafidinarivo; Nguyen Ky Anh; David Hawkes; Nguyen Tien Huy; Kenji Hirayama
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
jAPC = Annual percent changes calculated by Joinpoint Regression Analysis. *APC is significantly different from zero when P < 0.05. JoinPoint regression analysis of different APC trends.

Pubmed Dataset (paperswithcode.com): 8 scholarly articles cite this dataset.