100+ datasets found
  1. Pubmed Dataset

    • paperswithcode.com
    Cite
    Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad, Pubmed Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed
    Authors
    Prithviraj Sen; Galileo Namata; Mustafa Bilgic; Lise Getoor; Brian Gallagher; Tina Eliassi-Rad
    Description

    The PubMed dataset consists of 19,717 scientific publications from the PubMed database pertaining to diabetes, each classified into one of three classes. The citation network consists of 44,338 links. Each publication is described by a TF-IDF weighted word vector over a dictionary of 500 unique words.
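
To illustrate what a TF-IDF weighted word vector over a fixed dictionary looks like, here is a minimal sketch in plain Python. The tokenizer, the smoothed IDF variant, the three-word vocabulary, and the sample documents are all assumptions made for illustration; the description does not say which TF-IDF variant the dataset authors used.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF weighted vector per document over a fixed vocabulary."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each vocabulary word.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vec = []
        for w in vocabulary:
            tf = counts[w] / total
            # Smoothed IDF (an assumption; other variants exist).
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

# Hypothetical documents and vocabulary (the real dictionary has 500 words).
docs = [
    "insulin resistance in type 2 diabetes",
    "insulin therapy for type 1 diabetes",
    "experimental diabetes in rats",
]
vocabulary = ["insulin", "diabetes", "rats"]
vectors = tfidf_vectors(docs, vocabulary)
```

Each document becomes a vector with one weight per dictionary word; words absent from a document get weight 0.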

  2. PubMed total records by publication year

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Feb 3, 2025
    Cite
    National Library of Medicine (2025). PubMed total records by publication year [Dataset]. https://catalog.data.gov/dataset/pubmed-total-records-by-publication-year-fcf4a
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    National Library of Medicine
    Description

    Yearly citation totals from each year of the MEDLINE/PubMed Baseline referencing citations back to year 1781. These totals may increase over time for a particular year as new citations are added. For example, 25 citations were listed for the year 1800 in the 2018 MEDLINE/PubMed Baseline, while the 2019 Baseline includes 387 citations for that year.
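
The growth between baselines can be worked out with a small sketch. Only the two figures for the year 1800 come from the description above; the function name and the dictionary layout are hypothetical.

```python
def baseline_changes(older, newer):
    """Return year -> change in citation total between two baselines."""
    return {year: newer[year] - older.get(year, 0) for year in newer}

# Totals for publication year 1800, as quoted in the description.
totals_2018 = {1800: 25}
totals_2019 = {1800: 387}

changes = baseline_changes(totals_2018, totals_2019)
print(changes)  # {1800: 362}
```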

  3. MEDLINE/PubMed Citations

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Feb 3, 2025
    Cite
    National Library of Medicine (2025). MEDLINE/PubMed Citations [Dataset]. https://catalog.data.gov/dataset/medline-pubmed-citations-d2ed0
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    National Library of Medicine
    Description

    PubMed is a free resource supporting the search and retrieval of biomedical and life sciences literature with the aim of improving health, both globally and personally. The PubMed database contains citations and abstracts of biomedical literature. It does not include full text journal articles; however, links to the full text are often present when available from other sources, such as the publisher's website or PubMed Central (PMC). See the PubMed User Guide for more information: https://pubmed.ncbi.nlm.nih.gov/help/

  4. Hype - PubMed dataset

    • databank.illinois.edu
    Updated Jan 31, 2025
    Cite
    Apratim Mishra; Jana Diesner; Vetle I. Torvik (2025). Hype - PubMed dataset [Dataset]. http://doi.org/10.13012/B2IDB-0651259_V2
    Dataset updated
    Jan 31, 2025
    Authors
    Apratim Mishra; Jana Diesner; Vetle I. Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hype - PubMed dataset
    Prepared by Apratim Mishra

    This dataset captures 'hype' within biomedical abstracts sourced from PubMed. The selection is journal articles written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate hype words and their location in the abstract. Each article (PMID) may therefore have multiple instances in the dataset, because several hype words can occur in different abstract sentences.

    The 35 candidate hype words are: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.

    This is version 2 of the dataset. Changes include:
    • Added "Year" variable.
    • Removed "Abstract length" variable.
    • Modified variable information due to updated probabilistic model of hype.
    • Number of hype words: 35 (updated from 36 based on revised findings).

    File 1: hype_dataset_final.tsv
    Primary dataset. It has the following columns:
    1. PMID: Unique article ID in PubMed
    2. Year: Year of publication
    3. Hype_word: Candidate hype word, such as 'novel'
    4. Sentence: Sentence in the abstract containing the hype word
    5. Hype_percentile: Relative position of the hype word within the abstract
    6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location
    7. Introduction: The 'I' component of the hype word based on IMRaD
    8. Methods: The 'M' component of the hype word based on IMRaD
    9. Results: The 'R' component of the hype word based on IMRaD
    10. Discussion: The 'D' component of the hype word based on IMRaD

    File 2: hype_removed_phrases_final.tsv
    Secondary dataset with the same columns as File 1. Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately.

    Removed phrases:
    1. Major: histocompatibility, component, protein, metabolite, complex, surgery
    2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid
    3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment
    4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values
    5. Essential: medium, features, properties, opportunities, oil
    6. Unique: model, amino
    7. Robust: regression
    8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information
    9. Outstanding: questions, issues, question, questions, challenge, problems, problem, remains
    10. Remarkable: properties
    11. Definitive: radiotherapy, surgery
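
The phrase exclusion described above can be sketched as a simple lookup. This is an illustration only: the rule that a phrase counts as removed when the hype word is immediately followed by a listed word is an assumption, since the description does not state how the authors matched phrases, and the function and variable names are hypothetical. Only three of the listed hype words are included to keep the sketch short.

```python
import re

# Subset of the removed phrases listed in the description
# (hype word -> words that make the phrase "rarely hype").
REMOVED_PHRASES = {
    "major": {"histocompatibility", "component", "protein",
              "metabolite", "complex", "surgery"},
    "robust": {"regression"},
    "vital": {"capacity", "signs", "organs", "status", "structures",
              "staining", "rates", "cells", "information"},
}

def is_removed_phrase(hype_word, sentence):
    """True if hype_word is immediately followed by one of its excluded words.

    The 'immediately followed by' rule is an assumption for illustration.
    """
    word = hype_word.lower()
    excluded = REMOVED_PHRASES.get(word, set())
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(
        tok == word and nxt in excluded
        for tok, nxt in zip(tokens, tokens[1:])
    )
```

For example, `is_removed_phrase("vital", "Vital signs were recorded hourly.")` is true, so under this assumed rule the sentence would go to File 2 rather than File 1.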

  5. PubMed

    • healthdata.gov
    • datadiscovery.nlm.nih.gov
    • +4more
    application/rdfxml +5
    Updated Mar 31, 2021
    Cite
    datadiscovery.nlm.nih.gov (2021). PubMed [Dataset]. https://healthdata.gov/dataset/PubMed/h5mw-dwr6
    Available download formats: csv, application/rdfxml, json, application/rssxml, xml, tsv
    Dataset updated
    Mar 31, 2021
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    PubMed comprises more than 26 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.

  6. PubMed Central Open Access Subset (PMC OA)

    • catalog.data.gov
    • healthdata.gov
    • +2more
    Updated Feb 3, 2025
    Cite
    National Library of Medicine (2025). PubMed Central Open Access Subset (PMC OA) [Dataset]. https://catalog.data.gov/dataset/pubmed-central-open-access-subset-pmc-oa
    Dataset updated
    Feb 3, 2025
    Dataset provided by
    National Library of Medicine
    Description

    Not all articles in PMC are available for text mining and other reuse; many have copyright protection. However, articles in the PMC Open Access Subset are made available for download under a Creative Commons or similar license that generally allows more liberal redistribution and reuse than a traditional copyrighted work.

  7. pubmed

    • huggingface.co
    Updated Aug 15, 2024
    Cite
    Huihao JING (2024). pubmed [Dataset]. https://huggingface.co/datasets/AnonymousNodeGAE/pubmed
    Dataset updated
    Aug 15, 2024
    Authors
    Huihao JING
    Description

    AnonymousNodeGAE/pubmed dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. PubMed Term, Abstract, Conclusion, Title Dataset Dataset

    • paperswithcode.com
    Updated May 19, 2019
    Cite
    Qingyun Wang; Lifu Huang; Zhiying Jiang; Kevin Knight; Heng Ji; Mohit Bansal; Yi Luan (2019). PubMed Term, Abstract, Conclusion, Title Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/pubmed-term-abstract-conclusion-title-dataset
    Dataset updated
    May 19, 2019
    Authors
    Qingyun Wang; Lifu Huang; Zhiying Jiang; Kevin Knight; Heng Ji; Mohit Bansal; Yi Luan
    Description

    This dataset gathers three types of pairs from PubMed: Title-to-Abstract (Training: 22,811/Development: 2,095/Test: 2,095), Abstract-to-Conclusion and Future work (Training: 22,811/Development: 2,095/Test: 2,095), and Conclusion and Future work-to-Title (Training: 15,902/Development: 2,095/Test: 2,095). Each pair contains an input and an output as well as the corresponding terms (from the original KB and link prediction results).

  9. MEDLINE/PubMed Baseline Repository (MBR)

    • datadiscovery.nlm.nih.gov
    • data.virginia.gov
    • +2more
    application/rdfxml +5
    Updated Jun 30, 2021
    Cite
    (2021). MEDLINE/PubMed Baseline Repository (MBR) [Dataset]. https://datadiscovery.nlm.nih.gov/Terminology/MEDLINE-PubMed-Baseline-Repository-MBR-/exav-tdkk
    Available download formats: xml, csv, application/rdfxml, json, application/rssxml, tsv
    Dataset updated
    Jun 30, 2021
    Description

    The MEDLINE/PubMed Baseline Repository (MBR) provides access to each MEDLINE/PubMed Baseline snapshot starting with the 2002 MEDLINE Baseline. Each baseline contains a snapshot of MEDLINE citations in the state they were at a given moment in time without the MeSH vocabulary updates and other revisions that occur during the year. The baseline snapshot is created at the beginning of each new MeSH Indexing Year. The records included in the MEDLINE/PubMed Baseline databases represent a static view of the data at the time each baseline database was created.

  10. pubmed-abstracts-noised-with-kaggle-dist

    • huggingface.co
    Updated Feb 23, 2024
    Cite
    pubmed-abstracts-noised-with-kaggle-dist [Dataset]. https://huggingface.co/datasets/gayanin/pubmed-abstracts-noised-with-kaggle-dist
    Dataset updated
    Feb 23, 2024
    Authors
    Gayani Nanayakkara
    Description

    gayanin/pubmed-abstracts-noised-with-kaggle-dist dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. nbs-pubmed-qa

    • huggingface.co
    Updated Mar 21, 2025
    Cite
    Ded (2025). nbs-pubmed-qa [Dataset]. https://huggingface.co/datasets/deddyext/nbs-pubmed-qa
    Dataset updated
    Mar 21, 2025
    Authors
    Ded
    Description

    deddyext/nbs-pubmed-qa dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. pubmed-abstract

    • huggingface.co
    Updated Mar 27, 2025
    Cite
    Uiyun Kim (2025). pubmed-abstract [Dataset]. https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract
    Dataset updated
    Mar 27, 2025
    Authors
    Uiyun Kim
    Description

    Dataset Summary

    A daily-updated dataset of PubMed abstracts, collected via PubMed's API and published on Hugging Face Datasets. Each snapshot is versioned by date (e.g., 2025-03-27) so users can track historical changes or use a consistent snapshot for reproducibility.

    • Updated daily
    • Each version tagged by date
    • Abstract-only dataset (no full text)

    Dataset Structure

    Column    Type    Description
    pmid      string  Unique PubMed identifier
    abstract  string  Abstract text…

    See the full description on the dataset page: https://huggingface.co/datasets/uiyunkim-hub/pubmed-abstract.

  13. MEDLINE/PubMed Baseline Statistics: Min/Max Report

    • data.virginia.gov
    • datadiscovery.nlm.nih.gov
    • +2more
    csv, json, rdf, xsl
    Updated Sep 6, 2024
    Cite
    National Library of Medicine (2024). MEDLINE/PubMed Baseline Statistics: Min/Max Report [Dataset]. https://data.virginia.gov/dataset/medline-pubmed-baseline-statistics-min-max-report
    Available download formats: rdf, json, csv, xsl
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    National Library of Medicine
    Description

    A file containing all Min/Max Baseline Reports for 2005-2023 in their original format is available in the Attachments section below. A second file includes a separate set of reports, made available from 2002-2017, that did not include OLDMEDLINE records.

    MEDLINE/PubMed annual statistical reports, based upon the data elements in the baseline versions of MEDLINE®/PubMed, are available. For each year covered, the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.
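
The kinds of per-element statistics listed above can be sketched in a few lines of Python. The record layout (a dict mapping element names to lists of string values), the sample records, and the choice to compute minimum and average only over records that contain the element are all assumptions; the actual report definitions may differ.

```python
from statistics import mean

def element_stats(records, element):
    """Summarize how often `element` occurs across a list of records."""
    counts = [len(rec.get(element, [])) for rec in records]
    # Assumption: per-record min/avg/max consider only records with the element.
    present = [c for c in counts if c > 0]
    lengths = [len(v) for rec in records for v in rec.get(element, [])]
    return {
        "citations_with_element": len(present),
        "total_occurrences": sum(counts),
        "min_per_record": min(present) if present else 0,
        "avg_per_record": mean(present) if present else 0,
        "max_per_record": max(present) if present else 0,
        "max_occurrence_length": max(lengths) if lengths else 0,
    }

# Hypothetical records; real baseline records are XML with many more elements.
records = [
    {"Author": ["Smith J", "Lee K"]},
    {"Author": ["Garcia M"]},
    {},
]
stats = element_stats(records, "Author")
```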

  14. PubMed

    • bioregistry.io
    Updated Dec 28, 2021
    Cite
    (2021). PubMed [Dataset]. http://identifiers.org/wikidata:P698
    Dataset updated
    Dec 28, 2021
    Description

    PubMed is a service of the U.S. National Library of Medicine that includes citations from MEDLINE and other life science journals for biomedical articles back to the 1950s.

  15. PKG21

    • figshare.com
    zip
    Updated Nov 25, 2021
    Cite
    Jian Xu (2021). PKG21 [Dataset]. http://doi.org/10.6084/m9.figshare.17072960.v1
    Available download formats: zip
    Dataset updated
    Nov 25, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jian Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PKG21 CSV files.

  16. pubmed-10k

    • huggingface.co
    Updated Jul 21, 2023
    Cite
    Ronit Mandal (2023). pubmed-10k [Dataset]. https://huggingface.co/datasets/ronitHF/pubmed-10k
    Dataset updated
    Jul 21, 2023
    Authors
    Ronit Mandal
    Description

    Dataset Summary

    First 10k rows of the scientific_papers["pubmed"] dataset. 10:1:1 split.

    Usage

    from datasets import load_dataset

    train_dataset = load_dataset("ronitHF/pubmed-10k", split="train")
    val_dataset = load_dataset("ronitHF/pubmed-10k", split="validation")
    test_dataset = load_dataset("ronitHF/pubmed-10k", split="test")

  17. MEDLINE/PubMed Baseline Statistics: Misc Report

    • healthdata.gov
    • data.virginia.gov
    • +2more
    application/rdfxml +5
    Updated Mar 24, 2023
    Cite
    datadiscovery.nlm.nih.gov (2023). MEDLINE/PubMed Baseline Statistics: Misc Report [Dataset]. https://healthdata.gov/w/8v7d-ywxm/default?cur=eSI7VgiHdpl
    Available download formats: json, csv, application/rssxml, application/rdfxml, xml, tsv
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    datadiscovery.nlm.nih.gov
    Description

    A file containing all Misc Baseline Reports for 2018-2023 in their original format is available in the Attachments section below.

    MEDLINE/PubMed annual statistical reports, based upon the data elements in the baseline versions of MEDLINE®/PubMed, are available. For each year covered, the reports include: total citations containing each element; total occurrences of each element; minimum/average/maximum occurrences of each element in a record; minimum/average/maximum length of a single element occurrence; average record size; and other statistical data describing the content and size of the elements.

  18. Pubmed Journal Recommendation System dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Dec 18, 2023
    Cite
    Jiayun Liu; Manuel Castillo Cara; Raúl García Castro (2023). Pubmed Journal Recommendation System dataset [Dataset]. http://doi.org/10.5281/zenodo.8386011
    Available download formats: csv
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jiayun Liu; Manuel Castillo Cara; Raúl García Castro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for Journal recommendation, includes title, abstract, keywords, and journal.

    We extracted the journals and additional information from:

    Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.

    Dataset Components:

    • data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.

    • data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022—the latest date in the dataset. Additionally, we have exclusively retained journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.

    • data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets—training, validation, and test—using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.
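
The per-journal 60/20/20 partition described above can be sketched as follows. This is not the authors' code: the function name, the seed, the rounding rule (remainders go to the test set), and the toy article records are assumptions; only the 'pubmed_id' and 'journal' column names come from the description.

```python
import random
from collections import defaultdict

def split_per_journal(articles, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split articles into train/val/test within each journal."""
    rng = random.Random(seed)
    by_journal = defaultdict(list)
    for art in articles:
        by_journal[art["journal"]].append(art)

    train, val, test = [], [], []
    for journal_articles in by_journal.values():
        items = journal_articles[:]  # copy so the input is not reordered
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_val = int(len(items) * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        # Assumption: whatever is left after rounding goes to the test set.
        test.extend(items[n_train + n_val:])
    return train, val, test

# Toy data: 30 hypothetical articles spread evenly over 3 journals.
articles = [{"pubmed_id": i, "journal": f"J{i % 3}"} for i in range(30)]
train, val, test = split_per_journal(articles)
```

Splitting inside each journal, rather than over the pooled articles, keeps every journal represented in all three subsets in roughly the stated proportions.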

  19. PostgreSQL query to select the ten journals with the highest number of...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    PostgreSQL query to select the ten journals with the highest number of publications containing the MeSH term “Leukemia” [20] on the complete PubMed data set. [Dataset]. https://plos.figshare.com/articles/dataset/PostgreSQL_query_to_select_the_ten_journals_with_the_highest_number_of_publications_containing_the_MeSH_term_Leukemia_20_on_the_complete_PubMed_data_set_/3993120
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Kersten Döring; Björn A. Grüning; Kiran K. Telukunta; Philippe Thomas; Stefan Günther
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PostgreSQL query to select the ten journals with the highest number of publications containing the MeSH term “Leukemia” [20] on the complete PubMed data set.

  20. JoinPoint regression analysis of different APC trends.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Bui The Hung; Nguyen Phuoc Long; Le Phi Hung; Nguyen Thien Luan; Nguyen Hoang Anh; Tran Diem Nghi; Mai Van Hieu; Nguyen Thi Huyen Trang; Herizo Fabien Rafidinarivo; Nguyen Ky Anh; David Hawkes; Nguyen Tien Huy; Kenji Hirayama (2023). JoinPoint regression analysis of different APC trends. [Dataset]. http://doi.org/10.1371/journal.pone.0121054.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bui The Hung; Nguyen Phuoc Long; Le Phi Hung; Nguyen Thien Luan; Nguyen Hoang Anh; Tran Diem Nghi; Mai Van Hieu; Nguyen Thi Huyen Trang; Herizo Fabien Rafidinarivo; Nguyen Ky Anh; David Hawkes; Nguyen Tien Huy; Kenji Hirayama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JoinPoint regression analysis of different APC trends.
    jAPC = Annual percent changes calculated by Joinpoint Regression Analysis.
    *APC is significantly different from zero when P < 0.05.

Pubmed Dataset (paperswithcode.com): 8 scholarly articles cite this dataset.