72 datasets found
  1. pubmed-pmc-sr-filtered

    • huggingface.co
    Cite
    shuai wang, pubmed-pmc-sr-filtered [Dataset]. https://huggingface.co/datasets/wshuai190/pubmed-pmc-sr-filtered
    Authors
    shuai wang
    Description

    wshuai190/pubmed-pmc-sr-filtered

    Dataset Description

    This dataset contains medical literature data for training Boolean query generation models. The data includes PubMed articles with their associated metadata, references, and result section PMIDs.

    Dataset Structure

    Data Fields

    • pmid: PubMed ID of the article
    • pmc-id: PMC ID (if available)
    • title: Article title
    • max-date: Maximum publication date
    • references-pmids: List of PMIDs referenced in the article… See the full description on the dataset page: https://huggingface.co/datasets/wshuai190/pubmed-pmc-sr-filtered.

  2. PubMed Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Nov 19, 2023
    Cite
    Bright Data (2023). PubMed Datasets [Dataset]. https://brightdata.com/products/datasets/pubmed
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Nov 19, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock valuable biomedical knowledge with our comprehensive PubMed Dataset, designed for researchers, analysts, and healthcare professionals to track medical advancements, explore drug discoveries, and analyze scientific literature.

    Dataset Features

    • Scientific Articles & Abstracts: Access structured data from PubMed, including article titles, abstracts, authors, publication dates, and journal sources.
    • Medical Research & Clinical Studies: Retrieve data on clinical trials, drug research, disease studies, and healthcare innovations.
    • Keywords & MeSH Terms: Extract key medical subject headings (MeSH) and keywords to categorize and analyze research topics.
    • Publication & Citation Data: Track citation counts, journal impact factors, and author affiliations for academic and industry research.

    Customizable Subsets for Specific Needs

    Our PubMed Dataset is fully customizable, allowing you to filter data based on publication date, research category, keywords, or specific journals. Whether you need broad coverage for medical research or focused data for pharmaceutical analysis, we tailor the dataset to your needs.

    Popular Use Cases

    • Pharmaceutical Research & Drug Development: Analyze clinical trial data, drug efficacy studies, and emerging treatments.
    • Medical & Healthcare Intelligence: Track disease outbreaks, healthcare trends, and advancements in medical technology.
    • AI & Machine Learning Applications: Use structured biomedical data to train AI models for predictive analytics, medical diagnosis, and literature summarization.
    • Academic & Scientific Research: Access a vast collection of peer-reviewed studies for literature reviews, meta-analyses, and academic publishing.
    • Regulatory & Compliance Monitoring: Stay updated on medical regulations, FDA approvals, and healthcare policy changes.

    Whether you're conducting medical research, analyzing healthcare trends, or developing AI-driven solutions, our PubMed Dataset provides the structured data you need. Get started today and customize your dataset to fit your research objectives.

  3. pubmed-filtered

    • huggingface.co
    Updated Jan 10, 2012
    Cite
    Giulia Dal Cin (2012). pubmed-filtered [Dataset]. https://huggingface.co/datasets/giuliadc/pubmed-filtered
    Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 10, 2012
    Authors
    Giulia Dal Cin
    Description

    Original data from: https://github.com/armancohan/long-summarization

    The first 3000 rows of the test split of the original dataset were processed and filtered as follows.

    In the original dataset, some sentences appear several times in the same article, even though they occur only once in the original research paper. For this reason, all dataset rows in which the same sentence appeared more than once were removed. In the original dataset, every sentence is a separate string, and these strings… See the full description on the dataset page: https://huggingface.co/datasets/giuliadc/pubmed-filtered.
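    The filtering rule above (drop any row in which a sentence occurs more than once) can be sketched as follows. This is a minimal illustration, not the author's processing script. It assumes each row stores its article as a list of sentence strings under a hypothetical `article_text` key and compares sentences by exact string match.

```python
from typing import Iterable


def has_repeated_sentence(sentences: list[str]) -> bool:
    """Return True if any sentence string occurs more than once in the article."""
    seen: set[str] = set()
    for sentence in sentences:
        if sentence in seen:
            return True
        seen.add(sentence)
    return False


def filter_rows(rows: Iterable[dict], key: str = "article_text") -> list[dict]:
    """Keep only rows whose sentences are all unique.

    `key` is a hypothetical field name for the list of sentence strings.
    """
    return [row for row in rows if not has_repeated_sentence(row[key])]


rows = [
    {"article_text": ["A.", "B.", "C."]},
    {"article_text": ["A.", "B.", "A."]},  # "A." repeats, so this row is dropped
]
print(len(filter_rows(rows)))  # 1
```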

  4. dsir-pile-13m-filtered-for-pubmed-central

    • huggingface.co
    Updated Dec 31, 2017
    + more versions
    Cite
    Timaeus (2017). dsir-pile-13m-filtered-for-pubmed-central [Dataset]. https://huggingface.co/datasets/timaeus/dsir-pile-13m-filtered-for-pubmed-central
    Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 31, 2017
    Dataset authored and provided by
    Timaeus
    Description

    timaeus/dsir-pile-13m-filtered-for-pubmed-central dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Results from total and filtered searches in PubMed

    • zenodo.org
    Updated Aug 7, 2025
    Cite
    Viet-Thi Tran (2025). Results from total and filtered searches in PubMed [Dataset]. http://doi.org/10.5281/zenodo.16758566
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Viet-Thi Tran
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using Large Language Models to directly screen electronic databases as an alternative to traditional search strategies in systematic reviews: the example of the Cochrane highly sensitive search

    The enclosed files correspond to (1) all studies published in MEDLINE between September 1 and September 30, 2024, retrieved using the keyword "diabetes" alone; and (2) studies published in MEDLINE in the same period, retrieved using the keyword "diabetes" combined with the Cochrane highly sensitive search.

    The code used to process the data is provided as supplementary material in the publication.
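    The two searches differ only in the query term, so both could be issued against PubMed's E-utilities `esearch` endpoint with the same publication-date window. The sketch below only builds the request URLs and makes several assumptions. The Cochrane filter string is a hypothetical placeholder, because the strategy text is not given here. Treating "published between" as `datetype=pdat` (publication date) is also an assumption.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 10000) -> str:
    """Return an esearch URL for a PubMed query limited to a publication-date window."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # restrict on publication date
        "mindate": mindate,  # YYYY/MM/DD
        "maxdate": maxdate,
        "retmax": retmax,    # maximum number of PMIDs returned per request
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical placeholder: paste the full Cochrane highly sensitive search here.
COCHRANE_FILTER = "(<Cochrane highly sensitive search strategy>)"

total_url = build_esearch_url("diabetes", "2024/09/01", "2024/09/30")
filtered_url = build_esearch_url(f"diabetes AND {COCHRANE_FILTER}", "2024/09/01", "2024/09/30")
```

    Fetching either URL (for example with `urllib.request.urlopen`) returns XML containing a `Count` element and the matching PMIDs. The difference between the two counts is the number of records the filter excludes.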

  6. Data from: PubMed's Core Clinical Journals Filter: Redesigned for...

    • figshare.com
    txt
    Updated Jul 12, 2023
    Cite
    Michele Klein-Fedyshin; Andrea M. Ketchum (2023). PubMed's Core Clinical Journals Filter: Redesigned for Contemporary Clinical Impact and Utility [Dataset]. http://doi.org/10.6084/m9.figshare.21979832.v1
    Available download formats: txt
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michele Klein-Fedyshin; Andrea M. Ketchum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Medical journal usage counts across 814 clinical locations in the U.S. and Canada from 2009 to 2015.

  7. Data from: Searching for LINCS to Stress: Using Text Mining to Automate...

    • figshare.com
    xlsx
    Updated May 13, 2024
    + more versions
    Cite
    Bryant A. Chambers; Danilo Basili; Laura Word; Nancy Baker; Alistair Middleton; Richard S. Judson; Imran Shah (2024). Searching for LINCS to Stress: Using Text Mining to Automate Reference Chemical Curation [Dataset]. http://doi.org/10.1021/acs.chemrestox.3c00335.s008
    Available download formats: xlsx
    Dataset updated
    May 13, 2024
    Dataset provided by
    ACS Publications
    Authors
    Bryant A. Chambers; Danilo Basili; Laura Word; Nancy Baker; Alistair Middleton; Richard S. Judson; Imran Shah
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Adaptive stress response pathways (SRPs) restore cellular homeostasis following perturbation but may activate terminal outcomes like apoptosis, autophagy, or cellular senescence if disruption exceeds critical thresholds. Because SRPs hold the key to vital cellular tipping points, they are targeted for therapeutic interventions and assessed as biomarkers of toxicity. Hence, we are developing a public database of chemicals that perturb SRPs to enable new data-driven tools to improve public health. Here, we report on the automated text-mining pipeline we used to build and curate the first version of this database. We started with 100 reference SRP chemicals gathered from published biomarker studies to bootstrap the database. Second, we used information retrieval to find co-occurrences of reference chemicals with SRP terms in PubMed abstracts and determined pairwise mutual information thresholds to filter biologically relevant relationships. Third, we applied these thresholds to find 1206 putative SRP perturbagens within thousands of substances in the Library of Integrated Network-Based Cellular Signatures (LINCS). To assign SRP activity to LINCS chemicals, domain experts had to manually review at least three publications for each of 1206 chemicals out of 181,805 total abstracts. To accomplish this efficiently, we implemented a machine learning approach to predict SRP classifications from texts to prioritize abstracts. In 5-fold cross-validation testing with a corpus derived from the 100 reference chemicals, artificial neural networks performed the best (F1-macro = 0.678) and prioritized 2479/181,805 abstracts for expert review, which resulted in 457 chemicals annotated with SRP activities. An independent analysis of enriched mechanisms of action and chemical use class supported the text-mined chemical associations (p < 0.05): heat shock inducers were linked with HSP90 and DNA damage inducers to topoisomerase inhibition. 
This database will enable novel applications of LINCS data to evaluate SRP activities and to further develop tools for biomedical information extraction from the literature.
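    The abstract mentions filtering chemical/SRP-term co-occurrences in PubMed abstracts with pairwise mutual information thresholds. The sketch below assumes that scoring means pointwise mutual information computed from abstract counts, which may not match the authors' exact formulation. The counts, chemical names and threshold value are made up for illustration.

```python
import math


def pmi(n_xy: int, n_x: int, n_y: int, n_docs: int) -> float:
    """Pointwise mutual information (bits) of a chemical and a term from document counts.

    n_xy: abstracts mentioning both; n_x, n_y: abstracts mentioning each one;
    n_docs: total abstracts searched.
    """
    if n_xy == 0:
        return float("-inf")  # the pair never co-occurs
    p_xy = n_xy / n_docs
    p_x = n_x / n_docs
    p_y = n_y / n_docs
    return math.log2(p_xy / (p_x * p_y))


def keep_pairs(counts: dict, n_docs: int, threshold: float) -> list:
    """Return the (chemical, term) pairs whose PMI is at or above the threshold.

    counts maps (chemical, term) -> (n_both, n_chemical, n_term).
    """
    return [
        pair
        for pair, (n_xy, n_x, n_y) in counts.items()
        if pmi(n_xy, n_x, n_y, n_docs) >= threshold
    ]


# Made-up counts from a hypothetical corpus of 100 abstracts.
counts = {
    ("chemical_a", "heat shock"): (5, 10, 10),  # PMI = log2(5), about 2.32
    ("chemical_b", "heat shock"): (1, 50, 10),  # PMI = log2(0.2), about -2.32
}
print(keep_pairs(counts, n_docs=100, threshold=1.0))  # [('chemical_a', 'heat shock')]
```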

  8. STS Model of the PubMed Literature

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Kevin Boyack; Caleb Smith; Richard Klavans (2023). STS Model of the PubMed Literature [Dataset]. http://doi.org/10.6084/m9.figshare.12743639.v1
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Boyack; Caleb Smith; Richard Klavans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PubMed model contains over 18 million PubMed documents (1996-2019) clustered into 28,743 clusters for use in research planning, portfolio analysis, systematic review, etc. This repository contains the PMID-to-cluster listing, an Excel workbook that characterizes each cluster with metadata and cluster-level indicators, and a Tableau workbook containing those same data plus a visual map and filters that can be used to explore the landscape and analyze cluster-level information. Model created by SciTech Strategies, Inc. Details can be found in the accompanying article published in Scientific Data at https://www.nature.com/articles/s41597-020-00749-y (or https://rdcu.be/ca4kv).
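    Grouping PMIDs by cluster takes only a few lines if the PMID-to-cluster listing is a delimited text file with one PMID and one cluster number per row. This is a sketch under that assumption. The column names `pmid` and `cluster` and the comma delimiter are hypothetical and should be adjusted to the actual file header.

```python
import csv
from collections import defaultdict
from typing import Iterable


def group_by_cluster(
    lines: Iterable[str], pmid_col: str = "pmid", cluster_col: str = "cluster"
) -> dict[str, list[str]]:
    """Map each cluster ID to the PMIDs assigned to it.

    `lines` can be an open file or any iterable of text lines that starts with a
    header row. The default column names are hypothetical placeholders.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[cluster_col]].append(row[pmid_col])
    return dict(groups)


sample = ["pmid,cluster", "111,7", "222,7", "333,12"]  # made-up rows
clusters = group_by_cluster(sample)
largest = max(clusters, key=lambda c: len(clusters[c]))
print(largest, len(clusters[largest]))  # 7 2
```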

  9. Uftir_curated Dataset

    • universe.roboflow.com
    zip
    Updated Sep 28, 2023
    Cite
    uFTIR Particles (2023). Uftir_curated Dataset [Dataset]. https://universe.roboflow.com/uftir-particles/uftir_curated/dataset/6
    Available download formats: zip
    Dataset updated
    Sep 28, 2023
    Dataset authored and provided by
    uFTIR Particles
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Particle Polygons
    Description

    Micro-FTIR Filter Images for Particle Detection

    This dataset consists of annotated images of filters containing particles. The primary objective of this dataset is to serve as training and validation data for developing a particle detection model using computer vision techniques. More specifically, this dataset can be used to train an image segmentation model that can be used with GEPARD (https://pubmed.ncbi.nlm.nih.gov/32436395/) in order to perform efficient particle detection and analysis using a Micro-FTIR microscope.

    Two kinds of samples are used in our case:

    • Normal filters, with a low amount of particles and a clear view of the filter
    • Saturated filters, where the particles cover almost the entire filter

    In the first case, particles were annotated easily, as they are clearly visible over the filter. In the second case, the most distinguishable particles in the image have been annotated.

    Note

    In the case of saturated filters, the correct method would be to collect a spectral image of the entire filter using an FPA detector or similar and then use tools (e.g. sIMPle) to analyse this image. However, in our scenario such a detector was not available, and a semi-random, operator-dependent method had to be used to select particles or points for scanning.

  10. BIOGRID CURATED DATA FOR PUBLICATION: Human VPAC1 receptor selectivity...

    • thebiogrid.org
    zip
    Updated Oct 4, 2002
    Cite
    BioGRID Project (2002). BIOGRID CURATED DATA FOR PUBLICATION: Human VPAC1 receptor selectivity filter. Identification of a critical domain for restricting secretin binding. [Dataset]. https://thebiogrid.org/8250/publication/human-vpac1-receptor-selectivity-filter-identification-of-a-critical-domain-for-restricting-secretin-binding.html
    Available download formats: zip
    Dataset updated
    Oct 4, 2002
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Du K (2002): Human VPAC1 receptor selectivity filter. Identification of a critical domain for restricting secretin binding. Curated by BioGRID (https://thebiogrid.org); ABSTRACT: The human VPAC1 receptor for vasoactive intestinal peptide (VIP) and pituitary adenylate cyclase activating peptide (PACAP) belongs to the class II family of G protein coupled receptors with seven transmembrane segments. It recognizes several VIP-related peptides and displays a very low affinity for secretin despite >70% homology between VIP and secretin. Conversely, the human secretin receptor has high affinity for secretin but low affinity for VIP. We took advantage of this reversed selectivity to identify a domain of the VPAC1 receptor responsible for selectivity toward secretin by constructing human VPAC1-secretin receptor chimeras. A first set of chimeras consisted of exchanging the entire N-terminal ectodomain or large parts of this domain. They were constructed by overlap PCR, transfected in COS-7 cells, and their ligand selectivity, expressed as the ratio of EC(50) for secretin/EC(50) for VIP (referred to as S/V), in stimulating cAMP production was measured. Two very informative chimeras respectively referred to as S144V and S123V were obtained by replacing the entire ectodomain or only the first 123 amino acids of the VPAC1 receptor by the corresponding sequences of the secretin receptor. Whereas S144V no longer discriminated between VIP and secretin (S/V = 1.2), S123V discriminated between the two peptides (S/V = 300) in the same manner as the wild-type VPAC1 receptor. The motif responsible for discrimination was determined by introducing small blocks or individual amino acids of secretin receptor in the 123-144 sequence of the S123V chimera.
The data obtained from 14 new chimeras sustained that two nonadjacent pairs of amino acids, Gln(135) Thr(136) and Gly(140) Ser(141) in the C-terminal end of the N-terminal VPAC1 receptor ectodomain constitute a selective filter that strongly restricts access of secretin to the VPAC1 receptor.
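    The selectivity index used throughout the abstract is a ratio of half-maximal effective concentrations for cAMP stimulation:

```latex
\frac{S}{V} = \frac{\mathrm{EC}_{50}(\text{secretin})}{\mathrm{EC}_{50}(\text{VIP})}
```

    A value near 1 (S144V, S/V = 1.2) means the receptor no longer discriminates between the two peptides. A value of S/V = 300 (S123V) means secretin needs a roughly 300-fold higher concentration than VIP to reach the same half-maximal response.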

  11. dsir-pile-100k-filtered-for-pubmed-central

    • huggingface.co
    + more versions
    Cite
    Timaeus, dsir-pile-100k-filtered-for-pubmed-central [Dataset]. https://huggingface.co/datasets/timaeus/dsir-pile-100k-filtered-for-pubmed-central
    Available in Croissant format (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    Timaeus
    Description

    timaeus/dsir-pile-100k-filtered-for-pubmed-central dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. Data for: comprehensive search filters for retrieving publications on...

    • service.tib.eu
    Updated Nov 14, 2025
    + more versions
    Cite
    (2025). Data for: comprehensive search filters for retrieving publications on non-human primates for literature reviews (filternhp) - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/goe-doi-10-25625-utt4sn
    Dataset updated
    Nov 14, 2025
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset supports filterNHP, an R package and web-based application for generating search filters to query scientific bibliographic sources (PubMed, PsycINFO, Web of Science) for non-human primate related publications. filterNHP can be found at: https://filterNHP.dpz.eu.

  13. Data from: Citation network data sets for 'Oxytocin – a social peptide?...

    • nde-dev.biothings.io
    Updated Jun 5, 2022
    + more versions
    Cite
    Leng, Rhodri Ivor (2022). Citation network data sets for 'Oxytocin – a social peptide? Deconstructing the evidence' [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_5578956
    Dataset updated
    Jun 5, 2022
    Dataset authored and provided by
    Leng, Rhodri Ivor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This note describes the data sets used for all analyses contained in the manuscript ‘Oxytocin – a social peptide?’ [1], which is currently under review.

    Data Collection

    The data sets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh’s library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:

    TI = ("oxytocin" OR "pitocin" OR "syntocinon") AND TS = ("social*" OR "pro$social" OR "anti$social")

    The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as ‘articles’. Given our interest in primary studies only (articles reporting original data), we excluded all other document types. We further excluded all articles sub-classified as ‘book chapters’ or as ‘proceeding papers’ in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, so no further language refinements were necessary.

    All available metadata on these 1,977 articles was exported as plain text ‘flat’ format files in four batches, which we later merged together via Notepad++. Upon manual examination, we discovered examples of papers classified as ‘articles’ by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs, ~96% of the set). We then filtered results to identify all records classified as ‘review’, ‘systematic review’, or ‘meta-analysis’, identifying 75 records [3]. After examining a sample and agreeing with the PubMed classification, we removed these from our dataset, leaving a total of 1,902 articles.

    From these data, we constructed two datasets by parsing out relevant reference data with the Sci2 Tool [4]. First, we constructed a ‘node-attribute-list’ by linking unique reference strings (‘Cite Me As’ column in WoS data files) to unique identifiers. We then parsed into this dataset information on the identity of each paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded by WoS, and WoS accession number. Second, we constructed an ‘edge-list’ that records the citing paper in the ‘Source’ column and the cited paper in the ‘Target’ column, using the unique identifiers described previously to link these data to the node-attribute-list.

    We then constructed a network in which papers are nodes and citation links between them are directed edges. We used Gephi version 0.9.2 [5] to manually clean these data by merging duplicate references caused by different reference formats or by referencing errors. To do this, we retained all retrieved records (1,902) as well as all of their references, whether or not these were included in our original search. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean in-degree (within-network citations) is 2.4 (median 1) for the entire network, reflecting a great diversity in referencing choices among our 1,902 articles.
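    The two averages quoted above follow directly from the node and edge counts. This assumes, as the text implies, that every edge starts at one of the 1,902 retrieved articles and ends at exactly one node. The check below uses only the figures reported in this paragraph:

```python
# Figures reported in the text for the full (uncleaned) network.
articles = 1_902  # retrieved primary articles (the citing papers)
nodes = 46_633    # unique reference strings
edges = 112_520   # citation links

mean_refs_per_article = edges / articles  # each edge leaves one retrieved article
mean_indegree = edges / nodes             # each edge adds 1 to one node's in-degree

print(f"{mean_refs_per_article:.1f} references per article")  # 59.2 references per article
print(f"{mean_indegree:.2f} mean in-degree")                   # 2.41 mean in-degree
```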

    After merging duplicates, we restricted the network to include only the fully retrieved articles (1,902) and retained only those connected by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected via citation links, meaning that ten papers were removed from the following analysis. These were neither connected to the largest component nor connected to one another (i.e. they were ‘isolates’).

    This left us with a network of 1,892 nodes connected by 26,019 edges. It is this network that is described by the ‘node-attribute-list’ and ‘edge-list’ provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network. In exchange, we retain a tightly knit set of articles, all of which were fully retrieved because they contain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.
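    The restriction described above can be illustrated with the Python standard library. This is not the authors' workflow. It makes two assumptions: citations are treated as undirected links when finding the "largest component" (i.e. weak connectivity), and the node and edge names are made up. The sketch has two steps. First it keeps only edges between retrieved articles. Then it keeps the largest weakly connected component and counts within-network citations.

```python
from collections import Counter, defaultdict


def restrict(edges: list[tuple[str, str]], keep: set[str]) -> list[tuple[str, str]]:
    """Keep only the citation links whose citing and cited papers are both in `keep`."""
    return [(s, t) for s, t in edges if s in keep and t in keep]


def largest_component(nodes: set[str], edges: list[tuple[str, str]]) -> set[str]:
    """Return the largest weakly connected component, ignoring edge direction.

    Every endpoint in `edges` must belong to `nodes`; nodes without edges are isolates.
    """
    parent = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for source, target in edges:
        parent[find(source)] = find(target)

    components: dict[str, set[str]] = defaultdict(set)
    for n in nodes:
        components[find(n)].add(n)
    return max(components.values(), key=len)


def in_degree(nodes: set[str], edges: list[tuple[str, str]]) -> Counter:
    """Count within-network citations received by each node (0 if never cited)."""
    counts = Counter({n: 0 for n in nodes})
    for _, target in edges:
        counts[target] += 1
    return counts


# Made-up example: "x" is a reference that was not itself retrieved, "e" is an isolate.
retrieved = {"a", "b", "c", "d", "e"}
all_edges = [("a", "b"), ("b", "c"), ("a", "x"), ("d", "x")]

article_edges = restrict(all_edges, retrieved)      # [("a", "b"), ("b", "c")]
core = largest_component(retrieved, article_edges)  # {"a", "b", "c"}
core_edges = restrict(article_edges, core)
print(sorted(core), in_degree(core, core_edges)["c"])  # ['a', 'b', 'c'] 1
```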

    Before moving on, we calculated the in-degree for all nodes in this network (the number of citations to a given paper from other papers within this network) and have included it in the node-attribute-list. We further clustered this network by modularity maximisation using the Leiden algorithm [6]. We set the algorithm to resolution 1 and allowed it to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.

    Data description

    We include here two datasets: (i) ‘OTSOC-node-attribute-list.csv’ consists of the attributes of 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) ‘OTSOC-edge-list.csv’ records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted these for ease of upload into Gephi 0.9.2. Below, we detail their contents:

    1. ‘OTSOC-node-attribute-list.csv’ is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:

    Id, the unique identifier

    Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the ‘Cite Me As’ column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).

    Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the ‘UT=’ field tag.

    Title, paper title.

    Authors, all named authors.

    Journal, journal of publication.

    Pub_year, year of publication.

    Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021

    Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.

    Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation via the Leiden algorithm (Res 1; Q=0.43|7 clusters)

    2. ‘OTSOC-edge-list.csv’ is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:

    Source, the unique identifier of the citing paper.

    Target, the unique identifier of the cited paper.

    Type, edges are ‘Directed’, and this column tells Gephi to regard all edges as such.

    Syr_date, this contains the date of publication of the citing paper.

    Tyr_date, this contains the date of publication of the cited paper.
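    The two files can also be loaded outside Gephi. The sketch below assumes they are UTF-8 CSV files with the column headers listed above, and its helper names are hypothetical. It rebuilds each node's in-degree from the edge list and flags any row whose stored Indegree disagrees. Both files describe the same 1,892-node network, so that list should be empty.

```python
import csv
from collections import Counter
from typing import Iterable


def load_network(node_lines: Iterable[str], edge_lines: Iterable[str]):
    """Return ({Id: row}, [(Source, Target), ...]) from the two CSV files."""
    nodes = {row["Id"]: row for row in csv.DictReader(node_lines)}
    edges = [(row["Source"], row["Target"]) for row in csv.DictReader(edge_lines)]
    return nodes, edges


def indegree_mismatches(nodes: dict, edges: list) -> list:
    """List Ids whose Indegree column differs from the count recomputed from the edges."""
    recomputed = Counter(target for _, target in edges)
    return [
        node_id
        for node_id, row in nodes.items()
        if int(row["Indegree"]) != recomputed[node_id]
    ]


def cluster_sizes(nodes: dict) -> Counter:
    """Number of articles in each Leiden cluster (Cluster column)."""
    return Counter(row["Cluster"] for row in nodes.values())
```

    With the real files, pass open file handles, e.g. `open("OTSOC-edge-list.csv", newline="", encoding="utf-8")`. If a file begins with a byte-order mark, `encoding="utf-8-sig"` prevents it from corrupting the first column name.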

    Software recommended for analysis

    Gephi version 0.9.2 was used for the visualisations within the manuscript, and both files can be read into Gephi without modification.

    Notes

    [1] Leng, G., Leng, R. I., Ludwig, M. (Submitted). Oxytocin – a social peptide? Deconstructing the evidence.

    [2] Edinburgh University’s subscription to Web of Science covers the following databases: (i) Science Citation Index Expanded, 1900-present; (ii) Social Sciences Citation Index, 1900-present; (iii) Arts & Humanities Citation Index, 1975-present; (iv) Conference Proceedings Citation Index- Science, 1990-present; (v) Conference Proceedings Citation Index- Social Science & Humanities, 1990-present; (vi) Book Citation Index– Science, 2005-present; (vii) Book Citation Index– Social Sciences & Humanities, 2005-present; (viii) Emerging Sources Citation Index, 2015-present.

    [3] For those interested, the following PMIDs were identified as ‘articles’ by WoS, but as ‘reviews’ by PubMed: 34502097, 33400920, 32060678, 31925983, 31734142, 30496762, 30253045, 29660735, 29518698, 29065361, 29048602, 28867943, 28586471, 28301323, 27974283, 27626613, 27603523, 27603327, 27513442, 27273834, 27071789, 26940141, 26932552, 26895254, 26869847, 26788924, 26581735, 26548910, 26317636, 26121678, 26094200, 25997760, 25631363, 25526824, 25446893, 25153535, 25092245, 25086828, 24946432, 24637261, 24588761, 24508579, 24486356, 24462936, 24239932, 24239931, 24231551, 24216134, 23955310, 23856187, 23686025, 23589638, 23575742, 23469841, 23055480, 22981649, 22406388, 22373652, 22141469, 21960250, 21881219, 21802859, 21714746, 21618004, 21150165, 20435805, 20173685, 19840865, 19546570, 19309413, 15288368, 12359512, 9401603, 9213136, 7630585

    [4] Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Stable URL: https://sci2.cns.iu.edu

    [5] Bastian, M., Heymann, S., & Jacomy, M. (2009).

  14. Data for "RegulaTome: a corpus of typed, directed, and signed relations...

    • resodate.org
    Updated Apr 23, 2024
    Cite
    Katerina Nastou (2024). Data for "RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature" [Dataset]. https://resodate.org/resources/aHR0cHM6Ly96ZW5vZG8ub3JnL3JlY29yZHMvMTA4MDgzMzA=
    Dataset updated
    Apr 23, 2024
    Dataset provided by
    Zenodo
    Authors
    Katerina Nastou
    Description

    RegulaTome corpus: this file contains the RegulaTome corpus in BRAT format. The directory "splits" contains the corpus divided into the train/dev/test sets used to train the relation extraction system.

    RegulaTome annodoc: the annotation guidelines, along with the annotation configuration files for BRAT, are provided in annodoc+config.tar.gz. The online version of the annotation documentation can be found here: https://katnastou.github.io/regulatome-annodoc/

    The tagger software can be found here: https://github.com/larsjuhljensen/tagger. The command used to run tagger before the large-scale execution of the RE system is:

    gzip -cd `ls -1 pmc/*.en.merged.filtered.tsv.gz` `ls -1r pubmed/*.tsv.gz` | cat dictionary/excluded_documents.txt - | tagger/tagcorpus --threads=16 --autodetect --types=dictionary/curated_types.tsv --entities=dictionary/all_entities.tsv --names=dictionary/all_names_textmining.tsv --groups=dictionary/all_groups.tsv --stopwords=dictionary/all_global.tsv --local-stopwords=dictionary/all_local.tsv --type-pairs=dictionary/all_type_pairs.tsv --out-matches=all_matches.tsv

    Input documents: large-scale execution is done on the entire PubMed (as of March 2024) and PMC Open Access (as of November 2023) article sets in BioC format. The files are converted to a tab-delimited format to be compatible with the RE system input (see below).

    Input dictionary files: all the files necessary to execute the command above are available in tagger_dictionary_files.tar.gz.

    Tagger output: we filter the results of the tagger run down to gene/protein hits and to documents with more than one hit (since we are doing relation extraction) before feeding them to our RE system. The filtered output is available in tagger_matches_ggp_only_gt_1_hit.tsv.gz.

    Relation extraction system input: combined_input_for_re.tar.gz contains the directories with all the .ann and .txt files used as input for the large-scale execution of the relation extraction pipeline. The files are generated from the tagger TSV output (see above, tagger_matches_ggp_only_gt_1_hit.tsv.gz) using the tagger2standoff.py script from the string-db-tools repository.

    Relation extraction models: the Transformer-based model used for large-scale relation extraction and for prediction on the test set is in relation_extraction_multi-label-best_model.tar.gz. The RoBERTa model pre-trained on PubMed, PMC, and MIMIC-III with a BPE vocabulary learned from PubMed (RoBERTa-large-PM-M3-Voc), which our system uses, is available here.

    Relation extraction system output: the tab-delimited outputs of the relation extraction system are in large_scale_relation_extraction_results.tar.gz. ATTENTION: this file is approximately 1 TB in size, so make sure you have enough disk space before downloading it. The output files have 86 columns: PMID, Entity BRAT ID1, Entity BRAT ID2, and the per-class scores produced by the relation extraction model. Each file has a header denoting which score is in which column.
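    Because the output archive is about 1 TB, it helps to stream each tab-delimited file rather than load it whole. The sketch below assumes only the layout described above (a header row, then PMID, Entity BRAT ID1, Entity BRAT ID2, then one score column per class, with class names taken from the header). The function name predictions_above, the Prediction tuple, and the 0.5 threshold are hypothetical choices, not part of the released pipeline.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Prediction(NamedTuple):
    pmid: str
    entity1: str  # Entity BRAT ID1
    entity2: str  # Entity BRAT ID2
    label: str    # class name taken from the file header
    score: float


def predictions_above(lines: Iterable[str], threshold: float = 0.5) -> Iterator[Prediction]:
    """Stream (entity pair, class) predictions whose score reaches `threshold`.

    Assumed layout: a header row, then PMID, Entity BRAT ID1, Entity BRAT ID2,
    followed by one score column per class. The 0.5 default is hypothetical.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    class_names = header[3:]  # remaining header cells name the score columns
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        pmid, ent1, ent2 = row[:3]
        # The model is multi-label, so one pair may receive several classes:
        # every class at or above the threshold is emitted, not only the argmax.
        for name, raw in zip(class_names, row[3:]):
            score = float(raw)
            if score >= threshold:
                yield Prediction(pmid, ent1, ent2, name, score)
```

    A typical call opens one extracted file with open(path, newline="", encoding="utf-8") and iterates over predictions_above(fh). Since the function is a generator, memory use stays flat regardless of file size.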

  15. 2014-2019 Systematic Reviews and Meta-Analyses Data: Evidence-Based...

    • data.mendeley.com
    Updated Jul 28, 2021
    Toluwase Asubiaro (2021). 2014-2019 Systematic Reviews and Meta-Analyses Data: Evidence-Based Biomedical Publications in MEDLINE with authors from Sub-Saharan Africa [Dataset]. http://doi.org/10.17632/xkry6rjtjg.3
    Authors
    Toluwase Asubiaro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Africa, Sub-Saharan Africa
    Description

    Bibliographic data of biomedical systematic reviews and meta-analyses published between 2014 and 2019 with at least one author affiliated with an institution in Sub-Saharan Africa were retrieved from MEDLINE via the PubMed search engine. All forty-six (46) countries in Sub-Saharan Africa were included in the search query as affiliations. The search strategy is described in four steps:

    Step #1: Nigeria[Affiliation] OR South Africa[Affiliation] OR Ghana[Affiliation] OR Tanzania[Affiliation] OR Kenya[Affiliation] OR Rwanda[Affiliation] OR Botswana[Affiliation] OR Cameroun[Affiliation] OR Senegal[Affiliation] OR Angola[Affiliation] OR Uganda[Affiliation] OR Mali[Affiliation] OR Sierra Leone[Affiliation] OR Ivory Coast[Affiliation] OR Ethiopia[Affiliation] OR Lesotho[Affiliation] OR Zambia[Affiliation] OR Zimbabwe[Affiliation] OR Namibia[Affiliation] OR Guinea[Affiliation] OR Mauritius[Affiliation] OR Mozambique[Affiliation] OR Niger[Affiliation] OR Seychelles[Affiliation] OR Burkina Faso[Affiliation] OR Burundi[Affiliation] OR Cape Verde[Affiliation] OR Cameroon[Affiliation] OR Central African Republic[Affiliation] OR Chad[Affiliation] OR Comoros[Affiliation] OR Democratic Republic of Congo[Affiliation] OR DR Congo[Affiliation] OR Djibouti[Affiliation] OR Cote D'ivoire[Affiliation] OR Congo[Affiliation] OR Equatorial Guinea[Affiliation] OR Eritrea[Affiliation] OR Gabon[Affiliation] OR Guinea-Bissau[Affiliation] OR Madagascar[Affiliation] OR Congo Republic[Affiliation] OR Sao Tome and Principe[Affiliation] OR Swaziland[Affiliation] OR Togo[Affiliation] OR Benin[Affiliation] OR Liberia[Affiliation] OR Namibia[Affiliation] OR Gambia[Affiliation] OR (Cent Afr Republ[Affiliation]) OR (Equat Guinea[Affiliation]) OR (Papua N Guinea[Affiliation]) OR (Sao Tome E Prin[Affiliation]) OR Principe[Affiliation] OR Sao Tome E Principe[Affiliation]

    Step #2: Filter set to Meta-Analysis[ptyp] OR systematic[sb]

    Step #3: Text word search systematic review[Text Word] OR meta-analysis[Text Word] OR meta analysis[Text Word]

    Step #4: Set publication date to: "2014/01/01"[PDAT] : "2019/12/31"[PDAT]

    Combined query: (Step #1) AND (Step #2 OR Step #3) AND (Step #4). The search, performed on April 2, 2020, returned 3,171 results. The bibliographic data collected with these queries were cleaned: duplicates were removed, as were articles that were not meta-analyses or systematic reviews. MEDLINE is an authoritative, specialized biomedical database for indexing biomedical publications.
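    The four steps combine mechanically into one PubMed query string. The sketch below assembles it and builds, without fetching, a request URL for NCBI's E-utilities ESearch endpoint. The country list is deliberately abbreviated (the full Step #1 list is given above), and the helper names build_query and esearch_url are hypothetical. Re-running the query today would not necessarily return the 3,171 records reported for April 2020, since PubMed indexing changes over time.

```python
from typing import Sequence
from urllib.parse import urlencode

# Step #1 is abbreviated here; the full affiliation list is given above.
COUNTRIES = ["Nigeria", "South Africa", "Ghana", "Tanzania", "Kenya"]

STEP2 = "Meta-Analysis[ptyp] OR systematic[sb]"
STEP3 = ("systematic review[Text Word] OR meta-analysis[Text Word] "
         "OR meta analysis[Text Word]")
STEP4 = '"2014/01/01"[PDAT] : "2019/12/31"[PDAT]'

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(countries: Sequence[str]) -> str:
    """Combine the steps as (Step #1) AND (Step #2 OR Step #3) AND (Step #4)."""
    step1 = " OR ".join(f"{c}[Affiliation]" for c in countries)
    return f"({step1}) AND (({STEP2}) OR ({STEP3})) AND ({STEP4})"


def esearch_url(query: str) -> str:
    """Build (but do not fetch) an ESearch URL for the query against PubMed."""
    params = {"db": "pubmed", "term": query, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)
```

    Keeping each step as a separate constant mirrors the description, so a single step (for example, the date range) can be changed without touching the rest of the query.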

  16. Data from: NeuroScape

    • zenodo.org
    zip
    Updated Mar 6, 2025
    Mario Senden (2025). NeuroScape [Dataset]. http://doi.org/10.5281/zenodo.14865161
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mario Senden
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NeuroScape: A Curated Dataset of Neuroscientific Articles from 1999 to 2023

    Description

    This dataset comprises a collection of neuroscientific articles published between January 1, 1999, and December 31, 2023. The compilation includes information on articles and research domain clusters in multiple formats, including CSV, GraphML, and HDF5.

    Scope and Selection Criteria

    • Source Journals: The articles in this dataset were selectively retrieved from journals ranked in the first and second quartile (Q1 and Q2) in the field of neuroscience according to the SCImago Journal Rank. Additionally, articles from Q1 multidisciplinary journals such as Nature, Science, and PLOS One were included.
    • Search Methodology: PubMed searches were conducted for each year using the journal name and publication year as query terms. All articles returned from these searches were initially included.
    • Discipline Classification: A neural network classifier was employed to filter articles specifically related to neuroscience. Articles that did not meet the classifier's threshold were excluded.
    • Non-Exhaustiveness: This dataset does not encompass all neuroscientific articles published in the given period. Articles without abstracts or key metadata were omitted, and classification errors may have led to the exclusion of some relevant publications.

    Changelog

    Version 1.0.1 (Latest)

    • Fixed incorrect cluster citation graph: The previous version had an incorrect cluster_citation_density.graphml file. This has now been corrected.

    Directory Structure

    .
    ├── Code
    │   ├── notebooks
    │   │   ├── keyword_search.ipynb
    │   │   ├── exploring_clusters.ipynb
    │   │   ├── loading_article_shards.ipynb
    │   │   ├── traversing_article_graph.ipynb
    │   │   ├── discipline_classification.ipynb
    │   │   └── from_generic_to_domain_embedding.ipynb
    │   ├── requirements.txt
    │   └── src
    │       ├── data_types.py
    │       └── utils.py
    └── Data
        ├── CSV
        │   ├── neuroscience_articles_1999-2023.csv
        │   ├── neuroscience_clusters_1999-2023.csv
        │   └── neuroscience_dimensions_1999-2023.csv
        ├── Graphs
        │   ├── cluster_citation_density.graphml
        │   └── article_similarity.graphml
        ├── HDF5
        │   ├── DomainEmbeddings
        │   │   └── 2037 shard_#SHARD_ID.h5 files containing 200 articles
        │   └── VoyageAIEmbeddings
        │       ├── Large_02_Instruct
        │       │   └── 2037 shard_#SHARD_ID.h5 files containing 200 articles
        │       └── Lite_02_Instruct
        │           └── 2037 shard_#SHARD_ID.h5 files containing 200 articles
        └── Models
            ├── discipline_classification_model.pth
            └── domain_embedding_model.pth

    Code

    The Code folder contains minimal example code to help users get started with the dataset. It includes:

    • Jupyter notebooks with minimal usage examples demonstrating how to work with the data.
    • Python Scripts with basic utilities for handling the dataset.

    These examples provide a simple foundation for working with the dataset. More advanced analysis and demonstrations are covered in the accompanying publication.

    CSV Files

    Neuroscience Articles (neuroscience_articles_1999-2023.csv)

    This file contains metadata on neuroscientific articles from 1999 to 2023.

    Variables:

    • Pmid: PubMed ID (unique identifier).
    • Doi: Digital Object Identifier.
    • Type: Article type (Review or Research).
    • Title: Article title.
    • Year: Year of publication.
    • Month: Month of publication.
    • Age: Age of the article as of January 3, 2025.
    • Citations: Total number of citations.
    • Citation Rate: Citations divided by article age.
    • Cluster ID: The research cluster the article belongs to (neuroscience_clusters_1999-2023.csv).
    • Journal: The journal where the article was published.
    • Disciplines: Disciplines covered by the journal, as classified by SCImago. The article does NOT necessarily qualify for all listed disciplines.
    • Abstract: The abstract of the article.
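    Because each article row carries its Cluster ID, Type, and Citation Rate, per-cluster statistics such as median citation rates can be derived directly from this file. The sketch below assumes that the CSV header uses the variable names exactly as listed above ("Cluster ID", "Type", "Citation Rate"); load_articles and median_citation_rates are hypothetical helper names. The result need not match the MCR Research and MCR Review columns of the clusters file exactly, since the authors' procedure for those values is not described here.

```python
import csv
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Mapping, Tuple


def load_articles(path: str) -> List[Dict[str, str]]:
    """Read neuroscience_articles_1999-2023.csv into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def median_citation_rates(rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, str], float]:
    """Median Citation Rate per (Cluster ID, Type), e.g. ("12", "Review")."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Cluster ID"], row["Type"])  # assumed header names
        groups[key].append(float(row["Citation Rate"]))
    return {key: median(rates) for key, rates in groups.items()}
```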

    Neuroscience Clusters (neuroscience_clusters_1999-2023.csv)

    Clusters of related articles based on research themes.

    Variables:

    • Cluster ID: Unique identifier for the cluster.
    • Title: Title of the research cluster.
    • Size: Number of articles in the cluster.
    • Year First Article: Year of the earliest article in the cluster.
    • MCR Research: Median citation rate for research articles.
    • MCR Review: Median citation rate for review articles.
    • Reference Krackhardt: Measure of internal vs. external references.
    • Citation Krackhardt: Measure of internal vs. external citations.
    • Most Cited Cluster: Cluster most frequently cited by articles in this cluster.
    • Most Citing Cluster: Cluster that cites this cluster the most.
    • Keywords: Keywords describing the cluster.
    • Description: A summary of the research in the cluster.
    • Focus: Whether the cluster is focused on content or methodology.
    • Most Similar Cluster: Cluster most semantically similar to this one.
    • Similarity: Cosine similarity score with the most similar cluster.
    • Distinguishing Features: Key features distinguishing the cluster from its similar cluster.
    • Open Questions: Outstanding research questions within the cluster.
    • Dimensions: Evaluation of dimensions including appliedness, modality, spatiotemporal scale, cognitive complexity, species focus, theoretical engagement, theory scope, methodological approach, and interdisciplinarity.
    • Trends: Emerging or declining trends between January 2021 and December 2023.
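    The Reference Krackhardt and Citation Krackhardt fields measure internal versus external links. Krackhardt's E-I index is conventionally (E - I) / (E + I), where E counts links to other clusters and I counts links within the cluster, giving -1 when all links are internal and +1 when all are external. The sketch below computes that conventional form from per-article link lists. It assumes the dataset uses this sign convention and ignores links to articles outside the dataset; neither is confirmed by the description above. The name ei_index is hypothetical.

```python
from typing import Hashable, Iterable, Mapping, Optional


def ei_index(cluster: Hashable,
             links: Mapping[str, Iterable[str]],
             cluster_of: Mapping[str, Hashable]) -> Optional[float]:
    """Krackhardt E-I index (E - I) / (E + I) for one cluster.

    links: pmid -> linked pmids (out_links for references, in_links for citations)
    cluster_of: pmid -> cluster id
    Returns None when the cluster has no links inside the dataset.
    """
    internal = external = 0
    for pmid, targets in links.items():
        if cluster_of.get(pmid) != cluster:
            continue  # only count links from this cluster's own articles
        for target in targets:
            other = cluster_of.get(target)
            if other is None:
                continue  # assumption: links leaving the dataset are ignored
            if other == cluster:
                internal += 1
            else:
                external += 1
    total = internal + external
    if total == 0:
        return None
    return (external - internal) / total
```

    Passing out_links gives a reference-based index; passing in_links gives the citation-based counterpart.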

    Neuroscience Dimensions (neuroscience_dimensions_1999-2023.csv)

    Provides various research dimensions assessed for each cluster. Each dimension comes with specific binarized categories.

    Key Variables:

    • Appliedness: Fundamental, translational, or clinical focus.
    • Modality: Auditory, visual, olfactory, gustatory, somatosensory.
    • Spatiotemporal Scale: Focus on molecular, cellular, system-level neuroscience.
    • Cognitive Complexity: Simple vs. complex cognitive processes.
    • Species: Human, non-human primate, rodent, etc.
    • Theory Engagement: Data-driven vs hypothesis-driven research.
    • Theory Scope: Scope of theoretical frameworks utilized by the cluster.
    • Methodological Approach: Experimental, observational, computational, meta-analytic.
    • Interdisciplinarity: Low to very high.

    HDF5 Files

    The HDF5 directory contains two sets of embeddings for the article abstracts. Each embedding folder contains 2037 HDF5 shard files, each holding about 200 articles stored with a custom-defined article data type.

    Article Datatypes:

    • pmid, doi, title, type, journal, year, age, citationcount, citationrate, abstract: Corresponds directly with the CSV data.
    • embedding: Text embedding of the article's abstract. There are two versions.
    • out_links: List of PubMed IDs for articles in the dataset that are cited by this article (references).
    • in_links: List of PubMed IDs for articles in the dataset that cite this article (citations).
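    Within the dataset, out_links and in_links describe the same citation edges from opposite ends, so one can be derived from the other or used to check it. The sketch below works on plain dictionaries keyed by PubMed ID. How the shards are read into such dictionaries is left open, because the internal HDF5 layout is not described here. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def invert_links(out_links: Mapping[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Derive in_links (articles citing X) from out_links (articles X cites)."""
    cited_by = defaultdict(set)
    for citing, refs in out_links.items():
        for cited in refs:
            cited_by[cited].add(citing)
    return {pmid: sorted(citers) for pmid, citers in cited_by.items()}


def missing_in_links(out_links: Mapping[str, Iterable[str]],
                     in_links: Mapping[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """List (citing, cited) pairs present in out_links but absent from in_links."""
    return [(citing, cited)
            for citing, refs in out_links.items()
            for cited in refs
            if citing not in in_links.get(cited, ())]
```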

    Please note that the abstracts in HDF5/VoyageAIEmbeddings/Lite_02_Instruct and HDF5/VoyageAIEmbeddings/Large_02_Instruct have been embedded using Voyage AI's voyage-lite-02-instruct and voyage-large-02-instruct models, respectively. Those in HDF5/DomainEmbeddings are voyage-large-02-instruct embeddings that have subsequently been transformed into a lower-dimensional, domain-specific embedding using a custom neural network (domain_embedding_model.pth).

    Graph-Based Data

    Article Similarity Graph (article_similarity.graphml)

    A graph representation of article similarity based on cosine similarity between abstract embeddings (using the domain-specific embeddings produced by domain_embedding_model.pth).

    • Vertices: Each article is a node with pmid (PubMed ID) as an attribute.
    • Edges: The top 50 nearest neighbor articles (by cosine similarity) form
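    A top-k cosine-similarity neighbour graph of this kind can be sketched with NumPy by normalising each embedding and ranking rows of the resulting similarity matrix. This is a dense, illustrative version under assumptions: whether the released graph is directed, and how reciprocal neighbours are merged, is not stated in the text above. knn_edges is a hypothetical name. For roughly 400,000 articles (2037 shards of about 200), a dense n x n float64 matrix would need on the order of a terabyte, so at full scale a blocked or approximate nearest-neighbour search would be needed.

```python
import numpy as np


def knn_edges(pmids, embeddings, k=50):
    """Return (source, target, cosine) edges to each article's k nearest neighbours.

    Dense O(n^2) version, suitable only for a subset of articles.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalise each row
    sim = x @ x.T                                      # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)                     # never pick the article itself
    k = min(k, len(pmids) - 1)
    order = np.argsort(-sim, axis=1)[:, :k]            # most similar first
    return [(pmids[i], pmids[j], float(sim[i, j]))
            for i in range(len(pmids)) for j in order[i]]
```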

  17. Database (PubMed): retracted publications of systematic reviews and...

    • figshare.com
    zip
    Updated Jun 6, 2023
    Jorge H Ramirez (2023). Database (PubMed): retracted publications of systematic reviews and meta-analysis (1983 - 2013) [Dataset]. http://doi.org/10.6084/m9.figshare.1216653.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jorge H Ramirez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PubMed (search date: 24 October 2014). Search query: "retracted publication"[Publication Type], filter: systematic reviews. 48 results. Google spreadsheet in the URL below.

  18. BIOGRID CURATED DATA FOR PUBLICATION: Interaction Between SARS-CoV-2 Spike...

    • thebiogrid.org
    zip
    Updated Sep 1, 2024
    BioGRID Project (2024). BIOGRID CURATED DATA FOR PUBLICATION: Interaction Between SARS-CoV-2 Spike Protein S1 Subunit and Oyster Heat Shock Protein 70. [Dataset]. https://thebiogrid.org/253343/publication/interaction-between-sars-cov-2-spike-protein-s1-subunit-and-oyster-heat-shock-protein-70.html
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Li J (2024): Interaction Between SARS-CoV-2 Spike Protein S1 Subunit and Oyster Heat Shock Protein 70. curated by BioGRID (https://thebiogrid.org); ABSTRACT: There is growing evidence that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) contaminates the marine environment and is bioaccumulated in filter-feeding shellfish. Previous study shows the Pacific oyster tissues can bioaccumulate the SARS-CoV-2, and the oyster heat shock protein 70 (oHSP70) may play as the primary attachment receptor to bind SARS-CoV-2's recombinant spike protein S1 subunit (rS1). However, detailed information about the interaction between rS1 and oHSP70 is still unknown. In this study, we confirmed that the affinity of recombinant oHSP70 (roHSP70) for rS1 (KD = 20.4 nM) is comparable to the receptor-binding affinity of rACE2 for rS1 (KD = 16.7 nM) by surface plasmon resonance (SPR)-based Biacore and further validated by enzyme-linked immunosorbent assay (ELISA). Three truncated proteins (roHSP70-N/C/M) and five mutated proteins (p.I229del, p.D457del, p.V491_K495del, p.K556I, and p.?roHSP70) were constructed according to the molecular docking results. All three truncated proteins have significantly lower affinity for rS1 than the full-length roHSP70, indicating that all three segments of roHSP70 are involved in binding to rS1. Further, the results of SPR and ELISA showed that all five mutant proteins had significantly lower affinity for rS1 than roHSP70, suggesting that amino acids at these sites are involved in binding to rS1. This study provides a preliminary theoretical basis for the bioaccumulation of SARS-CoV-2 in oyster tissues or using roHSP70 as the capture unit to selectively enrich virus particles for detection.

  19. BIOGRID CURATED DATA FOR PUBLICATION: Using lidocaine and benzocaine to link...

    • thebiogrid.org
    zip
    Updated Aug 10, 2009
    BioGRID Project (2009). BIOGRID CURATED DATA FOR PUBLICATION: Using lidocaine and benzocaine to link sodium channel molecular conformations to state-dependent antiarrhythmic drug affinity. [Dataset]. https://thebiogrid.org/180062/publication/using-lidocaine-and-benzocaine-to-link-sodium-channel-molecular-conformations-to-state-dependent-antiarrhythmic-drug-affinity.html
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Hanck DA (2009): Using lidocaine and benzocaine to link sodium channel molecular conformations to state-dependent antiarrhythmic drug affinity. curated by BioGRID (https://thebiogrid.org); ABSTRACT: Lidocaine and other antiarrhythmic drugs bind in the inner pore of voltage-gated Na channels and affect gating use-dependently. A phenylalanine in domain IV, S6 (Phe1759 in Na(V)1.5), modeled to face the inner pore just below the selectivity filter, is critical in use-dependent drug block. Measurement of gating currents and concentration-dependent availability curves to determine the role of Phe1759 in coupling of drug binding to the gating changes. The measurements showed that replacement of Phe1759 with a nonaromatic residue permits clear separation of action of lidocaine and benzocaine into 2 components that can be related to channel conformations. One component represents the drug acting as a voltage-independent, low-affinity blocker of closed channels (designated as lipophilic block), and the second represents high-affinity, voltage-dependent block of open/inactivated channels linked to stabilization of the S4s in domains III and IV (designated as voltage-sensor inhibition) by Phe1759. A homology model for how lidocaine and benzocaine bind in the closed and open/inactivated channel conformation is proposed. These 2 components, lipophilic block and voltage-sensor inhibition, can explain the differences in estimates between tonic and open-state/inactivated-state affinities, and they identify how differences in affinity for the 2 binding conformations can control use-dependence, the hallmark of successful antiarrhythmic drugs.

  20. Additional file 1 of Disclosing ambiguous gene aliases by automatic...

    • springernature.figshare.com
    xls
    Updated May 31, 2023
    Roney S Coimbra; Dana E Vanderwall; Guilherme C Oliveira (2023). Additional file 1 of Disclosing ambiguous gene aliases by automatic literature profiling [Dataset]. http://doi.org/10.6084/m9.figshare.14438102.v1
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Roney S Coimbra; Dana E Vanderwall; Guilherme C Oliveira
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: EntrezGene official symbols with PubMed abstracts and their aliases classified by the algorithm. Description of data: 73 randomly chosen official gene symbols that produced text corpora of PubMed abstracts and their aliases. Aliases were classified by the algorithm as “synonyms”, “ambiguous”, “aliases with PubMed abstract but not passing the filters”, or “aliases without PubMed abstracts”. (XLS 42 KB)
