34 datasets found
  1. Additional file 1 of A PubMed search filter for efficiently retrieving...

    • springernature.figshare.com
    xlsx
    Updated Dec 19, 2024
    Cite
    Dawei Yin; Mikaela V. Engracia; Matthew K. Edema; David C. Clarke (2024). Additional file 1 of A PubMed search filter for efficiently retrieving exercise training studies [Dataset]. http://doi.org/10.6084/m9.figshare.28058331.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    figshare
    Authors
    Dawei Yin; Mikaela V. Engracia; Matthew K. Edema; David C. Clarke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary Material 1.

  2. dsir-pile-13m-filtered-for-pubmed-central

    • huggingface.co
    Updated Dec 31, 2017
    + more versions
    Cite
    Timaeus (2017). dsir-pile-13m-filtered-for-pubmed-central [Dataset]. https://huggingface.co/datasets/timaeus/dsir-pile-13m-filtered-for-pubmed-central
    Explore at:
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Dec 31, 2017
    Dataset authored and provided by
    Timaeus
    Description

    timaeus/dsir-pile-13m-filtered-for-pubmed-central dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. PubMed Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jul 15, 2016
    Cite
    Bright Data (2016). PubMed Datasets [Dataset]. https://brightdata.com/products/datasets/pubmed
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jul 15, 2016
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock valuable biomedical knowledge with our comprehensive PubMed Dataset, designed for researchers, analysts, and healthcare professionals to track medical advancements, explore drug discoveries, and analyze scientific literature.

    Dataset Features

    • Scientific Articles & Abstracts: Access structured data from PubMed, including article titles, abstracts, authors, publication dates, and journal sources.
    • Medical Research & Clinical Studies: Retrieve data on clinical trials, drug research, disease studies, and healthcare innovations.
    • Keywords & MeSH Terms: Extract key medical subject headings (MeSH) and keywords to categorize and analyze research topics.
    • Publication & Citation Data: Track citation counts, journal impact factors, and author affiliations for academic and industry research.

    Customizable Subsets for Specific Needs

    Our PubMed Dataset is fully customizable, allowing you to filter data based on publication date, research category, keywords, or specific journals. Whether you need broad coverage for medical research or focused data for pharmaceutical analysis, we tailor the dataset to your needs.

    Popular Use Cases

    • Pharmaceutical Research & Drug Development: Analyze clinical trial data, drug efficacy studies, and emerging treatments.
    • Medical & Healthcare Intelligence: Track disease outbreaks, healthcare trends, and advancements in medical technology.
    • AI & Machine Learning Applications: Use structured biomedical data to train AI models for predictive analytics, medical diagnosis, and literature summarization.
    • Academic & Scientific Research: Access a vast collection of peer-reviewed studies for literature reviews, meta-analyses, and academic publishing.
    • Regulatory & Compliance Monitoring: Stay updated on medical regulations, FDA approvals, and healthcare policy changes.

    Whether you're conducting medical research, analyzing healthcare trends, or developing AI-driven solutions, our PubMed Dataset provides the structured data you need. Get started today and customize your dataset to fit your research objectives.

  4. dsir-pile-100k-filtered-for-pubmed-abstracts

    • huggingface.co
    Updated Jan 13, 2018
    + more versions
    Cite
    Timaeus (2018). dsir-pile-100k-filtered-for-pubmed-abstracts [Dataset]. https://huggingface.co/datasets/timaeus/dsir-pile-100k-filtered-for-pubmed-abstracts
    Explore at:
    Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Jan 13, 2018
    Dataset authored and provided by
    Timaeus
    Description

    timaeus/dsir-pile-100k-filtered-for-pubmed-abstracts dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Data from: PubMed's Core Clinical Journals Filter: Redesigned for...

    • figshare.com
    txt
    Updated Jul 12, 2023
    Cite
    Michele Klein-Fedyshin; Andrea M. Ketchum (2023). PubMed's Core Clinical Journals Filter: Redesigned for Contemporary Clinical Impact and Utility [Dataset]. http://doi.org/10.6084/m9.figshare.21979832.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Michele Klein-Fedyshin; Andrea M. Ketchum
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Medical journal usage counts across 814 clinical locations in the U.S. and Canada from 2009 - 2015.

  6. Pubmed Journal Recommendation System dataset

    • data.niaid.nih.gov
    Updated Mar 25, 2025
    Cite
    Jiayun Liu (2025). Pubmed Journal Recommendation System dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8386010
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Raúl García Castro
    Manuel Castillo Cara
    Jiayun Liu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for Journal recommendation, includes title, abstract, keywords, and journal.

    We extracted the journals and additional information from:

    Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.

    Dataset Components:

    data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.

    data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022—the latest date in the dataset. Additionally, we have exclusively retained journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.

    data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets—training, validation, and test—using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.
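The filtering and per-journal split described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function names, the rounding rule for the cut points, the seed, and the choice to count articles per journal after the date filter are all assumptions. Only the column names `journal` and `publication_date` come from the description.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def filter_articles(articles, start, end, min_articles):
    """Keep articles dated within [start, end] from journals with more than min_articles of them."""
    in_window = [a for a in articles if start <= a["publication_date"] <= end]
    # Assumption: journal sizes are counted after the date filter.
    per_journal = Counter(a["journal"] for a in in_window)
    return [a for a in in_window if per_journal[a["journal"]] > min_articles]


def split_per_journal(articles, ratios=(0.6, 0.2), seed=0):
    """Shuffle each journal's articles and cut them into train/val/test parts."""
    rng = random.Random(seed)
    by_journal = defaultdict(list)
    for article in articles:
        by_journal[article["journal"]].append(article)
    train, val, test = [], [], []
    for journal in sorted(by_journal):
        group = list(by_journal[journal])
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder goes to the test set
    return train, val, test


# Toy usage with a threshold of 2 instead of the 200 used in the description.
articles = [
    {"pubmed_id": i, "journal": "J%d" % (i % 2), "publication_date": date(2020, 1, 1)}
    for i in range(20)
]
articles.append({"pubmed_id": 99, "journal": "Old", "publication_date": date(2010, 1, 1)})
recent = filter_articles(articles, date(2018, 1, 1), date(2022, 12, 13), min_articles=2)
train, val, test = split_per_journal(recent)
print(len(train), len(val), len(test))
```

Because the cut points are rounded separately for each journal, the overall split sizes will generally differ slightly from an exact 60/20/20 share of the total.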

  7. arxiv-acl-pubmed-hss-abstracts-filtered-20K-cleaned-AI-gen

    • huggingface.co
    Updated Apr 29, 2025
    + more versions
    Cite
    Polygraf (2025). arxiv-acl-pubmed-hss-abstracts-filtered-20K-cleaned-AI-gen [Dataset]. https://huggingface.co/datasets/polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-20K-cleaned-AI-gen
    Explore at:
    Dataset updated
    Apr 29, 2025
    Dataset authored and provided by
    Polygraf
    Description

    polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-20K-cleaned-AI-gen dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. Uftir_curated Dataset

    • universe.roboflow.com
    zip
    Updated Sep 28, 2023
    Cite
    uFTIR Particles (2023). Uftir_curated Dataset [Dataset]. https://universe.roboflow.com/uftir-particles/uftir_curated/model/6
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 28, 2023
    Dataset authored and provided by
    uFTIR Particles
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Particle Polygons
    Description

    Micro-FTIR Filter Images for Particle Detection

    This dataset consists of annotated images of filters containing particles. The primary objective of this dataset is to serve as training and validation data for developing a particle detection model using computer vision techniques. More specifically, this dataset can be used to train an image segmentation model that can be used with GEPARD (https://pubmed.ncbi.nlm.nih.gov/32436395/) in order to perform efficient particle detection and analysis using a Micro-FTIR microscope.

    Two kinds of samples are used in our case:

    • Normal filters, with a low amount of particles and a clear view of the filter
    • Saturated filters, where the particles cover almost all the filter

    In the first case, particles were annotated easily as they are clearly visible over the filter. In the second scenario, the most distinguishable particles on the image have been annotated.

    Note

    In the case of saturated filters, the correct method would be to collect a spectral image of the entire filter using an FPA detector or similar and then use tools (e.g. sIMPle) to analyse this image. However, in our scenario such a detector was not available, and a semi-random / operator-dependent method had to be used in order to select particles or points for scanning.

  9. HuGE Navigator - Human Genome Epidemiology Navigator

    • scicrunch.org
    • neuinfo.org
    • +1more
    Updated Dec 4, 2023
    Cite
    (2023). HuGE Navigator - Human Genome Epidemiology Navigator [Dataset]. http://identifiers.org/RRID:SCR_003172
    Explore at:
    Dataset updated
    Dec 4, 2023
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 5, 2023.

    Knowledge base of genetic associations and human genome epidemiology including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests. This tool explores HuGENet, the Human Genome Epidemiology Network, which is a global collaboration of individuals and organizations committed to the assessment of the impact of human genome variation on population health and how genetic information can be used to improve health and prevent disease.

    What does HuGE Navigator offer?

    • HuGEpedia: an encyclopedia of human genetic variation in health and disease, which includes Phenopedia and Genopedia. Phenopedia allows you to look up gene-disease association summaries by disease, and Genopedia allows you to look up gene-disease association summaries by gene. In general, HuGEpedia is a searchable database that summarizes published articles about human disease and genetic variation, including primary studies, reviews, and meta-analyses. It provides links to PubMed abstracts, researcher contact info, trends, and more.
    • HuGEtools: searching and mining the literature in human genome epidemiology; includes HuGE Literature Finder, HuGE Investigator Browser, Gene Prospector, HuGE Watch, Variant Name Mapper, and HuGE Risk Translator.
    • HuGE Literature Finder finds published articles in human genome epidemiology since 2001. The search query can include genes, disease, outcome, environmental factors, author, etc. Results can be filtered by these categories. It is also possible to see all articles in the database for a particular topic, such as genotype prevalence, pharmacogenomics, or clinical trial.
    • HuGE Investigator Browser finds investigators in a particular field of human genome epidemiology. This info is obtained using a behind-the-scenes tool that automatically parses PubMed affiliation data.
    • Gene Prospector is a gateway for evaluating genes in relation to disease and risk factors. This tool allows you to enter a disease or risk factor and then supplies you with a table of genes associated with your query, ranked by strength of evidence from the literature. This evidence is culled from the HuGE Literature Finder and NCBI Entrez Gene, and the scoring formula is provided. The Gene Prospector results table provides access to the Genopedia entry for each gene in the list, general info including links to other resources, SNP info, and associated literature from HuGE, PubMed, GWAS, and more. It is a great place to locate a lot of info about your disease/gene of interest very quickly.
    • HuGE Watch tracks the evolution of published literature, HuGE investigators, genes studied, or diseases studied in human genome epidemiology. For example, if you search Trend/Pattern for Diseases Studied you'll initially get a graph and chart of the number of diseases studied per year since 1997. You can refine these results by limiting the temporal trend to a category or study type such as Gene-gene Interaction or HuGE Review.
    • Variant Name Mapper maps common names and rs numbers of genetic variants using information from SNP500Cancer, SNPedia, pharmGKB, ALFRED, AlzGene, PDGene, SZgene, HuGE Navigator, LSDBs, and user submissions.
    • HuGE Risk Translator calculates the predictive value of genetic markers for disease risk. To do so, users must enter the frequency of the risk variant, the population disease risk, and the odds ratio between the gene and disease. This information is necessary in order to yield a useful predictive result.
    • HuGEmix: a series of HuGE-related informatics utilities and projects; includes GAPscreener, HuGE Track, and Open Source. GAPscreener is a screening tool for published literature on human genetic associations; HuGE Track is a custom track built for HuGE data in the UCSC Genome Browser; and Open Source is infrastructure for managing knowledge and information from PubMed.

  10. Data for: comprehensive search filters for retrieving publications on...

    • service.tib.eu
    Updated May 16, 2025
    Cite
    (2025). Data for: comprehensive search filters for retrieving publications on non-human primates for literature reviews (filternhp) - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/goe-doi-10-25625-utt4sn
    Explore at:
    Dataset updated
    May 16, 2025
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset supports filterNHP, an R package and web-based application for generating search filters to query scientific bibliographic sources (PubMed, PsycINFO, Web of Science) for non-human primate related publications. filterNHP can be found at: https://filterNHP.dpz.eu.

  11. STS Model of the PubMed Literature

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Kevin Boyack; Caleb Smith; Richard Klavans (2023). STS Model of the PubMed Literature [Dataset]. http://doi.org/10.6084/m9.figshare.12743639.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Boyack; Caleb Smith; Richard Klavans
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PubMed model contains over 18 million PubMed documents (1996-2019) clustered into 28,743 clusters for use in research planning, portfolio analysis, systematic review, etc. This repository contains the PMID-to-cluster listing, an Excel workbook that characterizes each cluster with metadata and cluster-level indicators, and a Tableau workbook containing those same data plus a visual map and filters that can be used to explore the landscape and analyze cluster-level information. Model created by SciTech Strategies, Inc. Details can be found in the accompanying article published in Scientific Data at https://www.nature.com/articles/s41597-020-00749-y (or https://rdcu.be/ca4kv).

  12. Data from: Searching for LINCS to Stress: Using Text Mining to Automate...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated May 13, 2024
    Cite
    Bryant A. Chambers; Danilo Basili; Laura Word; Nancy Baker; Alistair Middleton; Richard S. Judson; Imran Shah (2024). Searching for LINCS to Stress: Using Text Mining to Automate Reference Chemical Curation [Dataset]. http://doi.org/10.1021/acs.chemrestox.3c00335.s008
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 13, 2024
    Dataset provided by
    ACS Publications
    Authors
    Bryant A. Chambers; Danilo Basili; Laura Word; Nancy Baker; Alistair Middleton; Richard S. Judson; Imran Shah
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Adaptive stress response pathways (SRPs) restore cellular homeostasis following perturbation but may activate terminal outcomes like apoptosis, autophagy, or cellular senescence if disruption exceeds critical thresholds. Because SRPs hold the key to vital cellular tipping points, they are targeted for therapeutic interventions and assessed as biomarkers of toxicity. Hence, we are developing a public database of chemicals that perturb SRPs to enable new data-driven tools to improve public health. Here, we report on the automated text-mining pipeline we used to build and curate the first version of this database. We started with 100 reference SRP chemicals gathered from published biomarker studies to bootstrap the database. Second, we used information retrieval to find co-occurrences of reference chemicals with SRP terms in PubMed abstracts and determined pairwise mutual information thresholds to filter biologically relevant relationships. Third, we applied these thresholds to find 1206 putative SRP perturbagens within thousands of substances in the Library of Integrated Network-Based Cellular Signatures (LINCS). To assign SRP activity to LINCS chemicals, domain experts had to manually review at least three publications for each of 1206 chemicals out of 181,805 total abstracts. To accomplish this efficiently, we implemented a machine learning approach to predict SRP classifications from texts to prioritize abstracts. In 5-fold cross-validation testing with a corpus derived from the 100 reference chemicals, artificial neural networks performed the best (F1-macro = 0.678) and prioritized 2479/181,805 abstracts for expert review, which resulted in 457 chemicals annotated with SRP activities. An independent analysis of enriched mechanisms of action and chemical use class supported the text-mined chemical associations (p < 0.05): heat shock inducers were linked with HSP90 and DNA damage inducers to topoisomerase inhibition. 
    This database will enable novel applications of LINCS data to evaluate SRP activities and to further develop tools for biomedical information extraction from the literature.
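The co-occurrence filtering step in this abstract could look roughly like the sketch below. It assumes that the mutual information score is the standard pointwise mutual information computed from abstract counts, which the abstract does not confirm. The function names, the counts, and the threshold value are hypothetical.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information in bits, from document counts.

    n_xy: abstracts mentioning both the chemical and the SRP term
    n_x, n_y: abstracts mentioning the chemical / the term
    n_total: abstracts in the corpus
    """
    if n_xy == 0:
        return float("-inf")
    return math.log2((n_xy * n_total) / (n_x * n_y))


def filter_pairs(cooccurrence, chem_counts, term_counts, n_total, threshold):
    """Keep (chemical, term) pairs whose PMI reaches the threshold."""
    kept = {}
    for (chem, term), n_xy in cooccurrence.items():
        score = pmi(n_xy, chem_counts[chem], term_counts[term], n_total)
        if score >= threshold:
            kept[(chem, term)] = score
    return kept


# Hypothetical counts, for illustration only.
cooccurrence = {("chemA", "heat shock"): 10, ("chemB", "heat shock"): 1}
chem_counts = {"chemA": 20, "chemB": 100}
term_counts = {"heat shock": 50}
kept = filter_pairs(cooccurrence, chem_counts, term_counts, n_total=1000, threshold=1.0)
print(kept)
```

A pair that co-occurs more often than chance would predict gets a positive score, so a positive threshold keeps only associations that are over-represented in the corpus.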

  13. utility: Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments...

    • zenodo.org
    zip
    Updated Apr 7, 2022
    Cite
    Nicholas Borcherding (2022). utility: Collection of Tumor-Infiltrating Lymphocyte Single-Cell Experiments with TCR [Dataset]. http://doi.org/10.5281/zenodo.6325603
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 7, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nicholas Borcherding
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    The original intent of assembling a data set of publicly-available tumor-infiltrating T cells (TILs) with paired TCR sequencing was to expand and improve the scRepertoire R package. However, after some discussion, we decided to release the data set for everyone. A complete summary of the sequencing runs and the sample information can be found in the meta data of the Seurat object. This repository is the 4th version of the data, with the addition of cells and changes to the workflow.

    Methods

    Single-Cell Data Processing

    The filtered gene matrices output by the Cell Ranger align function from individual sequencing runs (10x Genomics, Pleasanton, CA) were loaded into the R global environment. For each sequencing run, a unique prefix was added to the cell barcodes to prevent issues with duplicate barcodes. The results were then ported into individual Seurat objects (citation), where the cells with > 10% mitochondrial genes and/or 2.5x natural log distribution of counts were excluded for quality control purposes. At the individual sequencing run level, doublets were estimated using the scDblFinder (v1.4.0) R package.
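Two of the steps above, barcode prefixing and the mitochondrial cutoff, can be sketched in a language-neutral way. This is not the Seurat workflow itself: the function names and the prefix format are hypothetical, and the count-distribution criterion is left out because the text does not define it precisely.

```python
def prefix_barcodes(barcodes, run_id):
    """Prepend a run-specific prefix so barcodes stay unique after merging runs."""
    return [f"{run_id}_{barcode}" for barcode in barcodes]


def keep_cell(mito_counts, total_counts, max_mito_fraction=0.10):
    """Return True when at most 10% of a cell's counts are mitochondrial."""
    if total_counts <= 0:
        return False
    return mito_counts / total_counts <= max_mito_fraction


print(prefix_barcodes(["AAACCTG-1", "AAACGGG-1"], "run1"))
print(keep_cell(5, 100), keep_cell(11, 100))
```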

    Annotation of Cells

    Automatic annotation was performed using the singleR (v1.4.1) R package (citation) with the HPCA (citation) and Monaco (citation) data sets as references and the fine label discriminators. Individual sequencing runs were subsetted to run through the singleR algorithm in order to reduce memory demands. The outputs of all the singleR analyses were collated and appended to the meta data of the Seurat object. Likewise, the ProjecTILs (v0.4.1) R package (citation) was used for automatic annotation as a partially orthogonal approach.

    Addition of TCR data

    The filtered contig annotation T cell receptor (TCR) data for available sequencing runs were loaded into the R global environment. Individual contigs were combined using the combineTCR() function of the scRepertoire (v1.3.5) R package (citation). Clonotypes were assigned to barcodes, and where multiple duplicate chains were present for an individual cell, they were filtered to select the top-expressing contig by read count. The clonotype data were then added to the Seurat object, with the proportion across individual patients being used to calculate frequency.
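The duplicate-chain rule described above (keep the contig with the most reads for each cell and chain) can be illustrated with a short sketch. This is not how combineTCR() is implemented. The field names `barcode`, `chain`, and `reads` are hypothetical, as is the tie-breaking rule (first contig seen wins).

```python
def top_contig_per_chain(contigs):
    """For each (barcode, chain) pair, keep the contig with the highest read count."""
    best = {}
    for contig in contigs:
        key = (contig["barcode"], contig["chain"])
        if key not in best or contig["reads"] > best[key]["reads"]:
            best[key] = contig
    return list(best.values())


contigs = [
    {"barcode": "run1_AAAC", "chain": "TRB", "reads": 120, "cdr3": "CASSL"},
    {"barcode": "run1_AAAC", "chain": "TRB", "reads": 15, "cdr3": "CASSQ"},
    {"barcode": "run1_AAAC", "chain": "TRA", "reads": 80, "cdr3": "CAVRD"},
]
for contig in top_contig_per_chain(contigs):
    print(contig["barcode"], contig["chain"], contig["cdr3"])
```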

    Citations

    As of right now, there is no citation associated with the assembled data set. However if using the data, please find the corresponding manuscript for each data set in the meta.data of the single-cell object. In addition, if using the processed data, feel free to modify the language in the methods section (above) and please cite the appropriate manuscripts of the software or references that were used.

    Itemized List of the Software Used

    Itemized List of Reference Data Used

    Future Directions

    • Data Hosting for Interactive Analysis
    • Easy Submission Portal for Researchers to Add Data
    • Using the Data to Build a Reference Atlas

    These are areas that we are actively hoping to develop to further facilitate the usage of the data set. If you have other suggestions, please reach out using the contact information below.

    Contact

    For questions, comments, and suggestions, please feel free to contact Nick Borcherding via this repository, email, or Twitter.

  14. Data from: Citation network data sets for 'Oxytocin – a social peptide?...

    • zenodo.org
    csv
    Updated Jun 6, 2022
    + more versions
    Cite
    Rhodri Ivor Leng (2022). Citation network data sets for 'Oxytocin – a social peptide? Deconstructing the evidence' [Dataset]. http://doi.org/10.5281/zenodo.6615221
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rhodri Ivor Leng
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    This note describes the data sets used for all analyses contained in the manuscript ‘Oxytocin - a social peptide?’ [1].

    Data Collection

    The datasets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh’s library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:

    TI = (“oxytocin” OR “pitocin” OR “syntocinon”) AND TS = (“social*” OR “pro$social” OR “anti$social”)

    The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as ‘articles’. Given our interest in primary studies only (articles reporting original data), we excluded all other document types. We further excluded all articles sub-classified as ‘book chapters’ or as ‘proceeding papers’ in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, and no further language refinements were necessary.

    All available metadata on these 1,977 articles was exported as plain text ‘flat’ format files in four batches, which we later merged together via Notepad++. Upon manual examination, we discovered examples of papers classified as ‘articles’ by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs, ~96% of the set). We then filtered results to identify all records classified as ‘review’, ‘systematic review’, or ‘meta-analysis’, identifying 75 records [3] (thus, ~4% of records classified by WoS as articles were classified as reviews in PubMed). After examining a sample and agreeing with the PubMed classification, we removed these from our dataset, leaving a total of 1,902 articles.
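The review-removal step can be sketched as a filter over the exported records. The record layout, the `pmid` key, and the lookup table of PubMed publication types are hypothetical. Keeping records that lack a PMID is an assumption based on the counts reported above (1,977 minus 75 gives 1,902).

```python
REVIEW_TYPES = {"review", "systematic review", "meta-analysis"}


def drop_reviews(records, pubmed_types):
    """Remove records whose PubMed publication types mark them as a review.

    records: list of dicts, each with an optional "pmid" key
    pubmed_types: dict mapping PMID to a list of publication-type strings
    """
    kept = []
    for record in records:
        types = {t.lower() for t in pubmed_types.get(record.get("pmid"), [])}
        if not (types & REVIEW_TYPES):
            kept.append(record)  # records without a PMID are kept unchanged
    return kept


records = [{"pmid": "1"}, {"pmid": "2"}, {"title": "no PMID available"}]
pubmed_types = {"1": ["Journal Article"], "2": ["Journal Article", "Review"]}
print(drop_reviews(records, pubmed_types))
```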

    From these data, we constructed two datasets by parsing out relevant reference data via the Sci2 Tool [4]. First, we constructed a ‘node-attribute-list’ by linking unique reference strings (‘Cite Me As’ column in WoS data files) to unique identifiers; we then parsed into this dataset information on the identity of a paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded from WoS, and WoS accession number. Second, we constructed an ‘edge-list’ that records the citations from a citing paper in the ‘Source’ column and identifies the cited paper in the ‘Target’ column, using the unique identifiers described previously to link these data to the node-attribute-list.

    We then constructed a network in which papers are nodes and citation links are directed edges between them. We used Gephi Version 0.9.2 [5] to manually clean these data by merging duplicate references that are caused by different reference formats or by referencing errors. To do this, we needed to retain all retrieved records (1,902) as well as all of their references to papers, whether these were included in our original search or not. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean indegree (within network citations) is 2.4 (median is 1) for the entire network, reflecting a great diversity in referencing choices among our 1,902 articles.
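The two averages quoted for the full network follow from the node and edge counts. The check below assumes that every edge originates from one of the 1,902 retrieved articles, and uses the fact that in a directed graph the mean indegree equals edges divided by nodes.

```python
edges_full = 112_520   # citation links in the full network
nodes_full = 46_633    # unique reference strings
retrieved = 1_902      # articles whose reference lists were parsed

# Average reference-list length of the retrieved articles.
mean_reference_list = edges_full / retrieved
# Mean indegree over all nodes of the directed network.
mean_indegree = edges_full / nodes_full

print(round(mean_reference_list, 1), round(mean_indegree, 2))
```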

    After merging duplicates, we then restricted the network to include only articles fully retrieved (1,902), and retained only those that were connected together by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected together via citation links, meaning a total of ten papers were removed from the following analysis; these were neither connected to the largest component, nor did they form connections with one another (i.e. these were ‘isolates’).
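Restricting a citation network to its largest component can be sketched with a breadth-first search that ignores edge direction. This is an illustrative stand-in for the step performed in Gephi, not the authors' procedure, and the function name is hypothetical.

```python
from collections import defaultdict, deque


def largest_component(nodes, edges):
    """Return the node set of the largest weakly connected component."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)  # direction is ignored for connectivity
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        if len(component) > len(best):
            best = component
    return best


nodes = [1, 2, 3, 4, 5]
edges = [(2, 1), (3, 1), (3, 2)]   # papers 4 and 5 are isolates
print(sorted(largest_component(nodes, edges)))
```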

    This left us with a network of 1,892 nodes connected together by 26,019 edges. It is this network that is described by the ‘node-attribute-list’ and ‘edge-list’ provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network, but retain a set of articles tightly knitted together, all of which have been fully retrieved due to possessing certain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.
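The losses reported in this paragraph can be reproduced from the node and edge counts of the full and restricted networks; the variable names are only for illustration.

```python
nodes_full, edges_full = 46_633, 112_520   # network including all cited references
nodes_kept, edges_kept = 1_892, 26_019     # largest component of retrieved articles

nodes_lost = nodes_full - nodes_kept
edges_lost = edges_full - edges_kept

print(nodes_lost, round(100 * nodes_lost / nodes_full))
print(edges_lost, round(100 * edges_lost / edges_full))
```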

    Before moving on, we calculated indegree for all nodes in this network – this counts the number of citations to a given paper from other papers within this network – and have included this in the node-attribute-list. We further clustered this network via modularity maximisation via the Leiden algorithm [6]. We set the algorithm to resolution 1, and allowed the algorithm to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.
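The indegree attribute can be recomputed from an edge list with ‘Source’ and ‘Target’ columns, the column names used in the edge-list file described in this note. The sketch below reads CSV text from a string so that it is self-contained; the function name and sample rows are hypothetical, and the clustering step is not reproduced here.

```python
import csv
import io
from collections import Counter


def indegree_from_edge_list(csv_text):
    """Count, for each paper identifier, how many edges point to it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Target"] for row in reader)


sample = "Source,Target,Type\n2,1,Directed\n3,1,Directed\n3,2,Directed\n"
indegree = indegree_from_edge_list(sample)
print(indegree["1"], indegree["2"], indegree["3"])
```

A paper that is never cited within the network does not appear as a target, and the counter reports zero for it.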

    For additional analysis, we also analysed the full reference list data to examine the most commonly cited references between 2016 and 2021 - the results of this are described in OTSOC_Cited_2016-2021.csv. This takes the reference lists of all retrieved papers within the network and examines their full reference lists (including references to other papers not contained within the network). These data were cleaned by matching DOIs and manual cleansing.

    Data description

    We include here two network datasets: (i) ‘OTSOC-node-attribute-list.csv’ consists of the attributes of 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) ‘OTSOC-edge-list.csv’ records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted these for ease of upload into Gephi 0.9.2. Finally, we include (iii) 'OTSOC_Cited_2016-2021' that lists all papers cited by >10 papers in the OTSOC network following any analysis of the bibliographies of retrieved papers. Below, we detail their contents:

    1. ‘OTSOC-node-attribute-list.csv’ is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:

    Id, the unique identifier

    Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the ‘Cite Me As’ column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).

    Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the ‘UT= ’ field tag.

    Title, paper title.

    Authors, all named authors.

    Journal, journal of publication.

    Pub_year, year of publication.

    Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021

    Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.

    Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation via the Leiden algorithm (Res 1; Q=0.43|7 clusters)

    2. ‘OTSOC-edge-list.csv’ is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:

    Source, the unique identifier of the citing paper.

    Target, the unique identifier of the cited paper.

    Type, edges are ‘Directed’, and this column tells Gephi to regard all edges as such.

    Syr_date, this contains the date of publication of the citing paper.

    Tyr_date, this contains the date of publication of the cited paper.

    3. 'OTSOC_Cited_2016-2021.csv' is a comma-separated values file that contains all cited references that were cited by at least 10 of the retrieved papers within the OTSOC network published from 2016 onwards. The columns refer to:

    Reference, the cited reference string extracted from the bibliographies of retrieved papers.

    Publication year, the publication year of the cited reference.

    DOI, the DOI of the cited reference.

    indegree_2016, the total number of citations to a cited reference from papers published in 2016 and contained within the OTSOC network.

    indegree_2017, the total number of citations to a cited reference from papers published in 2017 and contained within the OTSOC network.

    indegree_2018, the total number of citations to a cited reference from papers published in 2018 and contained within the OTSOC network.

    indegree_2019, the total number of citations to a cited

  15. d

    Data from: Systematic review reveals multiple sexually antagonistic...

    • datadryad.org
    zip
    Updated Oct 25, 2021
    Cite
    Jon Alexander Harper (2021). Systematic review reveals multiple sexually antagonistic polymorphisms affecting human disease and complex traits [Dataset]. http://doi.org/10.5061/dryad.rv15dv48k
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 25, 2021
    Dataset provided by
    Dryad
    Authors
    Jon Alexander Harper
    Time period covered
    2021
    Description

    For this systematic review we followed PRISMA guidance where possible (Moher et al., 2009). PubMed (https://pubmed.ncbi.nlm.nih.gov/) was searched for articles on 2nd December 2020 with no time limit. The searches were carried out in two Stages, with the organism filter set to human in both stages. In Stage 1, eligible studies were required to report specific genetic variants or haplotypes that were referred to as sexually antagonistic or were an example of intralocus sexual conflict. To achieve this, we conducted a Boolean search for articles that used the terms “sexual antagonism” OR “sexually antagonistic” OR “intralocus sexual conflict” AND “locus” OR “loci”, “gene” OR “snp” OR “polymorphism” OR “variant” OR “allele” in their abstract or title. The Stage 1 search returned 34 articles in total (full search term in the supplementary material; search output is accessible at https://pubmed.ncbi.nlm.nih.gov/collections/60255050/?sort=pubdate).
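
    The Stage 1 Boolean search can be approximated with a short script. This is only a sketch: the exact search string is in the supplementary material and is not reproduced here, so the grouping of the two term sets, the use of the PubMed [Title/Abstract] field tag, and the helper name are all assumptions. The human organism filter is not included.

```python
def or_group(terms, field="Title/Abstract"):
    """Join quoted terms with OR, tagging each with a PubMed field tag."""
    return "(" + " OR ".join(f'"{term}"[{field}]' for term in terms) + ")"


# Term sets taken from the description; how they were grouped is an assumption.
conflict_terms = [
    "sexual antagonism",
    "sexually antagonistic",
    "intralocus sexual conflict",
]
variant_terms = ["locus", "loci", "gene", "snp", "polymorphism", "variant", "allele"]

# A record must match at least one term from each group.
stage1_query = or_group(conflict_terms) + " AND " + or_group(variant_terms)
print(stage1_query)
```

    Wrapping each OR group in parentheses makes the intended precedence explicit rather than relying on how the search engine evaluates mixed AND/OR operators.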

    In Stage 2, studies were required to report...

  16. Database (PubMed): retracted publications of systematic reviews and...

    • figshare.com
    zip
    Updated Jun 6, 2023
    Cite
    Jorge H Ramirez (2023). Database (PubMed): retracted publications of systematic reviews and meta-analysis (1983 - 2013) [Dataset]. http://doi.org/10.6084/m9.figshare.1216653.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    figshare
    Authors
    Jorge H Ramirez
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PubMed (search date: 24/10/2014) | Search query: "retracted publication"[Publication Type] - Filter: systematic reviews | 48 results Google spreadsheet in the URL below

  17. m

    Systematic Reviews and Meta-Analysis published between 2014 and 2019 with...

    • data.mendeley.com
    Updated Jul 1, 2021
    + more versions
    Cite
    Toluwase Asubiaro (2021). Systematic Reviews and Meta-Analysis published between 2014 and 2019 with authors Sub-Saharan Africa [Dataset]. http://doi.org/10.17632/xkry6rjtjg.2
    Explore at:
    Dataset updated
    Jul 1, 2021
    Authors
    Toluwase Asubiaro
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sub-Saharan Africa, Africa
    Description

    Bibliographic data on biomedical systematic reviews and meta-analyses published between 2014 and 2019, in which at least one author is affiliated with an institution in Sub-Saharan Africa, were retrieved from MEDLINE via the PubMed search engine. All forty-six (46) countries in Sub-Saharan Africa were included in the search query as affiliations. The search strategy is described in four steps:

    Step #1: Nigeria[Affiliation] OR South Africa[Affiliation] OR Ghana[Affiliation] OR Tanzania[Affiliation] OR Kenya[Affiliation] OR Rwanda[Affiliation] OR Botswana[Affiliation] OR Cameroun[Affiliation] OR Senegal[Affiliation] OR Angola[Affiliation] OR Uganda[Affiliation] OR Mali[Affiliation] OR Sierra Leone[Affiliation] OR Ivory Coast[Affiliation] OR Ethiopia[Affiliation] OR Lesotho[Affiliation] OR Zambia[Affiliation] OR Zimbabwe[Affiliation] OR Namibia[Affiliation] OR Guinea[Affiliation] OR Mauritius[Affiliation] OR Mozambique[Affiliation] OR Niger[Affiliation] OR Seychelles[Affiliation] OR Burkina Faso[Affiliation] OR Burundi[Affiliation] OR Cape Verde[Affiliation] OR Cameroon[Affiliation] OR Central African Republic[Affiliation] OR Chad[Affiliation] OR Comoros[Affiliation] OR Democratic Republic of Congo[Affiliation] OR DR Congo[Affiliation] OR Djibouti[Affiliation] OR Cote D'ivoire[Affiliation] OR Congo[Affiliation] OR Equatorial Guinea[Affiliation] OR Eritrea[Affiliation] OR Gabon[Affiliation] OR Guinea-Bissau[Affiliation] OR Madagascar[Affiliation] OR Congo Republic[Affiliation] OR Sao Tome and Principe[Affiliation] OR Swaziland[Affiliation] OR Togo[Affiliation] OR Benin[Affiliation] OR Liberia[Affiliation] OR Namibia[Affiliation] OR Gambia[Affiliation] OR (Cent Afr Republ[Affiliation]) OR (Equat Guinea[Affiliation]) OR (Papua N Guinea[Affiliation]) OR (Sao Tome E Prin[Affiliation]) OR Principe[Affiliation] OR Sao Tome E Principe[Affiliation]

    Step #2: The filter was set to Meta-Analysis[ptyp] OR systematic[sb]

    Step #3: Text word search: systematic review[Text Word] OR meta-analysis[Text Word] OR meta analysis[Text Word]

    Step #4: Set publication date to: "2014/01/01"[PDAT] : "2019/12/31"[PDAT]

    The steps were combined as: (Step #1) AND (Step #2 OR Step #3) AND (Step #4). The search, which was run on April 2nd, 2020, returned 3,171 results. The bibliographic data collected with the queries posed to PubMed were cleaned: duplicates were removed, as were articles that were not meta-analyses or systematic reviews. MEDLINE is an authoritative and specialized biomedical database for indexing biomedical publications.
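
    The way the four steps combine into a single query can be sketched as follows. This is an illustrative assembly only: the function name is hypothetical, and the affiliation list is shortened to three of the countries from Step #1 to keep the example readable.

```python
def combine_steps(affiliations, filter_clause, text_word_clause, date_clause):
    """Build (Step #1) AND (Step #2 OR Step #3) AND (Step #4)."""
    step1 = " OR ".join(f"{name}[Affiliation]" for name in affiliations)
    return (
        f"({step1}) AND (({filter_clause}) OR ({text_word_clause})) "
        f"AND ({date_clause})"
    )


# Shortened subset of the Step #1 affiliations, for illustration only.
affiliations = ["Nigeria", "South Africa", "Ghana"]
step2 = "Meta-Analysis[ptyp] OR systematic[sb]"
step3 = (
    "systematic review[Text Word] OR meta-analysis[Text Word] "
    "OR meta analysis[Text Word]"
)
step4 = '"2014/01/01"[PDAT] : "2019/12/31"[PDAT]'

query = combine_steps(affiliations, step2, step3, step4)
print(query)
```

    Keeping each step as its own parenthesised clause mirrors the stated combination and avoids ambiguity between the OR terms inside a step and the AND operators joining the steps.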

  18. h

    arxiv-acl-pubmed-hss-abstracts-filtered-full

    • huggingface.co
    Updated Apr 7, 2025
    + more versions
    Cite
    Polygraf (2025). arxiv-acl-pubmed-hss-abstracts-filtered-full [Dataset]. https://huggingface.co/datasets/polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-full
    Explore at:
    Dataset updated
    Apr 7, 2025
    Dataset authored and provided by
    Polygraf
    Description

    polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-full dataset hosted on Hugging Face and contributed by the HF Datasets community

  19. t

    BIOGRID CURATED DATA FOR PUBLICATION: The scaffold protein Ste5 directly...

    • thebiogrid.org
    zip
    Updated May 6, 2010
    Cite
    BioGRID Project (2010). BIOGRID CURATED DATA FOR PUBLICATION: The scaffold protein Ste5 directly controls a switch-like mating decision in yeast. [Dataset]. https://thebiogrid.org/100906/publication/the-scaffold-protein-ste5-directly-controls-a-switch-like-mating-decision-in-yeast.html
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 6, 2010
    Dataset authored and provided by
    BioGRID Project
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Protein-Protein, Genetic, and Chemical Interactions for Malleshaiah MK (2010):The scaffold protein Ste5 directly controls a switch-like mating decision in yeast. curated by BioGRID (https://thebiogrid.org); ABSTRACT: Evolution has resulted in numerous innovations that allow organisms to increase their fitness by choosing particular mating partners, including secondary sexual characteristics, behavioural patterns, chemical attractants and corresponding sensory mechanisms. The haploid yeast Saccharomyces cerevisiae selects mating partners by interpreting the concentration gradient of pheromone secreted by potential mates through a network of mitogen-activated protein kinase (MAPK) signalling proteins. The mating decision in yeast is an all-or-none, or switch-like, response that allows cells to filter weak pheromone signals, thus avoiding inappropriate commitment to mating by responding only at or above critical concentrations when a mate is sufficiently close. The molecular mechanisms that govern the switch-like mating decision are poorly understood. Here we show that the switching mechanism arises from competition between the MAPK Fus3 and a phosphatase Ptc1 for control of the phosphorylation state of four sites on the scaffold protein Ste5. This competition results in a switch-like dissociation of Fus3 from Ste5 that is necessary to generate the switch-like mating response. Thus, the decision to mate is made at an early stage in the pheromone pathway and occurs rapidly, perhaps to prevent the loss of the potential mate to competitors. We argue that the architecture of the Fus3-Ste5-Ptc1 circuit generates a novel ultrasensitivity mechanism, which is robust to variations in the concentrations of these proteins. This robustness helps assure that mating can occur despite stochastic or genetic variation between individuals. 
    The role of Ste5 as a direct modulator of a cell-fate decision expands the functional repertoire of scaffold proteins beyond providing specificity and efficiency of information processing. Similar mechanisms may govern cellular decisions in higher organisms and be disrupted in cancer.

  20. f

    pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    • figshare.com
    txt
    Updated Jun 5, 2023
    Cite
    Joachim Baran; Martin Gerner; Maximilian Haeussler; Goran Nenadic; Casey M. Bergman (2023). pubmed2ensembl: A Resource for Mining the Biological Literature on Genes [Dataset]. http://doi.org/10.1371/journal.pone.0024716
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Joachim Baran; Martin Gerner; Maximilian Haeussler; Goran Nenadic; Casey M. Bergman
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however for species with no such dedicated staff many thousands of articles are never mapped to genes or genomic regions.

    Methodology/Principal Findings: To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several sources of curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data.

    Conclusion/Significance: By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed genome informatics inspired solution to accessing the ever-increasing biomedical literature.
