Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 1.
timaeus/dsir-pile-13m-filtered-for-pubmed-central dataset hosted on Hugging Face and contributed by the HF Datasets community
https://brightdata.com/license
Unlock valuable biomedical knowledge with our comprehensive PubMed Dataset, designed for researchers, analysts, and healthcare professionals to track medical advancements, explore drug discoveries, and analyze scientific literature.
Dataset Features
Scientific Articles & Abstracts: Access structured data from PubMed, including article titles, abstracts, authors, publication dates, and journal sources.
Medical Research & Clinical Studies: Retrieve data on clinical trials, drug research, disease studies, and healthcare innovations.
Keywords & MeSH Terms: Extract key medical subject headings (MeSH) and keywords to categorize and analyze research topics.
Publication & Citation Data: Track citation counts, journal impact factors, and author affiliations for academic and industry research.
Customizable Subsets for Specific Needs
Our PubMed Dataset is fully customizable, allowing you to filter data based on publication date, research category, keywords, or specific journals. Whether you need broad coverage for medical research or focused data for pharmaceutical analysis, we tailor the dataset to your needs.
Popular Use Cases
Pharmaceutical Research & Drug Development: Analyze clinical trial data, drug efficacy studies, and emerging treatments.
Medical & Healthcare Intelligence: Track disease outbreaks, healthcare trends, and advancements in medical technology.
AI & Machine Learning Applications: Use structured biomedical data to train AI models for predictive analytics, medical diagnosis, and literature summarization.
Academic & Scientific Research: Access a vast collection of peer-reviewed studies for literature reviews, meta-analyses, and academic publishing.
Regulatory & Compliance Monitoring: Stay updated on medical regulations, FDA approvals, and healthcare policy changes.
Whether you're conducting medical research, analyzing healthcare trends, or developing AI-driven solutions, our PubMed Dataset provides the structured data you need. Get started today and customize your dataset to fit your research objectives.
timaeus/dsir-pile-100k-filtered-for-pubmed-abstracts dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Medical journal usage counts across 814 clinical locations in the U.S. and Canada from 2009 - 2015.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for journal recommendation; includes title, abstract, keywords, and journal.
We extracted the journals and additional information from:
Jiasheng Sheng. (2022). PubMed-OA-Extraction-dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6330817.
Dataset Components:
data_pubmed_all: This dataset encompasses all articles, each containing the following columns: 'pubmed_id', 'title', 'keywords', 'journal', 'abstract', 'conclusions', 'methods', 'results', 'copyrights', 'doi', 'publication_date', 'authors', 'AKE_pubmed_id', 'AKE_pubmed_title', 'AKE_abstract', 'AKE_keywords', 'File_Name'.
data_pubmed: To focus on recent and relevant publications, we have filtered this dataset to include articles published within the last five years, from January 1, 2018, to December 13, 2022 (the latest date in the dataset). Additionally, we have retained only journals with more than 200 published articles, resulting in 262,870 articles from 469 different journals.
data_pubmed_train, data_pubmed_val, and data_pubmed_test: For machine learning and model development purposes, we have partitioned the 'data_pubmed' dataset into three subsets (training, validation, and test) using a random 60/20/20 split ratio. Notably, this division was performed on a per-journal basis, ensuring that each journal's articles are proportionally represented in the training (60%), validation (20%), and test (20%) sets. The resulting partitions consist of 157,540 articles in the training set, 52,571 articles in the validation set, and 52,759 articles in the test set.
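A per-journal random 60/20/20 split of this kind can be sketched as follows. This is a minimal illustration and not the dataset authors' code: the function name `split_per_journal`, the rounding rule, the seed, and the toy records are all assumptions; only the 'journal' and 'pubmed_id' column names come from the description above.

```python
import random
from collections import defaultdict


def split_per_journal(articles, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split articles into train/val/test within each journal.

    `articles` is a list of dicts with a 'journal' key. The rounding rule
    (round train and val, give the remainder to test) is an assumption.
    """
    rng = random.Random(seed)
    by_journal = defaultdict(list)
    for article in articles:
        by_journal[article["journal"]].append(article)

    train, val, test = [], [], []
    for journal in sorted(by_journal):
        group = by_journal[journal]
        rng.shuffle(group)  # random assignment inside the journal
        n_train = int(round(ratios[0] * len(group)))
        n_val = int(round(ratios[1] * len(group)))
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Toy data: 30 hypothetical articles spread evenly over 3 journals.
articles = [{"pubmed_id": i, "journal": f"J{i % 3}"} for i in range(30)]
train, val, test = split_per_journal(articles)
print(len(train), len(val), len(test))
```

Because the split is applied journal by journal, every journal contributes to all three partitions in roughly the stated proportions.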
polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-20K-cleaned-AI-gen dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Micro-FTIR Filter Images for Particle Detection
This dataset consists of annotated images of filters containing particles. The primary objective of this dataset is to serve as training and validation data for developing a particle detection model using computer vision techniques. More specifically, this dataset can be used to train an image segmentation model that can be used with GEPARD (https://pubmed.ncbi.nlm.nih.gov/32436395/) in order to perform efficient particle detection and analysis using a Micro-FTIR microscope.
Two kinds of samples are used in our case:
In the first case, particles were annotated easily as they are clearly visible over the filter. In the second scenario, the most distinguishable particles on the image have been annotated.
Note
In the case of saturated filters, the correct method would be to collect a spectral image of the entire filter using an FPA detector or similar and then use tools (e.g. sIMPle) to analyse this image. However, in our scenario such a detector was not available, and a semi-random, operator-dependent method had to be used in order to select particles or points for scanning.
THIS RESOURCE IS NO LONGER IN SERVICE. Documented on January 5, 2023. Knowledge base of genetic associations and human genome epidemiology including information on population prevalence of genetic variants, gene-disease associations, gene-gene and gene-environment interactions, and evaluation of genetic tests. This tool explores HuGENet, the Human Genome Epidemiology Network, which is a global collaboration of individuals and organizations committed to the assessment of the impact of human genome variation on population health and how genetic information can be used to improve health and prevent disease. What does HuGE Navigator offer?
*HuGEpedia - an encyclopedia of human genetic variation in health and disease; includes Phenopedia and Genopedia. Phenopedia allows you to look up gene-disease association summaries by disease, and Genopedia allows you to look up gene-disease association summaries by gene. In general, HuGEpedia is a searchable database that summarizes published articles about human disease and genetic variation, including primary studies, reviews, and meta-analyses. It provides links to PubMed abstracts, researcher contact info, trends, and more.
*HuGEtools - searching and mining the literature in human genome epidemiology; includes HuGE Literature Finder, HuGE Investigator Browser, Gene Prospector, HuGE Watch, Variant Name Mapper, and HuGE Risk Translator.
*HuGE Literature Finder finds published articles in human genome epidemiology since 2001. The search query can include genes, disease, outcome, environmental factors, author, etc. Results can be filtered by these categories. It is also possible to see all articles in the database for a particular topic, such as genotype prevalence, pharmacogenomics, or clinical trial.
*HuGE Investigator Browser finds investigators in a particular field of human genome epidemiology. This info is obtained using a behind-the-scenes tool that automatically parses PubMed affiliation data.
*Gene Prospector is a gateway for evaluating genes in relation to disease and risk factors. This tool allows you to enter a disease or risk factor and then supplies you with a table of genes associated with your query that are ranked based on strength of evidence from the literature. This evidence is culled from the HuGE Literature Finder and NCBI Entrez Gene, and the scoring formula is provided. The Gene Prospector results table provides access to the Genopedia entry for each gene in the list, general info including links to other resources, SNP info, and associated literature from HuGE, PubMed, GWAS, and more. It is a great place to locate a lot of info about your disease/gene of interest very quickly.
*HuGE Watch tracks the evolution of published literature, HuGE investigators, genes studied, or diseases studied in human genome epidemiology. For example, if you search Trend/Pattern for Diseases Studied you'll initially get a graph and chart of the number of diseases studied per year since 1997. You can refine these results by limiting the temporal trend to a category or study type such as Gene-gene Interaction or HuGE Review.
*Variant Name Mapper maps common names and rs numbers of genetic variants using information from SNP500Cancer, SNPedia, pharmGKB, ALFRED, AlzGene, PDGene, SZgene, HuGE Navigator, LSDBs, and user submissions.
*HuGE Risk Translator calculates the predictive value of genetic markers for disease risk. To do so, users must enter the frequency of risk variant, the population disease risk, and the odds ratio between the gene and disease. This information is necessary in order to yield a useful predictive result.
*HuGEmix - a series of HuGE related informatics utilities and projects; includes GAPscreener, HuGE Track, and Open Source. GAPscreener is a screening tool for published literature on human genetic associations; HuGE Track is a custom track built for HuGE data in the UCSC Genome Browser; and Open Source is infrastructure for managing knowledge and information from PubMed.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset supports filterNHP, an R package and web-based application for generating search filters to query scientific bibliographic sources (PubMed, PsycINFO, Web of Science) for non-human primate related publications. filterNHP can be found at: https://filterNHP.dpz.eu.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PubMed model contains over 18 million PubMed documents (1996-2019) clustered into 28,743 clusters for use in research planning, portfolio analysis, systematic review, etc. This repository contains the PMID-to-cluster listing, an Excel workbook that characterizes each cluster with metadata and cluster-level indicators, and a Tableau workbook containing those same data plus a visual map and filters that can be used to explore the landscape and analyze cluster-level information. Model created by SciTech Strategies, Inc. Details can be found in the accompanying article published in Scientific Data at https://www.nature.com/articles/s41597-020-00749-y (or https://rdcu.be/ca4kv).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Adaptive stress response pathways (SRPs) restore cellular homeostasis following perturbation but may activate terminal outcomes like apoptosis, autophagy, or cellular senescence if disruption exceeds critical thresholds. Because SRPs hold the key to vital cellular tipping points, they are targeted for therapeutic interventions and assessed as biomarkers of toxicity. Hence, we are developing a public database of chemicals that perturb SRPs to enable new data-driven tools to improve public health. Here, we report on the automated text-mining pipeline we used to build and curate the first version of this database. We started with 100 reference SRP chemicals gathered from published biomarker studies to bootstrap the database. Second, we used information retrieval to find co-occurrences of reference chemicals with SRP terms in PubMed abstracts and determined pairwise mutual information thresholds to filter biologically relevant relationships. Third, we applied these thresholds to find 1206 putative SRP perturbagens within thousands of substances in the Library of Integrated Network-Based Cellular Signatures (LINCS). To assign SRP activity to LINCS chemicals, domain experts had to manually review at least three publications for each of 1206 chemicals out of 181,805 total abstracts. To accomplish this efficiently, we implemented a machine learning approach to predict SRP classifications from texts to prioritize abstracts. In 5-fold cross-validation testing with a corpus derived from the 100 reference chemicals, artificial neural networks performed the best (F1-macro = 0.678) and prioritized 2479/181,805 abstracts for expert review, which resulted in 457 chemicals annotated with SRP activities. An independent analysis of enriched mechanisms of action and chemical use class supported the text-mined chemical associations (p < 0.05): heat shock inducers were linked with HSP90 and DNA damage inducers to topoisomerase inhibition. 
This database will enable novel applications of LINCS data to evaluate SRP activities and to further develop tools for biomedical information extraction from the literature.
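The co-occurrence filtering step in the abstract above can be illustrated with a small sketch. The abstract does not give its exact formula, so the pointwise mutual information definition used here (log base 2 of the observed over the expected co-occurrence rate), the threshold value, and all counts and names are assumptions chosen for illustration only.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information (log base 2) from abstract counts.

    n_xy: abstracts mentioning both the chemical and the SRP term
    n_x, n_y: abstracts mentioning each one individually
    n_total: total number of abstracts in the corpus
    """
    if n_xy == 0:
        return float("-inf")
    return math.log2((n_xy * n_total) / (n_x * n_y))


def filter_pairs(cooccurrence, chem_counts, term_counts, n_total, threshold):
    """Keep chemical/term pairs whose PMI meets the (assumed) threshold."""
    kept = {}
    for (chem, term), n_xy in cooccurrence.items():
        score = pmi(n_xy, chem_counts[chem], term_counts[term], n_total)
        if score >= threshold:
            kept[(chem, term)] = score
    return kept


# Hypothetical counts, not taken from the study.
chem_counts = {"chem_a": 100, "chem_b": 100}
term_counts = {"heat shock": 100}
cooccurrence = {("chem_a", "heat shock"): 50, ("chem_b", "heat shock"): 10}
kept = filter_pairs(cooccurrence, chem_counts, term_counts,
                    n_total=1000, threshold=1.0)
print(kept)
```

In this toy corpus, chem_b co-occurs with the term exactly as often as chance would predict (PMI of 0) and is dropped, while chem_a co-occurs five times more often than chance and is kept.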
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The original intent of assembling a data set of publicly-available tumor-infiltrating T cells (TILs) with paired TCR sequencing was to expand and improve the scRepertoire R package. However, after some discussion, we decided to release the data set for everyone. A complete summary of the sequencing runs and the sample information can be found in the meta data of the Seurat object. This repository is the 4th version of the data, with the addition of cells and changes to the workflow.
Methods
Single-Cell Data Processing
The filtered gene matrices output from the Cell Ranger align function for individual sequencing runs (10x Genomics, Pleasanton, CA) were loaded into the R global environment. For each sequencing run, cell barcodes were given a unique prefix to prevent issues with duplicate barcodes. The results were then ported into individual Seurat objects (citation), where the cells with > 10% mitochondrial genes and/or 2.5x natural log distribution of counts were excluded for quality control purposes. At the individual sequencing run level, doublets were estimated using the scDblFinder (v1.4.0) R package.
Annotation of Cells
Automatic annotation was performed using the singleR (v1.4.1) R package (citation) with the HPCA (citation) and Monaco (citation) data sets as references and the fine label discriminators. Individual sequencing runs were subsetted to run through the singleR algorithm in order to reduce memory demands. The outputs of all the singleR analyses were collated and appended to the meta data of the Seurat object. Likewise, the ProjecTILs (v0.4.1) R package (citation) was used for automatic annotation as a partially orthogonal approach.
Addition of TCR data
The filtered contig annotation T cell receptor (TCR) data for available sequencing runs were loaded into the R global environment. Individual contigs were combined using the combineTCR() function of the scRepertoire (v1.3.5) R package (citation). Clonotypes were assigned to barcodes, and where multiple duplicate chains were present for an individual cell, they were filtered to select the top-expressing contig by read count. The clonotype data was then added to the Seurat object, with the proportion across individual patients being used to calculate frequency.
Citations
As of right now, there is no citation associated with the assembled data set. However, if using the data, please find the corresponding manuscript for each data set in the meta.data of the single-cell object. In addition, if using the processed data, feel free to modify the language in the methods section (above) and please cite the appropriate manuscripts of the software or references that were used.
Itemized List of the Software Used
Itemized List of Reference Data Used
Future Directions
There are areas that we are actively hoping to develop to further facilitate the usage of the data set. If you have other suggestions, please reach out using the contact information below.
Contact
For questions, comments, and suggestions, please feel free to contact Nick Borcherding via this repository, email, or Twitter.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This note describes the data sets used for all analyses contained in the manuscript 'Oxytocin - a social peptide?' [1]
Data Collection
The datasets described here were originally retrieved from Web of Science (WoS) Core Collection via the University of Edinburgh's library subscription [2]. The aim of the original study for which these data were gathered was to survey peer-reviewed primary studies on oxytocin and social behaviour. To capture relevant papers, we used the following query:
TI = ("oxytocin" OR "pitocin" OR "syntocinon") AND TS = ("social*" OR "pro$social" OR "anti$social")
The final search was performed on 13 September 2021. This returned a total of 2,747 records, of which 2,049 were classified by WoS as "articles". Given our interest in primary studies only (articles reporting original data), we excluded all other document types. We further excluded all articles sub-classified as "book chapters" or as "proceeding papers" in order to limit our analysis to primary studies published in peer-reviewed academic journals. This reduced the set to 1,977 articles. All of these were published in the English language, and no further language refinements were necessary.
All available metadata on these 1,977 articles was exported as plain text "flat" format files in four batches, which we later merged together via Notepad++. Upon manual examination, we discovered examples of papers classified as "articles" by WoS that were, in fact, reviews. To further filter our results, we searched all available PMIDs in PubMed (1,903 had associated PMIDs, ~96% of the set). We then filtered results to identify all records classified as "review", "systematic review", or "meta-analysis", identifying 75 records [3] (thus, ~4% of records classified by WoS were classified as reviews in PubMed). After examining a sample and agreeing with the PubMed classification, we removed these from our dataset, leaving a total of 1,902 articles.
From these data, we constructed two datasets by parsing out relevant reference data via the Sci2 Tool [4]. First, we constructed a "node-attribute-list" by linking unique reference strings ("Cite Me As" column in WoS data files) to unique identifiers; we then parsed into this dataset information on the identity of a paper, including the title of the article, all authors, journal of publication, year of publication, total citations as recorded from WoS, and WoS accession number. Second, we constructed an "edge-list" that records the citations from a citing paper in the "Source" column and identifies the cited paper in the "Target" column, using the unique identifiers as described previously to link these data to the node-attribute-list.
We then constructed a network in which papers are nodes and citation links are directed edges between nodes. We used Gephi Version 0.9.2 [5] to manually clean these data by merging duplicate references that are caused by different reference formats or by referencing errors. To do this, we needed to retain all retrieved records (1,902) together with all of their references to papers, whether or not these were included in our original search. In total, this produced a network of 46,633 nodes (unique reference strings) and 112,520 edges (citation links). Thus, the average reference list size of these articles is ~59 references. The mean indegree (within network citations) is 2.4 (median is 1) for the entire network, reflecting a great diversity in referencing choices among our 1,902 articles.
After merging duplicates, we then restricted the network to include only articles fully retrieved (1,902), and retained only those that were connected together by citation links in a large interconnected network (i.e. the largest component). In total, 1,892 (99.5%) of our initial set were connected together via citation links, meaning a total of ten papers were removed from the following analysis; these were neither connected to the largest component, nor did they form connections with one another (i.e. these were "isolates").
This left us with a network of 1,892 nodes connected together by 26,019 edges. It is this network that is described by the "node-attribute-list" and "edge-list" provided here. This network has a mean in-degree of 13.76 (median in-degree of 4). By restricting our analysis in this way, we lose 44,741 unique references (96%) and 86,501 citations (77%) from the full network, but retain a set of articles tightly knitted together, all of which have been fully retrieved due to possessing certain terms related to oxytocin AND social behaviour in their title, abstract, or associated keywords.
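The headline figures for the full and restricted networks follow from the node and edge counts reported above. The short check below reproduces that arithmetic; the variable names are illustrative, and only the counts themselves come from the text.

```python
# Counts reported in the text.
full_nodes, full_edges = 46_633, 112_520   # full network after cleaning
retrieved = 1_902                          # fully retrieved articles
core_nodes, core_edges = 1_892, 26_019     # largest component only

# Average reference list size: citation links per retrieved article.
avg_refs = full_edges / retrieved

# Mean indegree of the full network: edges per node.
mean_indegree_full = full_edges / full_nodes

# What is lost by restricting to the largest component of retrieved articles.
lost_nodes = full_nodes - core_nodes
lost_edges = full_edges - core_edges
share_nodes_lost = lost_nodes / full_nodes
share_edges_lost = lost_edges / full_edges
share_connected = core_nodes / retrieved

print(f"average references per article: {avg_refs:.1f}")
print(f"mean indegree (full network):   {mean_indegree_full:.2f}")
print(f"unique references lost: {lost_nodes} ({share_nodes_lost:.0%})")
print(f"citations lost:         {lost_edges} ({share_edges_lost:.0%})")
print(f"articles in largest component: {share_connected:.1%}")
```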
Before moving on, we calculated indegree for all nodes in this network (this counts the number of citations to a given paper from other papers within this network) and have included this in the node-attribute-list. We further clustered this network via modularity maximisation via the Leiden algorithm [6]. We set the algorithm to resolution 1, and allowed the algorithm to run over 100 iterations and 100 restarts. This gave Q=0.43 and identified seven clusters, which we describe in detail within the body of the paper. We have included cluster membership as an attribute in the node-attribute-list.
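Within-network indegree can be recomputed from an edge list by counting how often each identifier appears as a citation target. The sketch below assumes a CSV with 'Source' and 'Target' columns, as in the edge-list file described in this note; the function name and the three-row toy edge list are made up for illustration, and this is not the authors' own procedure (they report using Sci2 and Gephi).

```python
import csv
import io
from collections import Counter


def indegree_from_edges(handle):
    """Count within-network citations received by each paper.

    `handle` is any file-like object containing a CSV edge list with
    'Source' (citing paper) and 'Target' (cited paper) columns.
    """
    reader = csv.DictReader(handle)
    return Counter(row["Target"] for row in reader)


# Toy edge list: paper 1 cites 2; paper 3 cites 2 and 1.
toy_edges = "Source,Target,Type\n1,2,Directed\n3,2,Directed\n3,1,Directed\n"
indegree = indegree_from_edges(io.StringIO(toy_edges))
print(indegree["2"], indegree["1"], indegree["3"])
```

To run this on the real file, the `io.StringIO` object would be replaced by an open file handle; papers that are never cited simply return a count of zero.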
For additional analysis, we also analysed the full reference list data to examine the most commonly cited references between 2016 and 2021 - the results of this are described in OTSOC_Cited_2016-2021.csv. This takes the reference lists of all retrieved papers within the network and examines their full reference lists (including references to other papers not contained within the network). These data were cleaned by matching DOIs and manual cleansing.
Data description
We include here two network datasets: (i) "OTSOC-node-attribute-list.csv" consists of the attributes of 1,892 primary articles retrieved from WoS that include terms indicating a focus on oxytocin and social behaviour; (ii) "OTSOC-edge-list.csv" records the citations between these papers. Together, these can be imported into a range of different software for network analysis; however, we have formatted these for ease of upload into Gephi 0.9.2. Finally, we include (iii) 'OTSOC_Cited_2016-2021' that lists all papers cited by >10 papers in the OTSOC network following an analysis of the bibliographies of retrieved papers. Below, we detail their contents:
1. "OTSOC-node-attribute-list.csv" is a comma-separated values file that contains all node attributes for the citation network (n=1,892) analysed in the paper. The columns refer to:
Id, the unique identifier
Label, the reference string of the paper to which the attributes in this row correspond. This is taken from the "Cite Me As" column from the original WoS download. The reference string is in the following format: last name of first author, publication year, journal, volume, start page, and DOI (if available).
Wos_id, unique Web of Science (WoS) accession number. These can be used to query WoS to find further data on all papers via the "UT= " field tag.
Title, paper title.
Authors, all named authors.
Journal, journal of publication.
Pub_year, year of publication.
Wos_citations, total number of citations recorded by WoS Core Collection to a given paper as of 13 September 2021
Indegree, the number of within network citations to a given paper, calculated for the network shown in Figure 1 of the manuscript.
Cluster, provides the cluster membership number as discussed within the manuscript (Figure 1). This was established via modularity maximisation via the Leiden algorithm (Res 1; Q=0.43|7 clusters)
2. "OTSOC-edge-list.csv" is a comma-separated values file that contains all citation links between the 1,892 articles (n=26,019). The columns refer to:
Source, the unique identifier of the citing paper.
Target, the unique identifier of the cited paper.
Type, edges are "Directed", and this column tells Gephi to regard all edges as such.
Syr_date, this contains the date of publication of the citing paper.
Tyr_date, this contains the date of publication of the cited paper.
3. 'OTSOC_Cited_2016-2021.csv' is a comma-separated values file that contains citations to all cited references that were cited by at least 10 of the retrieved papers within the OTSOC network published from 2016 onwards. The columns refer to:
Reference, the cited reference string extracted from the bibliographies of retrieved papers.
Publication year, the publication year of the cited reference.
DOI, the DOI of the cited reference.
indegree_2016, the total number of citations to a cited reference from papers published in 2016 and contained within the OTSOC network.
indegree_2017, the total number of citations to a cited reference from papers published in 2017 and contained within the OTSOC network.
indegree_2018, the total number of citations to a cited reference from papers published in 2018 and contained within the OTSOC network.
indegree_2019, the total number of citations to a cited
For this systematic review we followed PRISMA guidance where possible (Moher et al., 2009). PubMed (https://pubmed.ncbi.nlm.nih.gov/) was searched for articles on 2nd December 2020 with no time limit. The searches were carried out in two stages, with the organism filter set to human in both stages. In Stage 1, eligible studies were required to report specific genetic variants or haplotypes that were referred to as sexually antagonistic or were an example of intralocus sexual conflict. To achieve this, we conducted a Boolean search for articles that used the terms "sexual antagonism" OR "sexually antagonistic" OR "intralocus sexual conflict" AND "locus" OR "loci", "gene" OR "snp" OR "polymorphism" OR "variant" OR "allele" in their abstract or title. The Stage 1 search returned 34 articles in total (full search term in the supplementary material; search output is accessible at https://pubmed.ncbi.nlm.nih.gov/collections/60255050/?sort=pubdate).
In Stage 2, studies were required to report...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PubMed (search date: 24/10/2014) | Search query: "retracted publication"[Publication Type] - Filter: systematic reviews | 48 results Google spreadsheet in the URL below
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bibliographic data of biomedical systematic reviews and meta-analysis studies published between 2014 and 2019, where at least one author is affiliated with an institution in Sub-Saharan Africa, were retrieved from MEDLINE via the PubMed search engine. All forty-six (46) countries in Sub-Saharan Africa were included in the search query as affiliation. The search strategy is described in four steps:
Step #1: Nigeria[Affiliation] OR South Africa[Affiliation] OR Ghana[Affiliation] OR Tanzania[Affiliation] OR Kenya[Affiliation] OR Rwanda[Affiliation] OR Botswana[Affiliation] OR Cameroun[Affiliation] OR Senegal[Affiliation] OR Angola[Affiliation] OR Uganda[Affiliation] OR Mali[Affiliation] OR Sierra Leone[Affiliation] OR Ivory Coast[Affiliation] OR Ethiopia[Affiliation] OR Lesotho[Affiliation] OR Zambia[Affiliation] OR Zimbabwe[Affiliation] OR Namibia[Affiliation] OR Guinea[Affiliation] OR Mauritius[Affiliation] OR Mozambique[Affiliation] OR Niger[Affiliation] OR Seychelles[Affiliation] OR Burkina Faso[Affiliation] OR Burundi[Affiliation] OR Cape Verde[Affiliation] OR Cameroon[Affiliation] OR Central African Republic[Affiliation] OR Chad[Affiliation] OR Comoros[Affiliation] OR Democratic Republic of Congo[Affiliation] OR DR Congo[Affiliation] OR Djibouti[Affiliation] OR Cote D'ivoire[Affiliation] OR Congo[Affiliation] OR Equatorial Guinea[Affiliation] OR Eritrea[Affiliation] OR Gabon[Affiliation] OR Guinea-Bissau[Affiliation] OR Madagascar[Affiliation] OR Congo Republic[Affiliation] OR Sao Tome and Principe[Affiliation] OR Swaziland[Affiliation] OR Togo[Affiliation] OR Benin[Affiliation] OR Liberia[Affiliation] OR Namibia[Affiliation] OR Gambia[Affiliation] OR (Cent Afr Republ[Affiliation]) OR (Equat Guinea[Affiliation]) OR (Papua N Guinea[Affiliation]) OR (Sao Tome E Prin[Affiliation]) OR Principe[Affiliation] OR Sao Tome E Principe[Affiliation]
Step #2: The filter was set to Meta-Analysis[ptyp] OR systematic[sb]
Step #3: Text word search systematic review[Text Word] OR meta-analysis[Text Word] OR meta analysis[Text Word]
Step #4: Set publication date to: "2014/01/01"[PDAT] : "2019/12/31"[PDAT]
The search, performed on April 2nd, 2020, returned 3,171 results. The bibliographic data collected through these PubMed queries were cleaned: duplicates were removed, as were articles that were not meta-analyses or systematic reviews. MEDLINE is an authoritative and specialized biomedical database for indexing biomedical publications. Query: (Step #1) AND (Step #2 OR Step #3) AND (Step #4)
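The way the four steps combine into a single query string can be sketched as below. This only assembles text and does not contact PubMed; the helper name `build_query` is hypothetical, and the three-country list is a shortened stand-in for the full Step #1 affiliation list given above.

```python
def build_query(countries):
    """Assemble (Step #1) AND (Step #2 OR Step #3) AND (Step #4)."""
    # Step 1: affiliation terms joined with OR.
    step1 = " OR ".join(f"{country}[Affiliation]" for country in countries)
    # Step 2: publication-type / subset filter.
    step2 = "Meta-Analysis[ptyp] OR systematic[sb]"
    # Step 3: text-word search.
    step3 = ("systematic review[Text Word] OR meta-analysis[Text Word] "
             "OR meta analysis[Text Word]")
    # Step 4: publication date range.
    step4 = '"2014/01/01"[PDAT] : "2019/12/31"[PDAT]'
    return f"({step1}) AND (({step2}) OR ({step3})) AND ({step4})"


# Shortened, illustrative country list (the real Step #1 is much longer).
query = build_query(["Nigeria", "South Africa", "Ghana"])
print(query)
```

Wrapping each step in parentheses keeps the OR terms inside a step from binding to the AND operators between steps.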
polygraf-ai/arxiv-acl-pubmed-hss-abstracts-filtered-full dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for Malleshaiah MK (2010):The scaffold protein Ste5 directly controls a switch-like mating decision in yeast. curated by BioGRID (https://thebiogrid.org); ABSTRACT: Evolution has resulted in numerous innovations that allow organisms to increase their fitness by choosing particular mating partners, including secondary sexual characteristics, behavioural patterns, chemical attractants and corresponding sensory mechanisms. The haploid yeast Saccharomyces cerevisiae selects mating partners by interpreting the concentration gradient of pheromone secreted by potential mates through a network of mitogen-activated protein kinase (MAPK) signalling proteins. The mating decision in yeast is an all-or-none, or switch-like, response that allows cells to filter weak pheromone signals, thus avoiding inappropriate commitment to mating by responding only at or above critical concentrations when a mate is sufficiently close. The molecular mechanisms that govern the switch-like mating decision are poorly understood. Here we show that the switching mechanism arises from competition between the MAPK Fus3 and a phosphatase Ptc1 for control of the phosphorylation state of four sites on the scaffold protein Ste5. This competition results in a switch-like dissociation of Fus3 from Ste5 that is necessary to generate the switch-like mating response. Thus, the decision to mate is made at an early stage in the pheromone pathway and occurs rapidly, perhaps to prevent the loss of the potential mate to competitors. We argue that the architecture of the Fus3-Ste5-Ptc1 circuit generates a novel ultrasensitivity mechanism, which is robust to variations in the concentrations of these proteins. This robustness helps assure that mating can occur despite stochastic or genetic variation between individuals. 
The role of Ste5 as a direct modulator of a cell-fate decision expands the functional repertoire of scaffold proteins beyond providing specificity and efficiency of information processing. Similar mechanisms may govern cellular decisions in higher organisms and be disrupted in cancer.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however for species with no such dedicated staff many thousands of articles are never mapped to genes or genomic regions.
Methodology/Principal Findings: To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several sources of curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data.
Conclusion/Significance: By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed genome informatics inspired solution to accessing the ever-increasing biomedical literature.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Material 1.