100+ datasets found
  1. Data from: The Porcine Translational Research Database

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Jun 5, 2025
    Cite
    Agricultural Research Service (2025). The Porcine Translational Research Database [Dataset]. https://catalog.data.gov/dataset/the-porcine-translational-research-database-3e5c0
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The data in the Porcine Translational Research Database is supported by >5800 references, and contains 65 data fields for each entry, including >9700 full length (5′ and 3′) unambiguous pig sequences, >2400 real time PCR assays and reactivity information on >1700 antibodies. It also contains gene and/or protein expression data for >2200 genes and identifies and corrects errors (gene duplication artifacts, mis-assemblies, mis-annotations, and incorrect species assignments) for >2,000 porcine genes. This database is the largest manually curated database for any single veterinary species and is unique among porcine gene databases in regard to linking gene expression to gene function, identifying related gene pathways, and connecting data with other porcine gene databases. Resources in this dataset: Resource Title: The Porcine Translational Research Database. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md/beltsville-human-nutrition-research-center/diet-genomics-and-immunology-laboratory/docs/dgil-porcine-translational-research-database/

  2. 785 Million Language Translation Database for AI

    • kaggle.com
    zip
    Updated Aug 28, 2023
    Cite
    Ramakrishnan Lakshmanan (2023). 785 Million Language Translation Database for AI [Dataset]. https://www.kaggle.com/datasets/ramakrishnan1984/785-million-language-translation-database-ai-ml
    Available download formats: zip (6504894854 bytes)
    Dataset updated
    Aug 28, 2023
    Authors
    Ramakrishnan Lakshmanan
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Our groundbreaking translation dataset represents a monumental advancement in the field of natural language processing and machine translation. Comprising a staggering 785 million records, this corpus bridges language barriers by offering translations from English to an astonishing 548 languages. The dataset promises to be a cornerstone resource for researchers, engineers, and developers seeking to enhance their machine translation models, cross-lingual analysis, and linguistic investigations.

    Size of the dataset: 41 GB uncompressed, 20 GB compressed

    Key Features:

    Scope and Scale: With a comprehensive collection of 785 million records, this dataset provides an unparalleled wealth of translated text. Each record consists of an English sentence paired with its translation in one of the 548 target languages, enabling multi-directional translation applications.

    Language Diversity: Encompassing translations into 548 languages, this dataset represents a diverse array of linguistic families, dialects, and scripts. From widely spoken languages to those with limited digital representation, the dataset bridges communication gaps on a global scale.

    Quality and Authenticity: The translations have been meticulously curated, verified, and cross-referenced to ensure high quality and authenticity. This attention to detail guarantees that the dataset is not only extensive but also reliable, serving as a solid foundation for machine learning applications. The data was collected from various open datasets for my personal ML projects, and I am looking to share it with the team.

    Use Case Versatility: Researchers and practitioners across a spectrum of domains can harness this dataset for a myriad of applications. It facilitates the training and evaluation of machine translation models, empowers cross-lingual sentiment analysis, aids in linguistic typology studies, and supports cultural and sociolinguistic investigations.

    Machine Learning Advancement: Machine translation models, especially neural machine translation (NMT) systems, can leverage this dataset to enhance their training. The large-scale nature of the dataset allows for more robust and contextually accurate translation outputs.

    Fine-tuning and Customization: Developers can fine-tune translation models using specific language pairs, offering a powerful tool for specialized translation tasks. This customization capability ensures that the dataset is adaptable to various industries and use cases.

    Data Format: The dataset is provided in a structured JSON format, facilitating easy integration into existing machine learning pipelines. This structured approach expedites research and experimentation. Each JSON record contains an English word and its equivalent word. The data was exported from a MongoDB database to ensure the uniqueness of the records. Each record is unique, and the records are sorted.

    Access: The dataset is available for academic and research purposes, enabling the global AI community to contribute to and benefit from its usage. A well-documented API and sample code are provided to expedite exploration and integration.
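
    As a rough illustration of the Data Format paragraph above, the sketch below streams records one at a time rather than loading a multi-gigabyte file into memory. It assumes a JSON Lines layout (one record per line) and the key names "english" and "translation"; both are hypothetical, since the listing does not state the actual schema, and the two sample records are invented for the demonstration.

```python
import io
import json
from typing import Iterator, TextIO, Tuple

# Hypothetical key names: the real export may use different keys.
SOURCE_KEY = "english"
TARGET_KEY = "translation"


def iter_pairs(lines: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (English, translated) pairs from a JSON Lines stream, one record at a time."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        yield record[SOURCE_KEY], record[TARGET_KEY]


# Tiny in-memory stand-in for a downloaded file (invented sample records).
sample = io.StringIO(
    '{"english": "water", "translation": "agua"}\n'
    '{"english": "house", "translation": "casa"}\n'
)
pairs = list(iter_pairs(sample))
print(pairs)
```

    With a real download, the same function could be given an open file handle instead of the in-memory sample, once the actual key names have been checked against the files.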

    The English-to-548-languages translation dataset represents an incredible leap forward in advancing multilingual communication, breaking down barriers to understanding, and fostering collaboration on a global scale. It holds the potential to reshape how we approach cross-lingual communication, linguistic studies, and the development of cutting-edge translation technologies.

    Dataset Composition: The dataset is a culmination of translations from English, a widely spoken and understood language, into 548 distinct languages. Each language represents a unique linguistic and cultural background, providing a rich array of translation contexts. This diverse range of languages spans across various language families, regions, and linguistic complexities, making the dataset a comprehensive repository for linguistic research.

    Data Volume and Scale: With a staggering 785 million records, the dataset boasts an immense scale that captures a vast array of translations and linguistic nuances. Each translation entry consists of an English source text paired with its corresponding translation in one of the 548 target languages. This vast corpus allows researchers and practitioners to explore patterns, trends, and variations across languages, enabling the development of robust and adaptable translation models.

    Linguistic Coverage: The dataset covers an extensive set of languages, including but not limited to Indo-European, Afroasiatic, Sino-Tibetan, Austronesian, Niger-Congo, and many more. This broad linguistic coverage ensures that languages with varying levels of grammatical complexity, vocabulary richness, and syntactic structures are included, enhancing the applicability of translation models across diverse linguistic landscapes.

    Dataset Preparation: The translation ...

  3. Data from: Protein Post Translational Modifications

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Protein Post Translational Modifications [Dataset]. https://www.johnsnowlabs.com/marketplace/protein-post-translational-modifications/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    N/A
    Description

    This dataset includes protein post-translational modifications as well as associated annotation data obtained from the Biological General Repository for Interaction Datasets (BioGRID) for major model organism species, including the type of modification, protein sequence and specific amino acid involved.

  4. ATID: Alternative Translational Initiation Database

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). ATID: Alternative Translational Initiation Database [Dataset]. http://identifiers.org/RRID:SCR_009432
    Dataset updated
    Jan 29, 2022
    Description

    A database of publicly available genes, alternative translational isoforms and their detailed annotation. Alternative translational initiation is one of the mechanisms that increase the complexity level of an organism through alternative gene expression pathways. The use of alternative translation initiation codons in a single mRNA contributes to the generation of protein diversity. The genes produce two or more versions of the encoded proteins, and the shorter version, initiated from a downstream in-frame start codon, lacks the N-terminal amino acid fragment of the full-length isoform. Since the first discovery of alternative translation initiation, a small, yet growing, number of mRNAs initiating translation from alternative start codons have been reported. Various studies began to emerge focusing on this new field in gene expression and revealed the biological significance of the use of alternative initiation. In response to the need for systematic studies on genes involving alternative translational initiation, the Alternative Translational Initiation Database (ATID) was established to provide data on publicly available genes, alternative translational isoforms and their detailed annotation.
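
    The idea of a downstream in-frame start codon described above can be sketched in a few lines. This is only a toy illustration, not ATID's annotation method: it considers ATG codons only, uses an invented 18-nucleotide sequence, and the function name in_frame_starts is hypothetical.

```python
from typing import List

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def in_frame_starts(orf: str) -> List[int]:
    """Return 0-based nucleotide offsets of each ATG in the reading frame
    of the first codon, scanning until the first in-frame stop codon."""
    orf = orf.upper()
    offsets = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in STOPS:
            break  # translation would terminate here
        if codon == START:
            offsets.append(i)
    return offsets


# Invented toy sequence, codon by codon: ATG GCC AAA ATG CCC TAA
toy = "ATGGCCAAAATGCCCTAA"
starts = in_frame_starts(toy)

# Number of N-terminal residues absent from the isoform that begins at each start.
missing_residues = [offset // 3 for offset in starts]
print(starts, missing_residues)
```

    In this toy case the second start lies three codons downstream of the first, so the shorter isoform would lack the first three residues of the full-length product.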

  5. Additional file 4: of The porcine translational research database: a...

    • datasetcatalog.nlm.nih.gov
    • springernature.figshare.com
    Updated Aug 23, 2017
    Cite
    Urban, Joseph; Dawson, Harry; Chen, Celine; Shao, Jonathan; Gaynor, Brady (2017). Additional file 4: of The porcine translational research database: a manually curated, genomics and proteomics-based research resource [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001790915
    Dataset updated
    Aug 23, 2017
    Authors
    Urban, Joseph; Dawson, Harry; Chen, Celine; Shao, Jonathan; Gaynor, Brady
    Description

    5′, ORF and 3′ end comparison of porcine and human mRNAs (XLSX 66 kb)

  6. Additional file 1: of The porcine translational research database: a...

    • springernature.figshare.com
    • figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Harry Dawson; Celine Chen; Brady Gaynor; Jonathan Shao; Joseph Urban (2023). Additional file 1: of The porcine translational research database: a manually curated, genomics and proteomics-based research resource [Dataset]. http://doi.org/10.6084/m9.figshare.c.3860554_D1.v1
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Harry Dawson; Celine Chen; Brady Gaynor; Jonathan Shao; Joseph Urban
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Porcine genes missing in Ensembl build 10.2 of the porcine genome. Gene names and evidence/source for RNA sequence of genes that are missing from Ensembl build 10.2. (XLSX 112 kb)

  7. Global Identification of Protein Post-translational Modifications in a...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Dec 17, 2015
    Cite
    Keller, Mark P.; Frey, Brian L.; Shortreed, Michael R.; Scalf, Mark; Wenger, Craig D.; Attie, Alan D.; Sheynkman, Gloria M.; Smith, Lloyd M. (2015). Global Identification of Protein Post-translational Modifications in a Single-Pass Database Search [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001934451
    Dataset updated
    Dec 17, 2015
    Authors
    Keller, Mark P.; Frey, Brian L.; Shortreed, Michael R.; Scalf, Mark; Wenger, Craig D.; Attie, Alan D.; Sheynkman, Gloria M.; Smith, Lloyd M.
    Description

    Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single-pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.

  8. RECODE- The database of the translational recoding events

    • rrid.site
    • scicrunch.org
    Updated Oct 26, 2025
    Cite
    (2025). RECODE- The database of the translational recoding events [Dataset]. http://identifiers.org/RRID:SCR_007887/resolver?q=*&i=rrid
    Dataset updated
    Oct 26, 2025
    Description

    A compilation of programmed translational recoding events taken from the scientific literature and personal communications. The database deals with programmed ribosomal frameshifting, codon redefinition and translational bypass occurring in a variety of organisms. The entries for each event include the sequences of the corresponding genes, their encoded proteins for both the normal and alternate decoding, the types of the recoding events involved, and the trans-factors and cis-elements that influence recoding.

  9. PhenCards v.1.0.0 database

    • data.niaid.nih.gov
    Updated May 13, 2021
    Cite
    Havrilla, James; Liu, Cong; Dong, Xiangchen; Weng, Chunhua; Wang, Kai (2021). PhenCards v.1.0.0 database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4755958
    Dataset updated
    May 13, 2021
    Dataset provided by
    Columbia University
    Children's Hospital of Philadelphia
    Authors
    Havrilla, James; Liu, Cong; Dong, Xiangchen; Weng, Chunhua; Wang, Kai
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0): https://creativecommons.org/licenses/by-nc/2.0/
    License information was derived automatically

    Description

    A db-release of PhenCards to coincide with the release of the paper

    This is a citable repo with a zip file of everything used to make the Elasticsearch Lucene index database for PhenCards v1.0.0.

    This includes all data, like HPO, ICD, UMLS (without restricted sources), IRS data, Open990 data. Because of Open990, we cannot make it commercially available, but it is still fully open source for academics.

    However, we also provide the code for preprocessing of the data for Lucene indexing.

  10. Big Data and artificial intelligence for translational research in COVID-19:...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Jan 7, 2023
    Cite
    de Mello, Nicole Freitas; da Silva, Everton Nunes; Ramos, Maíra Catharina; Shimizu, Helena Eri; Gomes, Dalila Fernandes; Barreto, Jorge Otávio Maia (2023). Big Data and artificial intelligence for translational research in COVID-19: a rapid review [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000996672
    Dataset updated
    Jan 7, 2023
    Authors
    de Mello, Nicole Freitas; da Silva, Everton Nunes; Ramos, Maíra Catharina; Shimizu, Helena Eri; Gomes, Dalila Fernandes; Barreto, Jorge Otávio Maia
    Description

    ABSTRACT The objective of this study was to identify how Artificial Intelligence (AI) has been used for translational research in the context of COVID-19. A rapid review was carried out to identify the use of AI techniques in the translation of technologies to face COVID-19. A search strategy was used based on MeSH terms and their respective synonyms in seven databases. Of the 59 articles identified, eight were included. We identified 11 experiments that used AI for translational research in Covid-19: prediction of drug efficacy; predicting the pathogenicity of SARS-CoV-2; imaging diagnosis for COVID-19; predicting the incidence of COVID-19; estimates of the impact of COVID-19 on society; automation of sanitizing hospital and clinical environments; screening of infected and possibly infected people; monitoring the use of masks; prediction of patient severity; patient risk stratification; and prediction of hospital resources. Translational research can help in productive and industrial development in health, especially when supported by AI methods, an increasingly important tool, especially when discussing the Fourth Industrial Revolution and its applications in health.

  11. Translational Research in Neuroimaging and Data Science datasets

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jun 14, 2021
    Cite
    (2021). Translational Research in Neuroimaging and Data Science datasets [Dataset]. http://identifiers.org/RRID:SCR_021013
    Dataset updated
    Jun 14, 2021
    Description

    Neuroimaging datasets available from TReNDs including resting state MRI.

  12. Advancing translational research in environmental science: The role and...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 12, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Advancing translational research in environmental science: The role and impact of social science [Dataset]. https://catalog.data.gov/dataset/advancing-translational-research-in-environmental-science-the-role-and-impact-of-social-sc
    Dataset updated
    Apr 12, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Our dataset consists of transcripts and codebooks for a focus group study. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. EPA cannot release CBI, or data protected by copyright, patent, or otherwise subject to trade secret restrictions. Requests for access to CBI data may be directed to the dataset owner by an authorized person by contacting the party listed. It can be accessed through the following means: Contact Katie Williams, williams.kathleen@epa.gov. Format: The data are transcripts and protected by IRB approvals. This dataset is associated with the following publication: Eisenhauer, E., K. Williams, K. Margeson, S. Paczuski, K. Mulvaney, and M.C. Hano. Advancing translational research in environmental science: The role and impact of social science. Environmental Science & Policy. Elsevier Science Ltd, New York, NY, USA, 120: 165-172, (2021).

  13. FURTHeR

    • neuinfo.org
    Updated Sep 8, 2024
    Cite
    (2024). FURTHeR [Dataset]. http://identifiers.org/RRID:SCR_006383
    Dataset updated
    Sep 8, 2024
    Description

    Data and knowledge management infrastructure for the new Center for Clinical and Translational Science (CCTS) at the University of Utah. This clinical cohort search tool is used to search across the University of Utah clinical data warehouse and the Utah Population Database for people who satisfy various criteria of the researchers. It uses the i2b2 front end but has a set of terminology servers, metadata servers and federated query tool as the back end systems. FURTHeR does on-the-fly translation of search terms and data models across the source systems and returns a count of results by unique individuals. They are extending the set of databases that can be queried.

  14. Additional file 2: of The porcine translational research database: a...

    • figshare.com
    • springernature.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Harry Dawson; Celine Chen; Brady Gaynor; Jonathan Shao; Joseph Urban (2023). Additional file 2: of The porcine translational research database: a manually curated, genomics and proteomics-based research resource [Dataset]. http://doi.org/10.6084/m9.figshare.c.3860554_D2.v1
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Harry Dawson; Celine Chen; Brady Gaynor; Jonathan Shao; Joseph Urban
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Artifactually duplicated genes in Ensembl build 10.2. Gene names, Ensembl and NCBI loci numbers and NCBI genome build 10.2 coordinates of artifactually duplicated genes (XLSX 282 kb)

  15. Data from: Predicting translational progress in biomedical research

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Oct 10, 2019
    Cite
    Meseroll, Rebecca A.; Santangelo, George M.; Davis, Matthew T.; Hutchins, B. Ian (2019). Predicting translational progress in biomedical research [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000179930
    Dataset updated
    Oct 10, 2019
    Authors
    Meseroll, Rebecca A.; Santangelo, George M.; Davis, Matthew T.; Hutchins, B. Ian
    Description

    Fundamental scientific advances can take decades to translate into improvements in human health. Shortening this interval would increase the rate at which scientific discoveries lead to successful treatment of human disease. One way to accomplish this would be to identify which advances in knowledge are most likely to translate into clinical research. Toward that end, we built a machine learning system that detects whether a paper is likely to be cited by a future clinical trial or guideline. Despite the noisiness of citation dynamics, as little as 2 years of postpublication data yield accurate predictions about a paper’s eventual citation by a clinical article (accuracy = 84%, F1 score = 0.56; compared to 19% accuracy by chance). We found that distinct knowledge flow trajectories are linked to papers that either succeed or fail to influence clinical research. Translational progress in biomedicine can therefore be assessed and predicted in real time based on information conveyed by the scientific community’s early reaction to a paper.

  16. i2b2 Research Data Warehouse

    • rrid.site
    • scicrunch.org
    Updated Jan 29, 2022
    Cite
    (2022). i2b2 Research Data Warehouse [Dataset]. http://identifiers.org/RRID:SCR_013276
    Dataset updated
    Jan 29, 2022
    Description

    A data warehouse that integrates information on patients from multiple sources and consists of patient information from all the visits to Cincinnati Children's between 2003 and 2007. This information includes demographics (age, gender, race), diagnoses (ICD-9), procedures, medications and lab results. They have included extracts from Epic, DocSite, and the new Cerner laboratory system and will eventually load public data sources, data from the different divisions or research cores (such as images or genetic data), as well as the research databases from individual groups or investigators. This information is aggregated, cleaned and de-identified. Once this process is complete, it is presented to the user, who will then be able to query the data. The warehouse is best suited for tasks like cohort identification, hypothesis generation and retrospective data analysis. Automated software tools will facilitate some of these functions, while others will require more of a manual process. The initial software tools will be focused around cohort identification. They have developed a set of web-based tools that allow the user to query the warehouse after logging in. The only people able to see your data are those to whom you grant authorization. If the information can be provided to the general research community, they will add it to the warehouse. If it cannot, they will mark it so that only you (or others in your group with proper approval) can access it.

  17. Translational Research And Innovations Private Limited Export Import Data |...

    • eximpedia.app
    Updated Feb 9, 2025
    + more versions
    Cite
    (2025). Translational Research And Innovations Private Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/translational-research-and-innovations-private-limited/43327089
    Dataset updated
    Feb 9, 2025
    Description

    Translational Research And Innovations Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  18. Curated dataset on protein's properties and post-translational modification...

    • nanocommons.github.io
    • data.niaid.nih.gov
    • +1 more
    Updated Aug 31, 2023
    Cite
    NanoSolveIT (2023). Curated dataset on protein's properties and post-translational modification protein properties [Dataset]. http://doi.org/10.5281/zenodo.8314626
    Dataset updated
    Aug 31, 2023
    Dataset authored and provided by
    NanoSolveIT
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Proteins perform essential cellular functions, which range from cell division and metabolism to DNA replication. Thus, decoding the mechanism of action of cells requires an understanding of the functioning and physicochemical properties of proteins [1]. While the genetic code encodes the primary structure of proteins, they undergo various modifications as part of their normal functioning, including the addition of modifying groups, such as acetyl, phosphoryl, glycosyl, and methyl, to one or more amino acids after translation, which is known as post-translational modification (PTM) [2, 3]. PTMs play an essential role in regulating protein functions by altering their physicochemical properties, and understanding these reactions provides valuable insights regarding cell function. Advances in proteomics research have significantly deepened our understanding of PTMs and their impact on cellular functions and disease mechanisms. The study of PTMs is now at the forefront of research in molecular biology and biochemistry. Many databases, software, and tools have been developed to enhance our understanding of the various PTMs that affect human plasma proteins and help to simplify the analysis of complex PTM data [4]. These PTM databases and tools contain significant information and are a valuable resource for the research community. Key databases include dbPTM, UniProt, and PubChem. Utilising these databases, protein-related information like substrate peptides, amino acid sequence numbers, and experimentally validated PTM sites can be identified and curated.

    This dataset presents curated information regarding PTM-related changes in the physicochemical properties of the 16 most abundant plasma proteins [5], i.e., Serum Albumin, Serotransferrin, Antithrombin-III, Apolipoprotein A-I, Apolipoprotein A-IV, Apolipoprotein B-100, Apolipoprotein C-II, Apolipoprotein C-III, Apolipoprotein E, Clusterin, Complement C3, Haptoglobin, Histidine-rich glycoprotein, Mannose-binding protein C, Hemoglobin, and Fibrinogen alpha chain. The physicochemical properties studied, and the impact of different PTMs on the properties, include the protein molecular weight, isoelectric point, surface hydrophobicity, and solubility. The PTMs explored include phosphorylation, acetylation, glycosylation, methylation, ubiquitination, SUMOylation, lipidation, glutathionylation, nitrosylation, sulfoxidation, succinylation, neddylation, malonylation, hydroxylation, oxidation, and palmitoylation.

    References

    Alberts B, Johnson A, Lewis J, et al. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002. Analyzing Protein Structure and Function.
    Chen, H.; Venkat, S.; McGuire, P.; Gan, Q.; Fan, C. Recent Development of Genetic Code Expansion for Posttranslational Modification Studies. Molecules 2018, 23, 1662.
    Marc Oeller, Ryan Kang, Hannah Bolt, Ana Gomes dos Santos, Annika Langborg Weinmann, Antonios Nikitidis, Pavol Zlatoidsky, Wu Su, Werngard Czechtizky, Leonardo De Maria, Pietro Sormanni, Michele Vendruscolo: Sequence-based prediction of the solubility of peptides containing non-natural amino acids [bioRxiv].
    Ramazi S, Zahiri J. Posttranslational modifications in proteins: resources, tools and prediction methods. Database (Oxford). 2021 Apr 7;2021:baab012.

  19. Table1_Preclinical species gene expression database: Development and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jan 17, 2023
    Cite
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G. (2023). Table1_Preclinical species gene expression database: Development and meta-analysis.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001025224
    Dataset updated
    Jan 17, 2023
    Authors
    Vo, Andy; Krause, Caitlin; Liguori, Michael J.; Kowalkowski, Kenneth; Van Vleet, Terry R.; Suwada, Kinga; Mittelstadt, Scott; Rendino, Lauren; Mahalingaiah, Prathap Kumar; Peterson, Richard; Blomme, Eric A. G.
    Description

    The evaluation of toxicity in preclinical species is important for identifying potential safety liabilities of experimental medicines. Toxicology studies provide translational insight into potential adverse clinical findings, but data interpretation may be limited due to our understanding of cross-species biological differences. With the recent technological advances in sequencing and analyzing omics data, gene expression data can be used to predict cross species biological differences and improve experimental design and toxicology data interpretation. However, interpreting the translational significance of toxicogenomics analyses can pose a challenge due to the lack of comprehensive preclinical gene expression datasets. In this work, we performed RNA-sequencing across four preclinical species/strains widely used for safety assessment (CD1 mouse, Sprague Dawley rat, Beagle dog, and Cynomolgus monkey) in ∼50 relevant tissues/organs to establish a comprehensive preclinical gene expression body atlas for both males and females. In addition, we performed a meta-analysis across the large dataset to highlight species and tissue differences that may be relevant for drug safety analyses. Further, we made these databases available to the scientific community. This multi-species, tissue-, and sex-specific transcriptomic database should serve as a valuable resource to enable informed safety decision-making not only during drug development, but also in a variety of disciplines that use these preclinical species.

  20. Data from: MGVB: a new proteomics toolset for fast and efficient data...

    • ebi.ac.uk
    Updated Nov 15, 2024
    + more versions
    Cite
    Metodi Metodiev (2024). MGVB: a new proteomics toolset for fast and efficient data analysis [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD051331
    Dataset updated
    Nov 15, 2024
    Authors
    Metodi Metodiev
    Variables measured
    Proteomics
    Description

    MGVB is a collection of tools for proteomics data analysis. It covers data processing from in silico digestion of protein sequences to comprehensive identification of post-translational modifications and solving the protein inference problem. The toolset is developed with efficiency in mind. It enables analysis at a fraction of the resource cost typically required by existing commercial and free tools. As a native application, MGVB is much faster than existing proteomics tools such as MaxQuant and MSFragger and, at the same time, finds a very similar, and in some cases even larger, number of peptides at a chosen level of statistical significance. It implements a probabilistic scoring function to match spectra to sequences, a novel combinatorial search strategy for finding post-translational modifications, and a Bayesian approach to locating modification sites. This report describes the algorithms behind the tools, presents an analysis of benchmarking data sets comparing MGVB performance to MaxQuant/Andromeda, and provides step-by-step instructions for using it in typical analytical scenarios. The toolset is provided free to download and use for academic research and in software projects, but is not open source at present. The author intends to make it open source in the near future, following rigorous evaluations and feedback from the proteomics research community.
