The data in the Porcine Translational Research Database are supported by >5800 references and contain 65 data fields for each entry, including >9700 full-length (5′ and 3′) unambiguous pig sequences, >2400 real-time PCR assays, and reactivity information on >1700 antibodies. The database also contains gene and/or protein expression data for >2200 genes and identifies and corrects errors (gene duplication artifacts, mis-assemblies, mis-annotations, and incorrect species assignments) for >2000 porcine genes. This database is the largest manually curated database for any single veterinary species and is unique among porcine gene databases in linking gene expression to gene function, identifying related gene pathways, and connecting data with other porcine gene databases. Resources in this dataset: Resource Title: The Porcine Translational Research Database. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md/beltsville-human-nutrition-research-center/diet-genomics-and-immunology-laboratory/docs/dgil-porcine-translational-research-database/
This dataset includes protein post-translational modifications and associated annotation data obtained from the Biological General Repository for Interaction Datasets (BioGRID) for major model organism species, including the type of modification, the protein sequence, and the specific amino acid involved.
A database of publicly available genes, alternative translational isoforms, and their detailed annotation. Alternative translational initiation is one of the mechanisms that increase the complexity of an organism through alternative gene expression pathways. The use of alternative translation initiation codons in a single mRNA contributes to the generation of protein diversity. Such genes produce two or more versions of the encoded protein, and the shorter version, initiated from a downstream in-frame start codon, lacks the N-terminal amino acid fragment of the full-length isoform. Since the first discovery of alternative translation initiation, a small yet growing number of mRNAs that initiate translation from alternative start codons have been reported. Studies focusing on this new field in gene expression have begun to emerge and have revealed the biological significance of alternative initiation. In response to the need for systematic studies of genes involving alternative translational initiation, the Alternative Translational Initiation Database (ATID) was established to provide data on publicly available genes, alternative translational isoforms, and their detailed annotation.
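The idea of a downstream in-frame start codon can be illustrated with a minimal Python sketch. It is not taken from ATID: the function name `in_frame_starts` and the toy sequence are hypothetical, and the sketch assumes the canonical AUG start codon and the three standard stop codons only.

```python
def in_frame_starts(mrna: str, start_codon: str = "AUG") -> list[int]:
    """Return 0-based positions of start codons that share the reading
    frame of the first start codon, scanning until the first stop codon."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA letters
    first = seq.find(start_codon)
    if first == -1:
        return []
    stops = {"UAA", "UAG", "UGA"}
    positions = []
    # Step codon by codon so that only in-frame triplets are inspected.
    for i in range(first, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in stops:
            break
        if codon == start_codon:
            positions.append(i)
    return positions


# Toy sequence (hypothetical, not a real transcript). It also contains an
# out-of-frame AUG at position 13, which is deliberately ignored.
toy = "GGAUGGCUAUGAAAUGGUAA"
starts = in_frame_starts(toy)
print(starts)  # [2, 8]

# Number of N-terminal residues missing from the shorter isoform.
print((starts[1] - starts[0]) // 3)  # 2
```

In this toy case, an isoform initiated at position 8 would lack the first two residues of the full-length product, which is the N-terminal truncation described above.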
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artifactually duplicated genes in Ensembl build 10.2. Gene names, Ensembl and NCBI loci numbers and NCBI genome build 10.2 coordinates of artifactually duplicated genes (XLSX 282 kb)
https://creativecommons.org/publicdomain/zero/1.0/
By Yanis Labrak (from Hugging Face)
The train.csv file in the Medical Translation Dataset serves the purpose of providing a comprehensive collection of accurate and reliable medical translation data. It has been meticulously curated to ensure that it meets the highest standards of quality, making it an invaluable resource for medical professionals, researchers, and language experts alike.
This dataset is specifically designed to offer a rich variety of translations in the medical field, encompassing a wide range of topics such as diagnoses, treatment plans, clinical research findings, pharmaceutical information, and more. The translations contained herein cover various languages spoken worldwide, allowing for cross-cultural comparisons and analysis.
Every effort has been made to ensure the accuracy and precision of each translation within this dataset. Professional translators with specialized knowledge in the medical domain have meticulously crafted these translations to maintain their authenticity and fidelity to the original source text.
Researchers can utilize this extensive dataset for numerous purposes such as training machine learning models aimed at automating medical translation processes or conducting comprehensive linguistic analyses on specific medical terminologies across different languages. Moreover, healthcare providers can leverage this dataset to enhance communication with patients who speak different languages or facilitate accurate transfer of vital medical information across borders.
By utilizing this comprehensive collection of accurately translated texts, users can benefit from improved understanding and communication within the healthcare sector globally. It enables greater accessibility to important medical information regardless of language barriers while ensuring that essential details are conveyed precisely during critical moments related to patient care.
In conclusion, this train.csv file provides accurate and reliable medical translation data covering various languages spoken worldwide. Its meticulous curation process ensures that it meets high-quality standards for enhancing global healthcare communications while promoting inclusivity and effective dissemination of crucial medical knowledge among diverse populations.
The train.csv file in the dataset is specifically designed to offer precise medical translation data. This dataset contains accurate translations related to medical topics.
Description of the Dataset
Columns:
- translation: This column contains the original text in a particular language that requires translation.
- translation: This column contains the translated text in another language.
How to use this dataset
- Natural Language Processing (NLP) Research: This dataset can be used for training and evaluating NLP models specifically designed for medical translation tasks. Researchers can develop new algorithms, models, and techniques to improve the accuracy and efficiency of medical translation.
- Machine Learning in Healthcare: The dataset can be utilized to train machine learning algorithms in order to automatically translate medical documents or text from one language to another. This could help in speeding up the translation process and providing healthcare professionals with timely access to essential information.
- Development of Medical Translation Applications: The dataset's accurate medical translations can be leveraged for creating mobile or web-based applications that offer instant translation services for healthcare providers, patients, or individuals seeking reliable translations of medical content.
By utilizing this dataset creatively, it is possible to enhance the quality of medical translations, improve patient care, and facilitate global collaboration in healthcare research and practices.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permissio...
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) https://creativecommons.org/licenses/by-nc/2.0/
License information was derived automatically
A db-release of PhenCards to coincide with the release of the paper
This is a citable repo with a zip file of everything used to make the Elasticsearch Lucene index database for PhenCards v1.0.0.
This includes all data, like HPO, ICD, UMLS (without restricted sources), IRS data, Open990 data. Because of Open990, we cannot make it commercially available, but it is still fully open source for academics.
However, we also provide the code for preprocessing of the data for Lucene indexing.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Porcine or artiodactyl-specific paralogs. Gene names, Ensembl and NCBI loci numbers and Build 10.2 NCBI gene coordinates of porcine or artiodactyl-specific paralogs (XLSX 58 kb)
A compilation of programmed translational recoding events taken from the scientific literature and personal communications. The database deals with programmed ribosomal frameshifting, codon redefinition, and translational bypass occurring in a variety of organisms. The entries for each event include the sequences of the corresponding genes, their encoded proteins for both the normal and alternate decoding, the types of recoding events involved, and the trans-factors and cis-elements that influence recoding.
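To make the notion of "normal" versus "alternate" decoding concrete, the following Python sketch shows how a -1 ribosomal frameshift changes the codons that are read. It is only an illustration of the frame change: the function names, the toy sequence, and the slip position are hypothetical, and real recoding sites depend on the cis-elements and trans-factors mentioned above, which this sketch does not model.

```python
def codons(seq: str, start: int = 0) -> list[str]:
    """Split seq into complete triplets, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def minus_one_frameshift(seq: str, slip_at: int) -> list[str]:
    """Read frame-0 codons up to slip_at, then continue one nucleotide
    upstream, i.e. in the -1 frame. slip_at is a hypothetical position
    that must be a positive multiple of 3."""
    if slip_at <= 0 or slip_at % 3 != 0:
        raise ValueError("slip_at must be a positive multiple of 3")
    before = codons(seq[:slip_at])      # normal decoding up to the slip
    after = codons(seq, slip_at - 1)    # re-read the last base: -1 frame
    return before + after


# Toy sequence (hypothetical, not a real recoding site).
toy = "AUGCCCAAAGGGUUU"
print(codons(toy))                   # ['AUG', 'CCC', 'AAA', 'GGG', 'UUU']
print(minus_one_frameshift(toy, 9))  # ['AUG', 'CCC', 'AAA', 'AGG', 'GUU']
```

The first three codons are identical in both readings. After the slip, the last base of `AAA` is read a second time, so every downstream codon differs from the normal decoding.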
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
5′, ORF and 3′ end comparison of porcine and human mRNAs (XLSX 66 kb)
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.
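For readers unfamiliar with how an FDR is estimated in database searching, a minimal Python sketch follows. It assumes the widely used target-decoy strategy, which the description above does not name, so it should not be read as the G-PTM authors' procedure. The function name `estimate_fdr`, the scores, and the decoy labels are all hypothetical.

```python
def estimate_fdr(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy hits) / (#target hits) among
    peptide-spectrum matches scoring at or above threshold.
    Each PSM is a (score, is_decoy) pair."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scores; True marks a match to a decoy sequence.
psms = [(9.1, False), (8.7, False), (8.0, True),
        (7.5, False), (6.9, False), (6.0, True)]

print(estimate_fdr(psms, 8.5))  # 0.0 (two targets, no decoys)
print(round(estimate_fdr(psms, 7.0), 3))  # 0.333 (three targets, one decoy)
```

Under this kind of estimate, a larger search space tends to admit more chance matches at a given score, which is one way to picture the FDR penalty that restricting the search to curated PTMs is meant to avoid.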
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Proteins perform essential cellular functions, which range from cell division and metabolism to DNA replication. Thus, decoding the mechanism of action of cells requires understanding the functioning and physicochemical properties of proteins [1]. While the genetic code encodes the primary structure of proteins, proteins undergo various modifications as part of their normal functioning, including the addition of modifying groups such as acetyl, phosphoryl, glycosyl, and methyl to one or more amino acids after translation, which is known as post-translational modification (PTM) [2, 3]. PTMs play an essential role in regulating protein functions by altering their physicochemical properties, and understanding these reactions provides valuable insights regarding cell function. Advances in proteomics research have significantly deepened our understanding of PTMs and their impact on cellular functions and disease mechanisms. The study of PTMs is now at the forefront of research in molecular biology and biochemistry.
Many databases, software, and tools have been developed to enhance our understanding of the various PTMs that affect human plasma proteins and help to simplify the analysis of complex PTM data [4]. These PTM databases and tools contain significant information and are a valuable resource for the research community. Key databases include dbPTM, UniProt, and PubChem. Utilising these databases, protein-related information like substrate peptides, amino acid sequence numbers, and experimentally validated PTM sites can be identified and curated.
This dataset presents curated information regarding PTM-related changes in the physicochemical properties of the 16 most abundant plasma proteins [5], i.e., Serum Albumin, Serotransferrin, Antithrombin-III, Apolipoprotein A-I, Apolipoprotein A-IV, Apolipoprotein B-100, Apolipoprotein C-II, Apolipoprotein C-III, Apolipoprotein E, Clusterin, Complement C3, Haptoglobin, Histidine-rich glycoprotein, Mannose-binding protein C, Hemoglobin, and Fibrinogen alpha chain. The physicochemical properties studied, and the impact of different PTMs on the properties, include the protein molecular weight, isoelectric point, surface hydrophobicity, and solubility. The PTMs explored include phosphorylation, acetylation, glycosylation, methylation, ubiquitination, SUMOylation, lipidation, glutathionylation, nitrosylation, sulfoxidation, succinylation, neddylation, malonylation, hydroxylation, oxidation, and palmitoylation.
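As a rough illustration of how a PTM changes molecular weight, the Python sketch below adds standard monoisotopic mass shifts to an unmodified mass. It is not derived from the dataset itself: the function name `modified_mass` and the 1000.0 Da starting mass are placeholders, only three of the PTMs listed above are included, and glycosylation is omitted because its shift depends on the attached glycan.

```python
# Monoisotopic mass added per modification site, in daltons.
MASS_SHIFT_DA = {
    "phosphorylation": 79.96633,  # +HPO3
    "acetylation": 42.01057,      # +C2H2O
    "methylation": 14.01565,      # +CH2
}


def modified_mass(unmodified_mass: float, ptm_counts: dict[str, int]) -> float:
    """Return the mass after adding the given number of each PTM."""
    shift = sum(MASS_SHIFT_DA[name] * count
                for name, count in ptm_counts.items())
    return unmodified_mass + shift


# Placeholder mass of 1000.0 Da carrying two phosphate groups and one acetyl group.
mass = modified_mass(1000.0, {"phosphorylation": 2, "acetylation": 1})
print(f"{mass:.3f} Da")
```

Other properties mentioned above, such as isoelectric point or surface hydrophobicity, do not follow from simple addition and would need dedicated models.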
Data and knowledge management infrastructure for the new Center for Clinical and Translational Science (CCTS) at the University of Utah. This clinical cohort search tool is used to search across the University of Utah clinical data warehouse and the Utah Population Database for individuals who satisfy researchers' criteria. It uses the i2b2 front end, with a set of terminology servers, metadata servers, and a federated query tool as the back-end systems. FURTHeR translates search terms and data models across the source systems on the fly and returns a count of results by unique individuals. The set of databases that can be queried is being extended.
This dataset consists of transcripts and codebooks for a focus group study. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. EPA cannot release CBI, or data protected by copyright, patent, or otherwise subject to trade secret restrictions. Requests for access to CBI data may be directed to the dataset owner by an authorized person by contacting the party listed. It can be accessed through the following means: Contact Katie Williams, williams.kathleen@epa.gov. Format: The data are transcripts and protected by IRB approvals. This dataset is associated with the following publication: Eisenhauer, E., K. Williams, K. Margeson, S. Paczuski, K. Mulvaney, and M.C. Hano. Advancing translational research in environmental science: The role and impact of social science. Environmental Science & Policy. Elsevier Science Ltd, New York, NY, USA, 120: 165-172, (2021).
Comprehensive dataset of 5,975 Translation services in United States as of August, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This is a very early experiment in Lojban machine translation. For a larger dataset, see https://huggingface.co/datasets/smuske/Korpora
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
After translation, many newly formed proteins undergo further covalent modifications that alter their functional properties. Modifications associated with protein localization include the attachment of oligosaccharide moieties to membrane-bound and secreted proteins (N-linked and O-linked glycosylation), the attachment of lipid (RAB geranylgeranylation) or glycolipid moieties (GPI-anchored proteins) that anchor proteins to cellular membranes, and the vitamin K-dependent attachment of carboxyl groups to glutamate residues. Modifications associated with functions of specific proteins include gamma carboxylation of clotting factors, hypusine formation on eukaryotic translation initiation factor 5A, conversion of a cysteine residue to formylglycine (arylsulfatase activation), methylation of lysine and arginine residues on non-histone proteins (protein methylation), protein phosphorylation by secretory pathway kinases, and carboxyterminal modifications of tubulin involving the addition of polyglutamate chains.
Protein ubiquitination and deubiquitination play a major role in regulating protein stability and, together with SUMOylation and neddylation, can modulate protein function as well.
Neuroimaging datasets available from TReNDs including resting state MRI.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Porcine genes missing in Ensembl build 10.2 of the porcine genome. Gene names and evidence/source for RNA sequence of genes that are missing from Ensembl build 10.2. (XLSX 112 kb)
These data are associated with the publication: Translational Science Education Through Citizen Science.