100+ datasets found

Codon Usage Tabulated from GenBank
bioregistry.io
Updated Nov 10, 2022
(2022). Codon Usage Tabulated from GenBank [Dataset]. https://bioregistry.io/cutg
Dataset updated
Nov 10, 2022
Description
Codon usage in individual genes has been calculated using the nucleotide sequence data obtained from the GenBank Genetic Sequence Database. The compilation of codon usage is synchronized with each major release of GenBank.
Codon similarity data in ATTED-II ver 8.0 (Ath, Gma, Osa, Sly, Vvi)
data.niaid.nih.gov
Updated Jul 10, 2023
+ more versions
Obayashi, Takeshi; Aoki, Yuichi (2023). Codon similarity data in ATTED-II ver 8.0 (Ath, Gma, Osa, Sly, Vvi) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8123039
Dataset updated
Jul 10, 2023
Dataset provided by
Tohoku University
Authors
Obayashi, Takeshi; Aoki, Yuichi
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Codon similarity data in ATTED-II ver 8.0

The gene-to-gene codon similarity data is organized in the form of tables, each named according to the Entrez Gene ID of a particular query gene. Each table encompasses three columns, specifying: the Entrez Gene ID of a corresponding gene, an MR (Mutual Rank) value (where a smaller number signifies a stronger relationship), and a Pearson correlation coefficient (where a larger number suggests a stronger association).

Protein-coding sequences utilized in this study were retrieved from NCBI's RefSeq database. For each gene, a 61-dimensional vector was derived from the count of codons in the protein-coding sequence. In instances where multiple RefSeq sequences were associated with a single gene, the longest sequence was selected for the codon usage calculation. Pearson correlation coefficients (PCCs) were determined between the vectors of any two given genes. These PCCs were subsequently converted into MRs, employed as an index to evaluate the similarity in codon usage between the genes.
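The pipeline described above (61-dimensional codon count vectors, pairwise PCCs, then MRs) can be sketched as follows. This is an illustration, not the ATTED-II implementation: the function names are hypothetical, and the Mutual Rank is assumed here to be the geometric mean of the two directional ranks, which the description itself does not spell out.

```python
from itertools import product
from math import sqrt

# The 61 sense codons of the standard genetic code (stop codons excluded).
STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = ["".join(c) for c in product("TCAG", repeat=3)
                if "".join(c) not in STOPS]

def codon_vector(cds):
    """Count each sense codon in an in-frame coding sequence."""
    counts = dict.fromkeys(SENSE_CODONS, 0)
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        if codon in counts:
            counts[codon] += 1
    return [counts[c] for c in SENSE_CODONS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pcc_table(vectors):
    """All pairwise PCCs as a nested dict keyed by gene ID."""
    return {a: {b: pearson(va, vb) for b, vb in vectors.items() if b != a}
            for a, va in vectors.items()}

def mutual_rank(pcc, a, b):
    """Assumed definition: geometric mean of rank(a->b) and rank(b->a)."""
    def rank(query, target):
        ordered = sorted(pcc[query], key=pcc[query].get, reverse=True)
        return ordered.index(target) + 1
    return sqrt(rank(a, b) * rank(b, a))
```

Under this definition a pair of genes that are each other's top-ranked partner gets MR = 1, matching the statement that smaller MR values signify a stronger relationship.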
Codon Usage - UCI
kaggle.com
zip
Updated Nov 25, 2021
+ more versions
Salik Hussaini (2021). Codon Usage - UCI [Dataset]. https://www.kaggle.com/salikhussaini49/codon-usage
Available download formats: zip (2035077 bytes)
Dataset updated
Nov 25, 2021
Authors
Salik Hussaini
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

We examined codon usage frequencies in the genomic coding DNA of a large sample of diverse organisms from different taxa tabulated in the CUTG database. We further manually curated and harmonized the existing entries by reclassifying CUTG's bacteria ('bct') class into archaea ('arc'), plasmids ('plm'), and bacteria proper (keeping the original label 'bct'). The reclassification of the original 'bct' domain was simplified by extracting the genus names of the entries from the files 'qbxxx.spsum.txt' (where xxx = bct (bacteria), inv (invertebrates), mam (mammals), pln (plants), pri (primates), rod (rodents), vrt (vertebrates)) and classifying by genus. There were 514 different genus names. The genus categories were checked and relabeled as 'arc' where appropriate. Among the eubacterial entries, bacterial genomes proper (keeping the original label 'bct') were distinguished from bacterial plasmids (now labeled 'plm').

Content

Column 1: Kingdom
Column 2: DNAtype
Column 3: SpeciesID
Column 4: Ncodons
Column 5: SpeciesName
Columns 6-69: codon (header: nucleotide bases; entries: frequency of usage (5 digit floating point number))

The 'Kingdom' is a 3-letter code corresponding to 'xxx' in the CUTG database name: 'arc' (archaea), 'bct' (bacteria), 'phg' (bacteriophage), 'plm' (plasmid), 'pln' (plant), 'inv' (invertebrate), 'vrt' (vertebrate), 'mam' (mammal), 'rod' (rodent), 'pri' (primate), and 'vrl' (virus) sequence entries. Note that the CUTG database does not contain 'arc' and 'plm'; we curated these two classes manually.

The 'DNAtype' is denoted as an integer for the genomic composition in the species: 0-genomic, 1-mitochondrial, 2-chloroplast, 3-cyanelle, 4-plastid, 5-nucleomorph, 6-secondary_endosymbiont, 7-chromoplast, 8-leucoplast, 9-NA, 10-proplastid, 11-apicoplast, and 12-kinetoplast.
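The two coded columns can be decoded with plain lookup tables. The sketch below only restates the codes listed above; the dictionary and function names are hypothetical and not part of the dataset.

```python
# Kingdom codes as listed in the dataset description.
KINGDOM = {
    "arc": "archaea", "bct": "bacteria", "phg": "bacteriophage",
    "plm": "plasmid", "pln": "plant", "inv": "invertebrate",
    "vrt": "vertebrate", "mam": "mammal", "rod": "rodent",
    "pri": "primate", "vrl": "virus",
}

# DNAtype codes as listed in the dataset description.
DNATYPE = {
    0: "genomic", 1: "mitochondrial", 2: "chloroplast", 3: "cyanelle",
    4: "plastid", 5: "nucleomorph", 6: "secondary_endosymbiont",
    7: "chromoplast", 8: "leucoplast", 9: "NA", 10: "proplastid",
    11: "apicoplast", 12: "kinetoplast",
}

def describe(kingdom, dnatype):
    """Return a readable label for one row's Kingdom and DNAtype values."""
    return f"{KINGDOM[kingdom]}, {DNATYPE[int(dnatype)]}"
```

For example, `describe("pln", 2)` labels a plant chloroplast entry.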

The species identifier ('SpeciesID') is an integer, which uniquely indicates the entries of an organism. It is an accession identifier for each different species in the original CUTG database, followed by the first item listed in each genome.

The number of codons ('Ncodons') is the algebraic sum of the numbers listed for the different codons in an entry of CUTG. Codon frequencies are normalized to the total codon count, so each codon frequency listed in the data file is the number of occurrences of that codon divided by 'Ncodons'.
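The normalization described above can be written out as a short sketch. The function name is hypothetical, and rounding to five decimal places is an assumption about how the 5-digit floats were produced.

```python
def codon_frequencies(counts):
    """Return (Ncodons, frequencies) from raw per-codon counts.

    Ncodons is the sum of all counts; each frequency is count / Ncodons,
    rounded here to five decimal places (an assumption).
    """
    ncodons = sum(counts.values())
    freqs = {codon: round(n / ncodons, 5) for codon, n in counts.items()}
    return ncodons, freqs

# Toy counts, not taken from the dataset.
ncodons, freqs = codon_frequencies({"UUU": 3, "UUA": 1, "UUG": 4})
```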

The species name ('SpeciesName') is represented as a string purged of commas (which are replaced by spaces). It is a descriptive label of the species for data interpretation.

Lastly, the codon frequencies ('codon') including 'UUU', 'UUA', 'UUG', 'CUU', etc., are recorded as floats (with decimals in 5 digits).

Acknowledgements

Khomtchouk BB: 'Codon usage bias levels predict taxonomic identity and genetic composition'. bioRxiv, 2020, doi: 10.1101/2020.10.26.356295.

Nakamura Y, Gojobori T, Ikemura T: 'Codon usage tabulated from international DNA sequence databases: status for the year 2000'. Nucleic Acids Research, 2000, 28:292.

Inspiration

Extend Biology Research.
Data from: tRic: a user-friendly data portal to explore the expression...
tandf.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Feb 15, 2024
Zhao Zhang; Hang Ruan; Chun-Jie Liu; Youqiong Ye; Jing Gong; Lixia Diao; An-Yuan Guo; Leng Han (2024). tRic: a user-friendly data portal to explore the expression landscape of tRNAs in human cancers [Dataset]. http://doi.org/10.6084/m9.figshare.9699140.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.9699140.v2
Dataset updated
Feb 15, 2024
Dataset provided by
Taylor & Francis
Authors
Zhao Zhang; Hang Ruan; Chun-Jie Liu; Youqiong Ye; Jing Gong; Lixia Diao; An-Yuan Guo; Leng Han
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Transfer RNAs (tRNAs) play critical roles in human cancer. Currently, no database provides the expression landscape and clinical relevance of tRNAs across a variety of human cancers. Utilizing miRNA-seq data from The Cancer Genome Atlas, we quantified the relative expression of tRNA genes and merged them into the codon level and amino acid level across 31 cancer types. The expression of tRNAs is associated with clinical features of patient smoking history and overall survival, and disease stage, subtype, and grade. We further analysed codon frequency and amino acid frequency for each protein coding gene and linked alterations of tRNA expression with protein translational efficiency. We include these data resources in a user-friendly data portal, tRic (tRNA in cancer, https://hanlab.uth.edu/tRic/ or http://bioinfo.life.hust.edu.cn/tRic/), which can be of significant interest to the research community.
Codon bias and codon pair bias scores for FPR1 variants calculated based on...
figshare.com
xls
Updated Jun 1, 2023
Heini M. Miettinen (2023). Codon bias and codon pair bias scores for FPR1 variants calculated based on the codons for ten validated SNPs. [Dataset]. http://doi.org/10.1371/journal.pone.0028712.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0028712.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Heini M. Miettinen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Haplotype designations 1A-16A are by Sahagun-Ruiz et al. [2]. B, C and D show haplotypes in which the SNP does not change the amino acid compared to A [4]. The table includes the FPR1 SNPs in the following order: c.32C>T/p.T11I, c.140T>C/p.V47A, c.301G>C/p.V101L, c.306T>C/p.F102F, c.348C>T/p.I116I, c.546C>A/p.P182P, c.568A>T/p.R190W, c.576T>G>C/p.N192K, c.993C>T/p.T331T, c.1037C>A/p.A356E. The codon bias results show the differences between the various haplotypes based on the total of each SNP codon usage score, as obtained from the GenBank Homo sapiens Codon Usage Database (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=9606). The codon pair bias results show the differences between the various haplotypes based on the total of each SNP codon pair score, as calculated from the Supplemental Material by Coleman et al. (www.sciencemag.org/cgi/content/full/320/5884/1784/DC1) [6]. Amino acids are shown in single letter code. The nucleotide in the 3rd position of the synonymous codons is as shown.
Data on Stop Codon Usage in Bacteria and Its Correlation with Release Factor...
researchdata.se
data.europa.eu
Updated Jun 26, 2025
+ more versions
Gürkan Korkmaz (2025). Data on Stop Codon Usage in Bacteria and Its Correlation with Release Factor Abundance [Dataset]. http://doi.org/10.57804/83hw-m374
Available download formats: (77355124), (106760384)
Unique identifier
https://doi.org/10.57804/83hw-m374
Dataset updated
Jun 26, 2025
Dataset provided by
Uppsala University
Authors
Gürkan Korkmaz
Description
We present a comprehensive analysis of stop codon usage in bacteria by analyzing over eight million coding sequences of 4684 bacterial sequences. Using a newly developed program called "stop codon counter," the frequencies of the three classical stop codons TAA, TAG, and TGA were analyzed, and a publicly available stop codon database was built.
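The counting step can be illustrated with a minimal sketch. This is not the authors' Java "stop codon counter"; the function names are hypothetical, and it assumes each input is an in-frame coding sequence whose last three bases are its stop codon.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def count_stop_codons(coding_sequences):
    """Tally the terminal stop codon of each coding sequence."""
    counts = {stop: 0 for stop in STOP_CODONS}
    for cds in coding_sequences:
        last = cds[-3:].upper()
        if last in counts:  # sequences without a classical stop are skipped
            counts[last] += 1
    return counts

def stop_codon_frequencies(counts):
    """Convert tallies to fractions of all counted stop codons."""
    total = sum(counts.values())
    return {stop: n / total for stop, n in counts.items()}

# Toy sequences, not taken from the dataset.
toy = ["ATGAAATAA", "ATGCCCTGA", "ATGGGGTAA", "ATGTTTTAG"]
toy_counts = count_stop_codons(toy)
```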

Datafiles contain: 1) Complete data set of stop codon usage of all analyzed sequences as described in the publication "Comprehensive Analysis of Stop Codon Usage in Bacteria and Its Correlation with Release Factor Abundance" by Korkmaz et al (2014).

2) The Java program that was used for the analysis of the coding sequences. Execute the file in Program\ProjectStopCodonCounter\dist

The dataset was originally published in DiVA and moved to SND in 2024.
Data from: A simple model based on mutation and selection explains trends in...
catalog.data.gov
data.virginia.gov
+1more
Updated Sep 6, 2025
+ more versions
National Institutes of Health (2025). A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes [Dataset]. https://catalog.data.gov/dataset/a-simple-model-based-on-mutation-and-selection-explains-trends-in-codon-and-amino-acid-usa
Dataset updated
Sep 6, 2025
Dataset provided by
National Institutes of Health
Description
Background: Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. Results: Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. Conclusions: Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
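The "response" of a codon to genome composition, as described above, is the slope and intercept of a regression of codon usage on genome GC content. A minimal sketch of that calculation, using ordinary least squares and toy numbers (the function names and data are hypothetical, and this is not the authors' model):

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "GC") / len(seq)

def linear_fit(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy data: genome GC content of four species and the frequency
# of one codon in each of them.
genome_gc = [0.3, 0.4, 0.5, 0.6]
codon_freq = [0.01, 0.02, 0.03, 0.04]
slope, intercept = linear_fit(genome_gc, codon_freq)
```

A positive slope here would indicate a codon whose usage rises with genome GC content.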
RSCU values per gene in the yeast genomes
figshare.com
application/x-gzip
Updated Sep 20, 2024
Abigail LaBella (2024). RSCU values per gene in the yeast genomes [Dataset]. http://doi.org/10.6084/m9.figshare.27074467.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.27074467.v1
Dataset updated
Sep 20, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Abigail LaBella
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Relative synonymous codon usage was calculated for all genomes in the Y1000+ database. The RSCU for orders with a CTG codon reassignment were computed taking the reassignment into account. The script used to generate the RSCU values can be found here: https://github.com/The-Lab-LaBella/RSCU_Calculation_Analysis

The file contains the name of the assemblies, as listed in the supplemental data 1 of Opulente et al. 2024. The columns contain the clade/order and codons analyzed.
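For orientation, RSCU is commonly defined as a codon's observed count divided by the mean count of all codons encoding the same amino acid. The sketch below follows that definition and shows one way a CTG reassignment could be taken into account. It is not the linked script; the function name and the `reassign` parameter are hypothetical.

```python
from itertools import product
from collections import defaultdict

# Standard genetic code (NCBI table 1), codons ordered over bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa
                 for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def rscu(codon_counts, reassign=None):
    """RSCU per codon: count / mean count over the synonymous family.

    `reassign` maps codons to a different amino acid, e.g. {"CTG": "S"},
    so the codon is grouped with its reassigned family.
    """
    code = dict(STANDARD_CODE)
    code.update(reassign or {})
    families = defaultdict(list)
    for codon, aa in code.items():
        if aa != "*":  # stop codons are not scored
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene
        mean = total / len(codons)
        for c in codons:
            values[c] = codon_counts.get(c, 0) / mean
    return values
```

With `reassign={"CTG": "S"}`, CTG is averaged with the serine codons instead of the leucine codons, which changes both the family size and the mean it is compared against.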
Data from: Serine codon-usage bias in deep phylogenomics: pancrustacean...
search.dataone.org
data-staging.niaid.nih.gov
+2more
Updated Jun 10, 2025
Omar Rota-Stabelli; Nicolas Lartillot; Herve Philippe; Davide Pisani (2025). Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study [Dataset]. http://doi.org/10.5061/dryad.7p1k8304
Unique identifier
https://doi.org/10.5061/dryad.7p1k8304
Dataset updated
Jun 10, 2025
Dataset provided by
Dryad Digital Repository
Authors
Omar Rota-Stabelli; Nicolas Lartillot; Herve Philippe; Davide Pisani
Time period covered
Oct 12, 2012
Description
Phylogenomic analyses of ancient relationships are usually performed using amino acid data, but it is unclear whether amino acids or nucleotides should be preferred. With the 2-fold aim of addressing this problem and clarifying pancrustacean relationships, we explored the signals in the 62 protein-coding genes carefully assembled by Regier et al. in 2010. With reference to the pancrustaceans, this data set infers a highly supported nucleotide tree that is substantially different to the corresponding, but poorly supported, amino acid one. We show that the discrepancy between the nucleotide-based and the amino acids-based trees is caused by substitutions within synonymous codon families (especially those of serine: TCN and AGY). We show that different arthropod lineages are differentially biased in their usage of serine, arginine, and leucine synonymous codons, and that the serine bias is correlated with the topology derived from the nucleotides, but not the amino acids. We suggest that a ...
Data files for downstream analysis of the “mammalian codon usage” manuscript...
figshare.com
txt
Updated Jun 3, 2023
Konrad L M Rudolph; Bianca M Schmitt; Diego Villar; Robert J White; John C Marioni; Claudia Kutter; Duncan T Odom (2023). Data files for downstream analysis of the “mammalian codon usage” manuscript [Dataset]. http://doi.org/10.6084/m9.figshare.2056227.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.2056227.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Konrad L M Rudolph; Bianca M Schmitt; Diego Villar; Robert J White; John C Marioni; Claudia Kutter; Duncan T Odom
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Snapshot of the data used to run downstream analysis. A detailed methods description can be found in the manuscript and the associated analysis code.
Codon and Codon-Pair Usage Tables
neuinfo.org
dknet.org
+2more
Updated Jan 28, 2025
+ more versions
(2025). Codon and Codon-Pair Usage Tables [Dataset]. http://identifiers.org/RRID:SCR_018504
Unique identifier
https://identifiers.org/RRID:SCR_018504
Dataset updated
Jan 28, 2025
Description
The database includes genomic codon-pair and dinucleotide statistics of all organisms with a sequenced genome. It facilitates genetic variation analyses and recombinant gene design, and is derived from all available GenBank and RefSeq data.
BIOGRID CURATED DATA FOR PUBLICATION: Host adaptation of codon usage in...
thebiogrid.org
zip
Updated Sep 28, 2022
BioGRID Project (2022). BIOGRID CURATED DATA FOR PUBLICATION: Host adaptation of codon usage in SARS-CoV-2 from mammals indicates potential natural selection and viral fitness. [Dataset]. https://thebiogrid.org/244971/publication/host-adaptation-of-codon-usage-in-sars-cov-2-from-mammals-indicates-potential-natural-selection-and-viral-fitness.html
Available download formats: zip
Dataset updated
Sep 28, 2022
Dataset authored and provided by
BioGRID Project
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Protein-Protein, Genetic, and Chemical Interactions for Fu Y (2022): Host adaptation of codon usage in SARS-CoV-2 from mammals indicates potential natural selection and viral fitness. Curated by BioGRID (https://thebiogrid.org). ABSTRACT: SARS-CoV-2 infection, which is the cause of the COVID-19 pandemic, has expanded across various animal hosts, and the virus can be transmitted particularly efficiently in minks. It is still not clear how SARS-CoV-2 is selected and evolves in its hosts, or how mutations affect viral fitness. In this report, sequences of SARS-CoV-2 isolated from human and animal hosts were analyzed, and the binding energy and capacity of the spike protein to bind human ACE2 and the mink receptor were compared. Codon adaptation index (CAI) analysis indicated the optimization of viral codons in some animals such as bats and minks, and a neutrality plot demonstrated that natural selection had a greater influence on some SARS-CoV-2 sequences than mutational pressure. Molecular dynamics simulation results showed that the mutations Y453F and N501T in mink SARS-CoV-2 could enhance the binding of the viral spike to the mink receptor, indicating the involvement of these mutations in natural selection and viral fitness. Receptor binding analysis revealed that the mink SARS-CoV-2 spike interacted more strongly with the mink receptor than the human receptor. Tracking the variations and codon bias of SARS-CoV-2 is helpful for understanding the fitness of the virus in virus transmission, pathogenesis, and immune evasion.
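The abstract mentions codon adaptation index (CAI) analysis without defining it. As a rough orientation, CAI is commonly computed as the geometric mean of per-codon relative adaptiveness values, where each codon's weight is its count in a reference gene set divided by the count of the most used synonymous codon. The sketch below follows that common formulation; it is not the study's pipeline, and all names and toy counts are hypothetical.

```python
from itertools import product
from math import exp, log

# Standard genetic code (NCBI table 1), codons ordered over bases T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

def relative_adaptiveness(ref_counts):
    """Weight per codon: reference count / count of the top synonymous codon."""
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    weights = {}
    for codons in families.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid not seen in the reference set
        for c in codons:
            if ref_counts.get(c, 0) > 0:
                weights[c] = ref_counts[c] / top
    return weights

def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in `cds`."""
    codons = (cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3))
    logs = [log(weights[c]) for c in codons if c in weights]
    return exp(sum(logs) / len(logs))

# Toy reference counts for lysine (AAA/AAG) and glutamate (GAA/GAG).
w = relative_adaptiveness({"AAA": 8, "AAG": 2, "GAA": 5, "GAG": 5})
```

A gene built only from the most used codon of each family scores 1; heavier use of rare codons pulls the score toward 0.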
Data from: Transcriptome-wide meta-analysis of codon usage in Escherichia...
zenodo.org
zip
Updated Sep 13, 2023
+ more versions
Anima Sutradhar; Jonathan Pointon; Christopher Lennon; Giovanni Stracquadanio (2023). Transcriptome-wide meta-analysis of codon usage in Escherichia coli [Dataset]. http://doi.org/10.5281/zenodo.8305120
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.8305120
Dataset updated
Sep 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anima Sutradhar; Jonathan Pointon; Christopher Lennon; Giovanni Stracquadanio
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data generated by CUBseq on Escherichia coli RNA-seq data.
Data from: Gene expression levels are correlated with synonymous codon...
datadryad.org
data.niaid.nih.gov
+1more
zip
Updated Feb 5, 2013
Anna Williford; Jeffery P. Demuth (2013). Gene expression levels are correlated with synonymous codon usage, amino acid composition and gene architecture in the red flour beetle, Tribolium castaneum [Dataset]. http://doi.org/10.5061/dryad.r0t1q
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.r0t1q
Dataset updated
Feb 5, 2013
Dataset provided by
Dryad
Authors
Anna Williford; Jeffery P. Demuth
Time period covered
Jul 15, 2012
Description
ExpressionData: The expression data from Tribolium castaneum whole-body and reproductive-tract samples provided here are the output of the ArrayStar gene expression software, which was used to process and normalize NimbleGen-generated raw expression data (Prince, Kirkland and Demuth 2010, Genome Biol Evol 2:336-346).
Data from: The Selective Advantage of Synonymous Codon Usage Bias in...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Mar 11, 2016
Hughes, Diarmaid; Brandis, Gerrit (2016). The Selective Advantage of Synonymous Codon Usage Bias in Salmonella [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001820410
Dataset updated
Mar 11, 2016
Authors
Hughes, Diarmaid; Brandis, Gerrit
Description
The genetic code in mRNA is redundant, with 61 sense codons translated into 20 different amino acids. Individual amino acids are encoded by up to six different codons but within codon families some are used more frequently than others. This phenomenon is referred to as synonymous codon usage bias. The genomes of free-living unicellular organisms such as bacteria have an extreme codon usage bias and the degree of bias differs between genes within the same genome. The strong positive correlation between codon usage bias and gene expression levels in many microorganisms is attributed to selection for translational efficiency. However, this putative selective advantage has never been measured in bacteria and theoretical estimates vary widely. By systematically exchanging optimal codons for synonymous codons in the tuf genes we quantified the selective advantage of biased codon usage in highly expressed genes to be in the range 0.2–4.2 × 10^-4 per codon per generation. These data quantify for the first time the potential for selection on synonymous codon choice to drive genome-wide sequence evolution in bacteria, and in particular to optimize the sequences of highly expressed genes. This quantification may have predictive applications in the design of synthetic genes and for heterologous gene expression in biotechnology.
BIOGRID CURATED DATA FOR PUBLICATION: Codon usage affects the structure and...
thebiogrid.org
zip
Updated Aug 1, 2016
BioGRID Project (2016). BIOGRID CURATED DATA FOR PUBLICATION: Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD. [Dataset]. https://thebiogrid.org/203841/publication/codon-usage-affects-the-structure-and-function-of-the-drosophila-circadian-clock-protein-period.html
Available download formats: zip
Dataset updated
Aug 1, 2016
Dataset authored and provided by
BioGRID Project
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Protein-Protein, Genetic, and Chemical Interactions for Fu J (2016): Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD. Curated by BioGRID (https://thebiogrid.org). ABSTRACT: Codon usage bias is a universal feature of all genomes, but its in vivo biological functions in animal systems are not clear. To investigate the in vivo role of codon usage in animals, we took advantage of the sensitivity and robustness of the Drosophila circadian system. By codon-optimizing parts of Drosophila period (dper), a core clock gene that encodes a critical component of the circadian oscillator, we showed that dper codon usage is important for circadian clock function. Codon optimization of dper resulted in conformational changes of the dPER protein, altered dPER phosphorylation profile and stability, and impaired dPER function in the circadian negative feedback loop, which manifests into changes in molecular rhythmicity and abnormal circadian behavioral output. This study provides an in vivo example that demonstrates the role of codon usage in determining protein structure and function in an animal system. These results suggest a universal mechanism in eukaryotes that uses a codon usage "code" within genetic codons to regulate cotranslational protein folding.
Data from: Variation and selection on codon usage bias across an entire...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jul 31, 2019
Hittinger, Chris Todd; Rokas, Antonis; Steenwyk, Jacob L.; LaBella, Abigail L.; Opulente, Dana A. (2019). Variation and selection on codon usage bias across an entire subphylum [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000066729
Dataset updated
Jul 31, 2019
Authors
Hittinger, Chris Todd; Rokas, Antonis; Steenwyk, Jacob L.; LaBella, Abigail L.; Opulente, Dana A.
Description
Variation in synonymous codon usage is abundant across multiple levels of organization: between codons of an amino acid, between genes in a genome, and between genomes of different species. It is now well understood that variation in synonymous codon usage is influenced by mutational bias coupled with both natural selection for translational efficiency and genetic drift, but how these processes shape patterns of codon usage bias across entire lineages remains unexplored. To address this question, we used a rich genomic data set of 327 species that covers nearly one third of the known biodiversity of the budding yeast subphylum Saccharomycotina. We found that, while genome-wide relative synonymous codon usage (RSCU) for all codons was highly correlated with the GC content of the third codon position (GC3), the usage of codons for the amino acids proline, arginine, and glycine was inconsistent with the neutral expectation where mutational bias coupled with genetic drift drive codon usage. Examination between genes’ effective numbers of codons and their GC3 contents in individual genomes revealed that nearly a quarter of genes (381,174/1,683,203; 23%), as well as most genomes (308/327; 94%), significantly deviate from the neutral expectation. Finally, by evaluating the imprint of translational selection on codon usage, measured as the degree to which genes’ adaptiveness to the tRNA pool were correlated with selective pressure, we show that translational selection is widespread in budding yeast genomes (264/327; 81%). These results suggest that the contribution of translational selection and drift to patterns of synonymous codon usage across budding yeasts varies across codons, genes, and genomes; whereas drift is the primary driver of global codon usage across the subphylum, the codon bias of large numbers of genes in the majority of genomes is influenced by translational selection.
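GC3, the GC content of the third codon position used throughout the description above, is straightforward to compute. A minimal sketch, assuming an in-frame coding sequence (the function name is hypothetical and this is not the authors' code):

```python
def gc3(cds):
    """Fraction of codons with G or C at the third position."""
    cds = cds.upper()
    # Third positions sit at indices 2, 5, 8, ... of an in-frame sequence;
    # a trailing partial codon is ignored.
    third_bases = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    return sum(base in "GC" for base in third_bases) / len(third_bases)

# Toy sequence: codons ATG, GCC, AAA have third bases G, C, A.
example = gc3("ATGGCCAAA")
```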
Data from: Mitochondrial phylogenomics of early land plants: mitigating the...
search.dataone.org
data.niaid.nih.gov
+1more
Updated Jun 12, 2025
+ more versions
Yang Liu; Cymon J. Cox; Wei Wang; Bernard Goffinet (2025). Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias [Dataset]. http://doi.org/10.5061/dryad.7b470
Unique identifier
https://doi.org/10.5061/dryad.7b470
Dataset updated
Jun 12, 2025
Dataset provided by
Dryad Digital Repository
Authors
Yang Liu; Cymon J. Cox; Wei Wang; Bernard Goffinet
Time period covered
Jan 1, 2014
Description
Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea to resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes, complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining lan...
Data from: Translational selection frequently overcomes genetic drift in...
search.dataone.org
datadryad.org
Updated Jun 17, 2025
Aoife Doherty; James O. McInerney (2025). Translational selection frequently overcomes genetic drift in shaping synonymous codon usage patterns in vertebrates [Dataset]. http://doi.org/10.5061/dryad.4k887
Unique identifier
https://doi.org/10.5061/dryad.4k887
Dataset updated
Jun 17, 2025
Dataset provided by
Dryad Digital Repository
Authors
Aoife Doherty; James O. McInerney
Time period covered
Aug 5, 2013
Description
Synonymous codon usage (SCU) patterns are shaped by a balance between mutation, drift, and natural selection. To date, detection of translational selection in vertebrates has proven to be a challenging task, obscured by small long-term effective population sizes in larger animals and the existence of isochores in some species. The consensus is that, in such species, natural selection is either completely ineffective at overcoming mutational pressures and genetic drift or perhaps is effective but so weak that it is not detectable. The aim of this research is to understand the interplay between mutation, selection, and genetic drift in vertebrates. We observe that although variation in mutational bias is undoubtedly the dominant force influencing codon usage, translational selection acts as a weak additional factor influencing synonymous codon usage. These observations indicate that translational selection is a widespread phenomenon in vertebrates and is not limited to a few species.
Codons that are significantly over-enriched in high RFP count positions in...
plos.figshare.com
xls
Updated Jun 15, 2023
Gabriel Wright; Anabel Rodriguez; Jun Li; Patricia L. Clark; Tijana Milenković; Scott J. Emrich (2023). Codons that are significantly over-enriched in high RFP count positions in at least 10 of the 14 data sets considered (% Enriched > 70). [Dataset]. http://doi.org/10.1371/journal.pone.0232003.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0232003.t002
Dataset updated
Jun 15, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Gabriel Wright; Anabel Rodriguez; Jun Li; Patricia L. Clark; Tijana Milenković; Scott J. Emrich
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These codons are significantly enriched at the estimated A-site in the top 10% of normalized footprint counts using a Bonferroni corrected p-value of 8.2 × 10^-4 (0.05/61). These codons are also analyzed with respect to each bias measure, such that a larger negative number indicates a stronger correspondence with the model. Note that there are only four bias measures listed (compared to the five codon usage models analyzed earlier) as the High-Phi %MinMax and High-Phi CAI models use the same underlying CUB measure.
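The corrected threshold quoted above is the significance level divided by the number of tests, one per sense codon. A one-line check of that arithmetic (the function name is hypothetical):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# 0.05 spread over the 61 sense codons, roughly 8.2e-4.
threshold = bonferroni_threshold(0.05, 61)
```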

106 scholarly articles cite the Codon Usage Tabulated from GenBank dataset (per Google Scholar).
