83 datasets found
  1. 3D-Genomics Database

    • dknet.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is: "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C. briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E. cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G. theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N. crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P. falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.

  2. Repeat elements organise 3D genome structure and mediate transcription in...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    tiff
    Updated May 31, 2023
    Cite
    David J. Winter; Austen R. D. Ganley; Carolyn A. Young; Ivan Liachko; Christopher L. Schardl; Pierre-Yves Dupont; Daniel Berry; Arvina Ram; Barry Scott; Murray P. Cox (2023). Repeat elements organise 3D genome structure and mediate transcription in the filamentous fungus Epichloë festucae [Dataset]. http://doi.org/10.1371/journal.pgen.1007467
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David J. Winter; Austen R. D. Ganley; Carolyn A. Young; Ivan Liachko; Christopher L. Schardl; Pierre-Yves Dupont; Daniel Berry; Arvina Ram; Barry Scott; Murray P. Cox
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structural features of genomes, including the three-dimensional arrangement of DNA in the nucleus, are increasingly seen as key contributors to the regulation of gene expression. However, studies on how genome structure and nuclear organisation influence transcription have so far been limited to a handful of model species. This narrow focus limits our ability to draw general conclusions about the ways in which three-dimensional structures are encoded, and to integrate information from three-dimensional data to address a broader gamut of biological questions. Here, we generate a complete and gapless genome sequence for the filamentous fungus, Epichloë festucae. We use Hi-C data to examine the three-dimensional organisation of the genome, and RNA-seq data to investigate how Epichloë genome structure contributes to the suite of transcriptional changes needed to maintain symbiotic relationships with the grass host. Our results reveal a genome in which very repeat-rich blocks of DNA with discrete boundaries are interspersed by gene-rich sequences that are almost repeat-free. In contrast to other species reported to date, the three-dimensional structure of the genome is anchored by these repeat blocks, which act to isolate transcription in neighbouring gene-rich regions. Genes that are differentially expressed in planta are enriched near the boundaries of these repeat-rich blocks, suggesting that their three-dimensional orientation partly encodes and regulates the symbiotic relationship formed by this organism.

  3. Data from: In situ genome sequencing resolves DNA sequence and structure in...

    • idr-testing.openmicroscopy.org
    • idr.openmicroscopy.org
    Updated Dec 31, 2020
    Cite
    (2020). In situ genome sequencing resolves DNA sequence and structure in intact biological samples [Dataset]. https://idr-testing.openmicroscopy.org/study/idr0101/
    Dataset updated
    Dec 31, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Understanding genome organization requires integration of DNA sequence and 3D spatial context; however, existing genome-wide methods lack either base-pair sequence resolution or direct spatial localization. Here, we describe in situ genome sequencing (IGS), a method for simultaneously sequencing and imaging genomes within intact biological samples. We applied IGS to human fibroblasts and early mouse embryos, spatially localizing thousands of genomic loci in individual nuclei. Using these data, we characterized parent-specific changes in genome structure across embryonic stages, revealed single-cell chromatin domains in zygotes, and uncovered epigenetic memory of global chromosome positioning within individual embryos. These results demonstrate how in situ genome sequencing can directly connect sequence and structure across length scales from single base pairs to whole organisms.

  4. CYGD - Comprehensive Yeast Genome Database

    • neuinfo.org
    • dknet.org
    • +1more
    Cite
    CYGD - Comprehensive Yeast Genome Database [Dataset]. http://identifiers.org/RRID:SCR_002289
    Description

    The MIPS Comprehensive Yeast Genome Database (CYGD) aims to present information on the molecular structure and functional network of the entirely sequenced, well-studied model eukaryote, the budding yeast Saccharomyces cerevisiae. In addition, the data of various projects on related yeasts are used for comparative analysis.

  5. Orca: Sequence-based modeling of genome 3D architecture from kilobase to...

    • zenodo.org
    application/gzip
    Updated Mar 20, 2021
    Cite
    Jian Zhou; Jian Zhou (2021). Orca: Sequence-based modeling of genome 3D architecture from kilobase to chromosome-scale (Part2) [Dataset]. http://doi.org/10.5281/zenodo.4594676
    Available download formats: application/gzip
    Dataset updated
    Mar 20, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jian Zhou; Jian Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset (Part 2) provides additional chromatin tracks files required for using the chromatin track plotting functions of Orca. Orca is a sequence-based deep learning modeling framework for multiscale genome 3D architecture.

  6. Full genome and transcriptome sequence assembly of the non-model organism...

    • bco-dmo.org
    • search.dataone.org
    • +1more
    csv
    Updated Dec 4, 2024
    Cite
    Crow White; Robert J. Toonen; Mark Christie; Jean Davidson; Paul Anderson; Benjamin Daniels; Andy Lee; Cataixa López (2024). Full genome and transcriptome sequence assembly of the non-model organism Kellet’s whelk, Kelletia kelletii [Dataset]. http://doi.org/10.26008/1912/bco-dmo.945292.1
    Available download formats: csv (46.23 KB)
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    Biological and Chemical Data Management Office
    Authors
    Crow White; Robert J. Toonen; Mark Christie; Jean Davidson; Paul Anderson; Benjamin Daniels; Andy Lee; Cataixa López
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Aug 28, 2019 - Jul 8, 2020
    Area covered
    Variables measured
    Run, Bases, Bytes, Consent, Ecotype, version, Organism, Platform, latitude, BioSample, and 28 more
    Measurement technique
    Automated DNA Sequencer
    Description

    Description of linked resources for this dataset, all links can be found in the related dataset section.

  7. Data from: 3D genomics across the tree of life identifies condensin II as a...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 6, 2021
    Cite
    Claire Hoencamp (2021). Data from: 3D genomics across the tree of life identifies condensin II as a determinant of architecture type [Dataset]. http://doi.org/10.7910/DVN/UROKAG
    Dataset updated
    Jan 6, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Claire Hoencamp
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We analyzed conservation of condensin II complex subunits in 24 species across the tree of life with a multistep BLAST approach. The data found here are the BLAST alignments for these searches. The first searches were conducted in October/November 2019 and were manually double-checked in February and March 2020. Searches for other organisms were conducted in June 2020. All alignments were posted in: Our approach was based on a search strategy as used in earlier work by King et al. (https://doi.org/10.1093/molbev/msz140). We started by collecting publicly available protein sequences of the condensin I and II complex subunits of four diverse species from Uniprot: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. As a positive control we searched for SMC2 and SMC4, and the condensin I subunits, which are thought to be essential in all species. In the first alignment step, we used tblastn to search with the translated protein sequences of the above species against the nucleotide collection (nr/nt) database of the target species. The Expect threshold was set at 0.05. We reported an alignment as a hit when it had an E-value of 1E-10 or less with multiple regions of alignment. If there was an alignment with less confidence, we did an extra validation step to confirm the alignment. This step entailed downloading the translated nucleotide sequence of the putative alignment and using tblastn to search against the genome of a closely related organism with an annotated genome. If this search yielded the putative protein we used as a bait, we considered the hit validated. In the second alignment step we used the same approach, but we blasted against the wgs database of the target species. We again used 1E-10 as E-value cut-off. In the third step, only a few organisms still had missing subunits. 
To make an extra effort to find these subunits, we used the corresponding subunits of the nearest neighbour, which we identified in step 1 or 2, as bait. As the identified subunits were all nucleotide sequences, we used tblastx to translate these query sequences to protein sequences and blast against a translated nucleotide database. In this step we searched both the nr/nt database and the wgs database. As we were able to identify all SMC2/4 subunits, but still missed condensin II subunits we are now fairly sure these organisms indeed miss these condensin II subunits. However, it is still possible these organisms do have all condensin II subunits, but with very low sequence conservation. We were also able to identify the condensin I subunits in almost all species, with two notable exceptions (see Table S4). The Arctic lamprey lacked condensin I subunits CAPG and CAPD2. Because we were able to identify all condensin II subunits in this organism, we still included this species in our analysis. The other exception is the tardigrade. In this species we identified SMC2 and SMC4, but could not identify any of the accessory subunits of condensin I nor II. There are multiple possible explanations for this. On the one hand, it might have a biological explanation, for example in this organism condensin’s accessory subunits have evolved beyond recognition with our methods, or this species indeed has lost both condensin I and II. On the other hand, the missing subunits may be explained by a technical issue, e.g. the quality of the databases. Therefore we cannot with full certainty conclude that condensin II is indeed missing in the tardigrade, and this will need to be investigated further.
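
    The hit criteria stated in this description (Expect threshold 0.05, E-value of 1E-10 or less, multiple regions of alignment) can be sketched as a small classifier. This is only an illustrative sketch, not the authors' code: the `Alignment` record, its field names, the label strings, and the reading of "multiple regions" as two or more are all assumptions introduced here.

```python
from dataclasses import dataclass

EXPECT_THRESHOLD = 0.05   # Expect threshold stated in the description
E_VALUE_CUTOFF = 1e-10    # E-value cut-off stated in the description


@dataclass
class Alignment:
    """Hypothetical record for one search result (not from the dataset)."""
    subunit: str     # illustrative label for the query subunit
    e_value: float   # E-value reported for the alignment
    n_regions: int   # number of aligned regions in the result


def classify(aln: Alignment) -> str:
    """Apply the stated criteria to one alignment.

    Assumption: "multiple regions of alignment" means n_regions >= 2.
    """
    if aln.e_value > EXPECT_THRESHOLD:
        # Above the Expect threshold, so it would not be reported at all.
        return "not_reported"
    if aln.e_value <= E_VALUE_CUTOFF and aln.n_regions >= 2:
        return "hit"
    # Lower-confidence alignments go to the extra validation step
    # (a reciprocal search against a closely related annotated genome).
    return "needs_validation"
```

    For example, `classify(Alignment("CAP-H2", 1e-20, 3))` falls in the "hit" branch, while the same E-value with a single aligned region would be routed to validation.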

  8. H-InvDB

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). H-InvDB [Dataset]. http://identifiers.org/RRID:SCR_013265
    Dataset updated
    Jan 29, 2022
    Description

    H-Invitational Database (H-InvDB) is an integrated database of human genes and transcripts. By extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing isoforms, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structure, genetic polymorphisms (SNPs, indels and microsatellite repeats), relation with diseases, gene expression profiling, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups. This database is produced by the Genome Information Integration Project (2005-) based upon the annotation technology established in the H-Invitational Project for annotation of human full-length cDNAs.

  9. Mouse Genome Informatics: The Mouse Gene Expression Information Resource...

    • scicrunch.org
    Updated Oct 17, 2019
    Cite
    (2019). Mouse Genome Informatics: The Mouse Gene Expression Information Resource Project [Dataset]. http://identifiers.org/RRID:SCR_006630
    Dataset updated
    Oct 17, 2019
    Description

    A unified resource that combines text-based and 3D graphical methods to store, display, and analyze mouse developmental gene expression information. The Mouse Gene Expression Information Resource will integrate the following components:
    * Gene Expression Database (GXD) - Integrates different types of expression data and provides links to many other resources to place the data into the larger biological and analytical context.
    * Anatomy Database - Provides the standard nomenclature for developmental anatomy.
    * 3D Atlas / Graphical Gene Expression Database - Provides a high-resolution digital representation of mouse anatomy reconstructed from serial sections of single embryos at each representative developmental stage, enabling 3D graphical display and analysis of in situ expression data.

  10. Additional file 4 of Common DNA sequence variation influences 3-dimensional...

    • springernature.figshare.com
    xlsx
    Updated Feb 28, 2024
    Cite
    David U. Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony D. Schmitt; Amina Noor; Joshua Chiou; Kyle J. Gaulton; Jonathan Sebat; Yun Li; Kasper D. Hansen; Bing Ren (2024). Additional file 4 of Common DNA sequence variation influences 3-dimensional conformation of the human genome [Dataset]. http://doi.org/10.6084/m9.figshare.11981355.v1
    Available download formats: xlsx
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David U. Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony D. Schmitt; Amina Noor; Joshua Chiou; Kyle J. Gaulton; Jonathan Sebat; Yun Li; Kasper D. Hansen; Bing Ren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: Table S5. Regions showing evidence of biological variability in 3D chromatin conformation.

  11. MOESM5 of Highly efficient lipid production in the green alga Parachlorella...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 1, 2023
    Cite
    Shuhei Ota; Kenshiro Oshima; Tomokazu Yamazaki; Sangwan Kim; Zhe Yu; Mai Yoshihara; Kohei Takeda; Tsuyoshi Takeshita; Aiko Hirata; Kateřina Bišová; Vilém Zachleder; Masahira Hattori; Shigeyuki Kawano (2023). MOESM5 of Highly efficient lipid production in the green alga Parachlorella kessleri: draft genome and transcriptome endorsed by whole-cell 3D ultrastructure [Dataset]. http://doi.org/10.6084/m9.figshare.10038743.v1
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shuhei Ota; Kenshiro Oshima; Tomokazu Yamazaki; Sangwan Kim; Zhe Yu; Mai Yoshihara; Kohei Takeda; Tsuyoshi Takeshita; Aiko Hirata; Kateřina Bišová; Vilém Zachleder; Masahira Hattori; Shigeyuki Kawano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 9. P values of the present RNA-seq analysis.

  12. Additional file 9 of Common DNA sequence variation influences 3-dimensional...

    • springernature.figshare.com
    xlsx
    Updated Feb 28, 2024
    Cite
    David U. Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony D. Schmitt; Amina Noor; Joshua Chiou; Kyle J. Gaulton; Jonathan Sebat; Yun Li; Kasper D. Hansen; Bing Ren (2024). Additional file 9 of Common DNA sequence variation influences 3-dimensional conformation of the human genome [Dataset]. http://doi.org/10.6084/m9.figshare.11981370.v1
    Available download formats: xlsx
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David U. Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony D. Schmitt; Amina Noor; Joshua Chiou; Kyle J. Gaulton; Jonathan Sebat; Yun Li; Kasper D. Hansen; Bing Ren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 9: Table S10. Overlaps between 3D chromatin QTL and GWAS catalog.

  13. Data from: Genome-wide chromosome architecture prediction reveals...

    • dtechtive.com
    Updated Oct 22, 2024
    Cite
    Edinburgh DataShare (2024). Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure [Dataset]. https://dtechtive.com/datasets/48863
    Dataset updated
    Oct 22, 2024
    Dataset provided by
    Edinburgh DataShare
    Area covered
    Scotland
    Description

    Data associated with the paper: Michael Chiang, Chris A Brackley, Catherine Naughton, Ryu-Suke Nozawa, Cleis Battaglia, Davide Marenduzzo, Nick Gilbert (in press). Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure. Cell Genomics. Classical observations suggest a connection between 3D gene structure and function, but testing this hypothesis has been challenging due to technical limitations. To explore this, we developed e-HiP-HoP, a model based on genome organisation principles to predict the 3D structure of human chromatin. We defined a new 3D structural unit, a “topos”, which represents the regulatory landscape around gene promoters. Using GM12878 cells, we predicted the 3D structure of over 10,000 active gene topoi and stored them in the 3DGene database. Data mining revealed folding motifs and their link to gene ontology features. We computed a structural diversity score and identified influential nodes, chromatin sites that frequently interact with gene promoters, acting as key regulators. These nodes drive structural diversity and are tied to gene function. e-HiP-HoP provides a framework for modelling high-resolution chromatin structure and a mechanistic basis for chromatin contact networks that link 3D gene structure with function. This dataset contains the simulation code and data underlying the figures of the associated paper.

  14. MOESM4 of Common DNA sequence variation influences 3-dimensional...

    • springernature.figshare.com
    xlsx
    Updated Feb 15, 2024
    Cite
    David Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony Schmitt; Amina Noor; Joshua Chiou; Kyle Gaulton; Jonathan Sebat; Yun Li; Kasper Hansen; Bing Ren (2024). MOESM4 of Common DNA sequence variation influences 3-dimensional conformation of the human genome [Dataset]. http://doi.org/10.6084/m9.figshare.11296475.v1
    Available download formats: xlsx
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    David Gorkin; Yunjiang Qiu; Ming Hu; Kipper Fletez-Brant; Tristin Liu; Anthony Schmitt; Amina Noor; Joshua Chiou; Kyle Gaulton; Jonathan Sebat; Yun Li; Kasper Hansen; Bing Ren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 4: Table S5. Regions showing evidence of biological variability in 3D chromatin conformation.

  15. EMAGE Gene Expression Database

    • neuinfo.org
    • rrid.site
    • +2more
    Updated Oct 5, 2024
    Cite
    (2024). EMAGE Gene Expression Database [Dataset]. http://identifiers.org/RRID:SCR_005391
    Dataset updated
    Oct 5, 2024
    Description

    A database of in situ gene expression data in the developing mouse embryo and an accompanying suite of tools to search and analyze the data. mRNA in situ hybridization, protein immunohistochemistry and transgenic reporter data is included. The data held is spatially annotated to a framework of 3D mouse embryo models produced by EMAP (e-Mouse Atlas Project). These spatial annotations allow users to query EMAGE by spatial pattern as well as by gene name, anatomy term or Gene Ontology (GO) term. The conceptual framework which houses the descriptions of the gene expression patterns in EMAGE is the EMAP Mouse Embryo Anatomy Atlas. This consists of a set of 3D virtual embryos at different stages of development, as well as an accompanying ontology of anatomical terms found at each stage. The raw data images can be conventional 2D photographs (of sections or wholemount specimens) or 3D images of wholemount specimens derived from Optical Projection Tomography (OPT) or confocal microscopy. Users may submit data using a Data submission tool or without.

  16. Chromosome-scale haploid genome sequence and annotation dataset of the...

    • scidb.cn
    Updated Jun 16, 2024
    Cite
    冀晓昊 (2024). Chromosome-scale haploid genome sequence and annotation dataset of the durian cultivar 'Kan Yao' [Dataset]. http://doi.org/10.57760/sciencedb.agriculture.00013
    Dataset updated
    Jun 16, 2024
    Dataset provided by
    Science Data Bank
    Authors
    冀晓昊
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset includes the sequence files and gene annotation files for two haplotype genomes of the durian cultivar 'Kan Yao', assembled using HiFi, ONT, Hi-C, and second-generation sequencing data. The core software used for genome assembly includes Hifiasm (0.19.5), 3D-DNA (190716), and AssemblyMapper (1.0.3). Gene structure annotation employed three distinct strategies: de novo annotation, homology-based annotation, and transcriptome-based annotation. For de novo annotation, Braker software was used to construct models based on Arabidopsis protein sequences (arabidopsis_pep_20101214.fa) and all merged Illumina transcriptome data to predict gene structures. Homology-based annotation was conducted using GenomeThreader software, referencing the protein annotation file of the Durian genome (GCF_002303985.1_Duzib1.0_protein.faa). Transcriptome-based annotation was performed using PASA software, utilizing all merged Iso-Seq transcriptome sequencing data. Subsequently, the annotation files from these strategies were merged using EVM software and updated with PASA software to incorporate UTR and alternative splicing information, resulting in the final annotation file. Non-coding gene annotation was performed using Infernal software.

  17. Disease association of 3D-domain swapped predicted protein sequences of...

    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Atul Kumar Upadhyay; Ramanathan Sowdhamini (2023). Disease association of 3D-domain swapped predicted protein sequences of human genome. [Dataset]. http://doi.org/10.1371/journal.pone.0159627.t003
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Atul Kumar Upadhyay; Ramanathan Sowdhamini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Normalization (Z score) was calculated based on domain swapped entries, in whole human genome (8945/20247).

  18. HUDSEN Human Gene Expression Spatial Database

    • rrid.site
    Updated Jan 29, 2022
    Cite
    (2022). HUDSEN Human Gene Expression Spatial Database [Dataset]. http://identifiers.org/RRID:SCR_006325
    Dataset updated
    Jan 29, 2022
    Description

    Database of a set of standard 3D virtual models at different stages of development from Carnegie Stages (CS) 12-23 (approximately 26-56 days post conception) in which various anatomical regions have been defined with a set of anatomical terms at various stages of development (known as an ontology). Experimental data is captured and converted to digital format and then mapped to the appropriate 3D model. The ontology is used to define sites of gene expression using a set of standard descriptions and to link the expression data to an "anatomical tree". Human data from stages CS12 to CS23 can be submitted to the HUDSEN Gene Expression Database. The anatomy ontology currently being used is based on the Edinburgh Human Developmental Anatomy Database, which encompasses all developing structures from CS1 to CS20 but is not detailed for developing brain structures. The ontology is being extended and refined (by Prof Luis Puelles, University of Murcia, Spain) and will be incorporated into the HUDSEN database as it is developed. Expression data is annotated using two methods to denote sites of expression in the embryo: spatial annotation and text annotation. Additionally, many aspects of the detection reagent and specimen are also annotated during this process (assignment of IDs, nucleotide sequences for probes, etc.). There are currently two main ways to search HUDSEN: using a gene/protein name or a named anatomical structure as the query term. The entire contents of the database can be browsed using the data browser. Results may be saved. The data in HUDSEN is generated both by researchers within the HUDSEN project and by the wider scientific community. 
The HUDSEN human gene expression spatial database is a collaboration between the Institute of Human Genetics in Newcastle, UK, and the MRC Human Genetics Unit in Edinburgh, UK, and was developed as part of the Electronic Atlas of the Developing Human Brain (EADHB) project (funded by the NIH Human Brain Project). The database is based on the Edinburgh Mouse Atlas gene expression database (EMAGE), and is designed to be an openly available resource to the research community holding gene expression patterns during early human development.

  19. [s235810] [Cr2]

    • thermofisher.cn
    Updated Jul 22, 2021
    Cite
    Thermo Fisher Scientific (2021). [s235810] [Cr2] [Dataset]. https://www.thermofisher.cn/order/genome-database/details/sirna/s235810
    Dataset updated
    Jul 22, 2021
    Dataset authored and provided by
    Thermo Fisher Scientific (http://thermofisher.com/)
    Description

    [This gene encodes a membrane protein, which functions as a receptor for Epstein-Barr virus (EBV) binding on B and T lymphocytes. Genetic variations in ]

  20. Data from: Population genomic signatures of the oriental fruit moth related...

    • data.niaid.nih.gov
    • search.dataone.org
    • +1more
    zip
    Updated Jan 11, 2022
    Cite
    Shu-Jun Wei; Li-Jun Cao; Wei Song; Jin-Cui Chen (2022). Population genomic signatures of the oriental fruit moth related to the Pleistocene climates [Dataset]. http://doi.org/10.5061/dryad.6wwpzgmzm
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 11, 2022
    Dataset provided by
    Beijing Academy of Agricultural and Forestry Sciences
    Authors
    Shu-Jun Wei; Li-Jun Cao; Wei Song; Jin-Cui Chen
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    The Quaternary climatic oscillations are expected to have had strong impacts on the evolution of species. Although legacies of the Quaternary climates on population processes have been widely identified in diverse groups of species, adaptive genetic changes shaped during the Quaternary have been harder to decipher. Here, we assembled a chromosome-level genome of the oriental fruit moth and compared genomic variation among refugial and colonized populations of this species that diverged in the Pleistocene. High genomic diversity was maintained in refugial populations. Demographic analysis showed that the effective population size of refugial populations declined during the penultimate glacial maximum (PGM) but remained stable during the last glacial maximum (LGM), indicating a strong impact of the PGM rather than the LGM on this pest species. Genome scans identified one chromosomal inversion and a mutation of the circadian gene Clk on the neo-Z chromosome potentially related to the endemicity of a refugial population. In the colonized populations, genes in pathways of energy metabolism and wing development showed signatures of selection. These different genomic signatures of refugial and colonized populations point to multiple impacts of Quaternary climates on adaptation in an extant species.

    Methods

    Samples

    A laboratory-reared strain of the OFM was used for de novo genome sequencing. This strain was derived from three male and female pairs and had been maintained for ten generations on apples under laboratory conditions. We collected 263 individuals from 15 geographical populations across the native range of the OFM in China, of which three populations were collected from Sichuan basin, three populations were from the Yunnan region, and six populations were from regions where OFM subsequently dispersed (Table S1, Fig. 2). We chose three representative populations (31 individuals in total) for whole-genome resequencing: one population (YNHH) was collected from the original and refugial area in Yunnan, one (SCCD) was from another refugial area in Sichuan, and the last population (BJPG) was from colonized areas of northern China (Table S1 and Fig. 2). The other 12 populations (232 individuals) were genotyped by the Kompetitive Allele-Specific PCR (KASP) method for 22 representative SNP outliers (see below).

    De novo genome assembly

    We constructed and sequenced an Illumina library, a NanoPore library, a Hi-C proximity ligation library, and four RNA-seq libraries (eggs, larvae, pupae, and adults) for assembly and annotation of the OFM genome. The raw reads generated from the Illumina platform were filtered by Trimmomatic v0.38 (Bolger et al. 2014) and then used to estimate genome size, heterozygosity, and duplication rate using GenomeScope v1.0 (Vurture et al. 2017).

    Long reads generated from the NanoPore platform were corrected and assembled using Canu v1.8 (Koren et al. 2017) with default parameters. The assembled contigs were polished based on Illumina short reads using Pilon v1.22 (Walker et al. 2014). To remove possible secondary alleles, the assembled contigs were filtered using the Purge Haplotigs pipeline (Roach et al. 2018), resulting in a contig-level genome. The Illumina short reads sequenced from the Hi-C library were used to assemble these contigs into a chromosome-level genome using the Juicer v1.5 (Durand et al. 2016) and 3D de novo assembly (3D-DNA) pipelines (Dudchenko et al. 2017).

    The completeness of each assembled version of the genome was assessed using a Benchmarking Universal Single-Copy Orthologs (BUSCO) v3.0.2 (Simao et al. 2015) analysis, based on the insecta_odb9 database (1,658 genes). We conducted a synteny analysis between OFM and the codling moth Cydia pomonella (Lepidoptera: Tortricidae) (Assembly accession: GCA_003425675.2) (Wan et al. 2019) and Spodoptera litura (Lepidoptera: Noctuidae) (Assembly accession: GCF_002706865.1) (Cheng et al. 2017) using MCSCAN (Wang et al. 2012).

    Genome annotation

    Repeats and transposable element families in the OFM genome were detected by RepeatMasker pipeline v4.0.7 (Tarailo-Graovac & Chen 2009) against the Insecta repeats within RepBase Update (http://www.girinst.org) and Dfam database (20170127), with RMBlast v2.10.0 as a search engine. tRNAs were annotated by tRNAscan-SE (Lowe & Eddy 1997) with default parameters; rRNAs were annotated by RNAmmer prediction (Lagesen et al. 2007).

    Protein-coding genes in the OFM genome were annotated using ab initio, RNA-seq-based, and homolog-based methods in the MAKER v3.01.03 genome annotation pipeline (Cantarel et al. 2008). For the RNA-seq-based method, the RNA-seq reads were first mapped to the OFM genome with HISAT2 v2.2.0, and the transcripts were then assembled using StringTie v2.1.2. For the ab initio methods, parameters of SNAP v2013-02-16 (Korf 2004) and Augustus v3.2.3 (Stanke & Waack 2003) were estimated or trained before they were used to predict genes in MAKER. The SNAP parameters were estimated from high-quality transcripts obtained by improvement and filtering using PASA v2.4.1 (Haas et al. 2003). The Augustus gene model was obtained directly from the above BUSCO analysis of the genome assembly. For the homolog-based method, protein-coding genes of Drosophila melanogaster and Bombyx mori were used. Then, fragments per kilobase per million (FPKM) values of each gene predicted by the MAKER pipeline were calculated using Cufflinks v2.2.1 (Kim et al. 2013); the gene set was filtered by keeping those genes with an FPKM value > 0 in any RNA-seq library. Finally, PASA was used to update the annotation based on transcripts; all predictions were further filtered using GffRead v0.11.7 implemented in Cufflinks v2.2.1 (Kim et al. 2013) to remove genes with in-frame stop codons. Functions of the protein-coding genes were annotated using eggNOG-Mapper v1.0.3 (Huerta-Cepas et al. 2017) against the eggNOG v5.0 database (Huerta-Cepas et al. 2019). Genes that could be functionally annotated by the eggNOG analysis were retained in the gene structure annotation and used for further analysis.
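
    The expression filter described above (keep a gene if its FPKM is above zero in any RNA-seq library) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the table layout, and the example values are hypothetical, and the FPKM formula shown is the standard definition rather than the Cufflinks estimation procedure.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Standard FPKM definition: fragments per kilobase of gene per million
    mapped fragments. Cufflinks estimates this with a statistical model;
    this function only shows the normalisation itself."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def expressed_genes(fpkm_table):
    """Keep genes whose FPKM is > 0 in at least one RNA-seq library.

    fpkm_table maps gene ID -> {library name -> FPKM value} (hypothetical layout).
    """
    return [
        gene
        for gene, per_library in fpkm_table.items()
        if any(value > 0 for value in per_library.values())
    ]


# Hypothetical values for the four libraries named in the description.
fpkm_table = {
    "gene_a": {"eggs": 0.0, "larvae": 3.2, "pupae": 0.0, "adults": 1.1},
    "gene_b": {"eggs": 0.0, "larvae": 0.0, "pupae": 0.0, "adults": 0.0},
}
kept = expressed_genes(fpkm_table)
print(kept)
```

    Here gene_b is dropped because it has no detectable expression in any of the four libraries.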

    Gene family annotation

    Protein-coding genes from available genomes of two Coleoptera, two Diptera and another 11 Lepidoptera were retrieved from the NCBI genome database for comparative analysis. Orthologs were identified using OrthoFinder v2.2.7 (Emms & Kelly 2015) under default parameters. MAFFT v7.450 (Katoh & Standley 2013) was used to align amino acid sequences of 1:1:1 orthologous genes with the G-INS-I algorithm. The phylogenetic tree was inferred using an approximately-maximum-likelihood method implemented in FastTree v2.1.10 (Price et al. 2009). We used r8s (Sanderson 2003) to estimate the divergence times among species, with the divergence times of two nodes, i.e. Tribolium castaneum and Anoplophora glabripennis (Wang et al. 2019), and Trichoplusia ni and S. litura (Wan et al. 2019), as calibrations. Computational Analysis of gene Family Evolution (CAFE) v3.1 (De Bie et al. 2006) was used to analyze gene family expansion and contraction.

    To explore possible genomic components related to environmental adaptation in the OFM as well as the other tortricid moth, C. pomonella, we manually annotated detoxification genes, chemosensory genes, and heat shock protein (HSP) genes and compared these gene families among 13 representative genomes of Lepidoptera. The detoxification genes include five families: cytochrome P450 monooxygenases (P450s), glutathione S-transferases (GSTs), ATP-binding cassette transporters (ABCs), UDP-glycosyltransferases (UGTs), and carboxyl/cholinesterases (CCEs). The chemosensory genes include four families: olfactory receptors (ORs), gustatory receptors (GRs), ionotropic receptors (IRs), and odorant-binding proteins (OBPs).

    We used both model-based and similarity-based methods to annotate these gene families. For model-based identification, the hidden Markov models (HMMs) were downloaded from the Pfam 32.0 database (September 2018; El-Gebali et al. 2018) and run with HMMER v3.3 (Finn et al. 2011). HMMs not found in the Pfam database were manually trained using HMMER under the default parameters. For similarity-based identification, we used orthologs from D. melanogaster, B. mori, Aedes aegypti, Anopheles gambiae, and C. pomonella to search against target genomes using BLAST v2.2.31 (Altschul et al. 1990) with an e-value cutoff of 1e-5. The automatic BITACORA v1.0 pipeline (Vizueta et al. 2020) was run in full mode to conduct the HMMER and BLAST analyses. The annotated genes were then filtered manually by removing genes shorter than 80 amino acids and those lacking conserved domains.
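
    The final filtering step above (drop candidates shorter than 80 amino acids or without a conserved domain) can be sketched as a simple predicate. This is an illustration under stated assumptions: the Candidate record, its field names, and the example entries are hypothetical, and the description says this step was done manually rather than by script.

```python
from dataclasses import dataclass

MIN_LENGTH_AA = 80  # threshold stated in the description


@dataclass
class Candidate:
    """Hypothetical record for one annotated gene-family member."""
    gene_id: str
    protein: str           # amino acid sequence
    domains: tuple = ()    # names of conserved domains detected, if any


def passes_filter(candidate):
    """Keep a candidate only if it is at least 80 aa long and carries
    at least one conserved domain."""
    return len(candidate.protein) >= MIN_LENGTH_AA and len(candidate.domains) > 0


# Hypothetical candidates: sequences are placeholders, not real proteins.
candidates = [
    Candidate("cand_1", "M" * 120, ("p450",)),
    Candidate("cand_2", "M" * 60, ("p450",)),   # too short
    Candidate("cand_3", "M" * 200, ()),         # no conserved domain
]
retained = [c.gene_id for c in candidates if passes_filter(c)]
print(retained)
```

    Only cand_1 survives, since the other two each fail one of the two criteria.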

    Genotyping populations across the native range

    For genome resequencing of three representative populations, individual DNA libraries with an insert size of 400 bp were constructed and sequenced on the Illumina HiSeq X Ten platform to obtain 2×150 bp paired-end reads. A sequence depth of approximately 36-fold was obtained for each sample. After filtering out raw sequencing reads containing adapters and reads of low quality, the remaining clean reads were mapped to the reference assembly using BWA v0.7.17 with default parameters (Li & Durbin 2009). SAMtools v1.9 (Li et al. 2009) was used to sort reads and remove those with a mapping quality lower than 30. Single-nucleotide polymorphism (SNP) calling was performed using the Genome Analysis Toolkit (GATK) v3.5 (McKenna et al. 2010). The criteria used to filter the raw SNPs were “QD < 2.0, FS > 60, SOR > 4.0, MQ < 40”. SNPs were further filtered using the R package vcfR (Knaus & Grünwald 2017) and VCFtools v0.1.16 (Danecek et al. 2011) with the following criteria: SNPs with a sequencing depth lower than four or higher than 500 were removed; SNPs with a missing rate higher than 10% were removed. All the SNPs were annotated with SnpEff v4.3 (Cingolani et
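
    The SNP filtering thresholds quoted in this description can be written as one predicate. This is a rough sketch, not the authors' GATK, vcfR, or VCFtools commands. It assumes that a SNP is discarded when any one of the listed conditions holds, and it treats depth as a single value per SNP, because the description does not say whether depth is measured per site or per sample. The Snp record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Hypothetical per-SNP summary; field names mirror the quoted annotations."""
    qd: float            # QD
    fs: float            # FS
    sor: float           # SOR
    mq: float            # MQ
    depth: float         # sequencing depth (per-site vs per-sample is an assumption)
    missing_rate: float  # fraction of samples with no genotype call


def fails_hard_filter(snp):
    """Quoted criteria: QD < 2.0, FS > 60, SOR > 4.0, MQ < 40 (any one fails the SNP)."""
    return snp.qd < 2.0 or snp.fs > 60.0 or snp.sor > 4.0 or snp.mq < 40.0


def fails_depth_or_missingness(snp):
    """Depth lower than 4 or higher than 500, or missing rate above 10%."""
    return snp.depth < 4 or snp.depth > 500 or snp.missing_rate > 0.10


def keep_snp(snp):
    return not (fails_hard_filter(snp) or fails_depth_or_missingness(snp))


# Hypothetical examples.
good = Snp(qd=20.0, fs=1.5, sor=0.8, mq=60.0, depth=36, missing_rate=0.0)
strand_biased = Snp(qd=20.0, fs=75.0, sor=0.8, mq=60.0, depth=36, missing_rate=0.0)
sparse = Snp(qd=20.0, fs=1.5, sor=0.8, mq=60.0, depth=36, missing_rate=0.25)
print(keep_snp(good), keep_snp(strand_biased), keep_snp(sparse))
```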

(2022). 3D-Genomics Database [Dataset]. http://identifiers.org/RRID:SCR_007430

3D-Genomics Database

RRID:SCR_007430, nif-0000-00553, 3D-Genomics Database (RRID:SCR_007430), 3D-GENOMICS

Explore at:
6 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jan 29, 2022
Description

THIS RESOURCE IS NO LONGER IN SERVICE, documented August 29, 2016. Database containing structural annotations for the proteomes of just under 100 organisms. Using data derived from public databases of translated genomic sequences, representatives from the major branches of Life are included: Prokaryota, Eukaryota and Archaea. The annotations stored in the database may be accessed in a number of ways. The help page provides information on how to access the database. 3D-GENOMICS is now part of a larger project, called e-Protein. The project brings together similar databases at three sites: Imperial College London, University College London and the European Bioinformatics Institute. e-Protein's mission statement is: "To provide a fully automated distributed pipeline for large-scale structural and functional annotation of all major proteomes via the use of cutting-edge computer GRID technologies." The following databases are incorporated: NRprot, SCOP, ASTRAL, PFAM, Prosite, taxonomy, COG. The following eukaryotic genomes are incorporated: Anopheles gambiae, protein sequences from the mosquito genome; Arabidopsis thaliana, protein sequences from the Arabidopsis genome; Caenorhabditis briggsae, protein sequences from the C.briggsae genome; Caenorhabditis elegans, protein sequences from the worm genome; Ciona intestinalis, protein sequences from the sea squirt genome; Danio rerio, protein sequences from the zebrafish genome; Drosophila melanogaster, protein sequences from the fruitfly genome; Encephalitozoon cuniculi, protein sequences from the E.cuniculi genome; Fugu rubripes, protein sequences from the pufferfish genome; Guillardia theta, protein sequences from the G.theta genome; Homo sapiens, protein sequences from the human genome; Mus musculus, protein sequences from the mouse genome; Neurospora crassa, protein sequences from the N.crassa genome; Oryza sativa, protein sequences from the rice genome; Plasmodium falciparum, protein sequences from the P.falciparum genome; Rattus norvegicus, protein sequences from the rat genome; Saccharomyces cerevisiae, protein sequences from the yeast genome; Schizosaccharomyces pombe, protein sequences from the yeast genome.
