20 datasets found
  1. Data from: Genomes To Fields (G2F) Inbred Ear Imaging Data 2017

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Genomes To Fields (G2F) Inbred Ear Imaging Data 2017 [Dataset]. https://catalog.data.gov/dataset/genomes-to-fields-g2f-inbred-ear-imaging-data-2017-079c0
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    A subset of ~30 inbreds was evaluated in 2014 and 2015 to develop an image-based ear phenotyping tool. The data is stored in CyVerse. Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize (Zea mays) genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

    Resources in this dataset:

    Resource Title: CyVerse Genomes To Fields Inbred Ear Imaging 2017 dataset download. File Name: Web Page, url: http://datacommons.cyverse.org/browse/iplant/home/shared/commons_repo/curated/Edgar_Spalding_G2F_Inbred_Ear_Imaging_June_2017 Dataset (csv, tar.gz) and metadata (BibTex/Endnote) downloads. See _readme.txt for file contents.

  2. KMCP Manuscript Data

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    Updated Dec 17, 2022
    Cite
    Wei Shen (2022). KMCP Manuscript Data [Dataset]. http://doi.org/10.5281/zenodo.6334744
    Dataset updated
    Dec 17, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Wei Shen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping

    ## 1.code-and-documents

    This directory contains the source code, executable binaries, and documents of KMCP,
    which are also hosted at Github: https://github.com/shenwei356/kmcp .

    Databases, usage, and tutorials of KMCP are also available at https://bioinf.shenwei.me/kmcp/.

    - [Installation](https://bioinf.shenwei.me/kmcp/download)
    - [Databases](https://bioinf.shenwei.me/kmcp/database)
    - Tutorials
      - [Taxonomic profiling](https://bioinf.shenwei.me/kmcp/tutorial/profiling)
      - [Sequence and genome searching](https://bioinf.shenwei.me/kmcp/tutorial/searching)
    - [Usage](https://bioinf.shenwei.me/kmcp/usage)
    - [Benchmarks](https://bioinf.shenwei.me/kmcp/benchmark)
    - [FAQs](https://bioinf.shenwei.me/kmcp/faq)

    ## 2.databases

    This directory contains the building steps and reference genome accessions for
    KMCP databases used in the manuscript.

    - `cami2`: Databases used in benchmarks on CAMI2 mouse gut datasets
    - `kmcp`: Databases used in other benchmarks

    ## 3.figures

    Each subdirectory contains steps to run the benchmark (`README.md`), steps for plotting (`README-plot.md`),
    benchmark results, and figures.

  3. Data from: Building a Statistical Model for Predicting Cancer Genes

    • datasetcatalog.nlm.nih.gov
    Updated Nov 15, 2012
    Cite
    Amos, Christopher; Logothetis, Christopher J.; Gorlov, Ivan P.; Fang, Shenying; Gorlova, Olga Y. (2012). Building a Statistical Model for Predicting Cancer Genes [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001123791
    Dataset updated
    Nov 15, 2012
    Authors
    Amos, Christopher; Logothetis, Christopher J.; Gorlov, Ivan P.; Fang, Shenying; Gorlova, Olga Y.
    Description

    More than 400 cancer genes have been identified in the human genome. The list is not yet complete. Statistical models predicting cancer genes may help with identification of novel cancer gene candidates. We used known prostate cancer (PCa) genes (identified through KnowledgeNet) as a training set to build a binary logistic regression model identifying PCa genes. Internal and external validation of the model was conducted using a validation set (also from KnowledgeNet), permutations, and external data on genes with recurrent prostate tumor mutations. We evaluated a set of 33 gene characteristics as predictors. Sixteen of the original 33 predictors were significant in the model. We found that a typical PCa gene is a prostate-specific transcription factor, kinase, or phosphatase with high interindividual variance of the expression level in adjacent normal prostate tissue and differential expression between normal prostate tissue and primary tumor. PCa genes are likely to have an antiapoptotic effect and to play a role in cell proliferation, angiogenesis, and cell adhesion. Their proteins are likely to be ubiquitinated or sumoylated but not acetylated. A number of novel PCa candidates have been proposed. Functional annotations of novel candidates identified antiapoptosis, regulation of cell proliferation, positive regulation of kinase activity, positive regulation of transferase activity, angiogenesis, positive regulation of cell division, and cell adhesion as top functions. We provide the list of the top 200 predicted PCa genes, which can be used as candidates for experimental validation. The model may be modified to predict genes for other cancer sites.

  4. Genomes To Fields 2016

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    bin
    Updated Dec 18, 2023
    Cite
    Darwin Campbel; Natalia deLeon; Jode Edwards; Jack Gardiner; Naser Al Khalifah; Carolyn J. Lawrence-Dill; Jane Petzoldt; Cinta Romay; Renee Walton; Genomes to Fields Cooperators (2023). Genomes To Fields 2016 [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Genomes_To_Fields_2016/24852822
    Available download formats: bin
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    CyVerse Data Commons
    Authors
    Darwin Campbel; Natalia deLeon; Jode Edwards; Jack Gardiner; Naser Al Khalifah; Carolyn J. Lawrence-Dill; Jane Petzoldt; Cinta Romay; Renee Walton; Genomes to Fields Cooperators
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Phenotypic, genotypic, and environment data for the 2016 field season. The data is stored in CyVerse. Data types in this directory tree are: hybrid and inbred agronomic and performance traits; inbred genotypic data; and environmental (soil, weather) data collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize (Zea mays) genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

    Resources in this dataset:

    Resource Title: CyVerse Genomes To Fields 2016 dataset download. File Name: Web Page, url: http://datacommons.cyverse.org/browse/iplant/home/shared/commons_repo/curated/GenomesToFields_G2F_2016_Data_Mar_2018 Dataset (csv) and metadata (BibTex, Endnote) data downloads. See _readme.txt for file contents.

  5. Data from: EukProt: a database of genome-scale predicted proteins across the...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    bin
    Updated May 31, 2023
    Cite
    Daniel Richter; Cédric Berney; Jürgen Strassert; Yu-Ping Poh; Emily K. Herman; Sergio A. Muñoz-Gómez; Jeremy G. Wideman; Fabien Burki; Colomban de Vargas (2023). EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotes [Dataset]. http://doi.org/10.6084/m9.figshare.12417881.v3
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Daniel Richter; Cédric Berney; Jürgen Strassert; Yu-Ping Poh; Emily K. Herman; Sergio A. Muñoz-Gómez; Jeremy G. Wideman; Fabien Burki; Colomban de Vargas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Version 3 (22 November, 2021)

    See https://doi.org/10.24072/pcjournal.173 for a detailed description of the database. See http://evocellbio.com/eukprot/ for a BLAST database, interactive plots of BUSCO scores and ‘The Comparative Set’ (TCS): A selected subset of EukProt for comparative genomics investigations. Protein sequence FASTA files of the TCS are available at https://doi.org/10.6084/m9.figshare.21586065. See https://github.com/beaplab/EukProt for utility scripts, annotations, and all the files necessary to build the tree in Figures 1 and 3 (from the DOI above).

    Scroll to the end of this page for changes since version 2.

    Are we missing anything? Please let us know!

    EukProt is a database of published and publicly available predicted protein sets selected to represent the breadth of eukaryotic diversity, currently including 993 species from all major supergroups as well as orphan taxa. The goal of the database is to provide a single, convenient resource for gene-based research across the spectrum of eukaryotic life, such as phylogenomics and gene family evolution. Each species is placed within the UniEuk taxonomic framework in order to facilitate downstream analyses, and each data set is associated with a unique, persistent identifier to facilitate comparison and replication among analyses. The database is regularly updated, and all versions will be permanently stored and made available via FigShare. The current version has a number of updates, notably ‘The Comparative Set’ (TCS), a reduced taxonomic set with high estimated completeness while maintaining a substantial phylogenetic breadth, which comprises 196 predicted proteomes. A BLAST web server and graphical displays of data set completeness are available at http://evocellbio.com/eukprot/. We invite the community to provide suggestions for new data sets and new annotation features to be included in subsequent versions, with the goal of building a collaborative resource that will promote research to understand eukaryotic diversity and diversification.

    This release contains 5 files:

    EukProt_proteins.v03.2021_11_22.tgz: 993 protein data sets, for species with either a genome (375) or single-cell genome (56), a transcriptome (498), a single-cell transcriptome (47), or an EST assembly (17).

    EukProt_genome_annotations.v03.2021_11_22.tgz: gene annotations, in GFF format, as produced by EukMetaSanity (https://github.com/cjneely10/EukMetaSanity) for 40 genomes lacking publicly available protein annotations. The proteins predicted from these annotations are included in the proteins file.

    EukProt_included_data_sets.v03.2021_11_22.txt and EukProt_not_included_data_sets.v03.2021_11_22.txt: tables of information on data sets either included (993 data sets) or not included (163) in the database. Tab-delimited; multiple entries in the same cell are comma-delimited; missing data is represented with the “N/A” value. With the following columns:

    EukProt_ID: the unique identifier associated with the data set. This will not change among versions. If a new data set becomes available for the species, it will be assigned a new unique identifier.

    Name_to_Use: the name of the species for protein/genome annotation/assembled transcriptome files.

    Strain: the strain(s) of the species sequenced.

    Previous_Names: any previous names that this species was known by.

    Replaces_EukProt_ID/Replaced_by_EukProt_ID: if the data set changes with respect to an earlier version, the EukProt ID of the data set that it replaces (in the included table) or that it is replaced by (in the not_included table).

    Genus_UniEuk, Epithet_UniEuk, Supergroup_UniEuk, Taxogroup1_UniEuk, Taxogroup2_UniEuk: taxonomic identifiers at different levels of the UniEuk taxonomy (Berney et al. 2017, DOI: 10.1111/jeu.12414, based on Adl et al. 2019, DOI: 10.1111/jeu.12691).

    Taxonomy_UniEuk: the full lineage of the species in the UniEuk taxonomy (semicolon-delimited).

    Merged_Strains: whether multiple strains of the same species were merged to create the data set.

    Data_Source_URL: the URL(s) from which the data were downloaded.

    Data_Source_Name: the name of the data set (as assigned by the data source).

    Paper_DOI: the DOI(s) of the paper(s) that published the data set.

    Actions_Prior_to_Use: the action(s) that were taken to process the publicly available files in order to produce the data set in this database. Actions taken (see our manuscript for more details):

    ‘assemble mRNA’: Trinity v. 2.8.4, http://trinityrnaseq.github.io/

    ‘CD-HIT’: v. 4.6, http://weizhongli-lab.org/cd-hit/

    ‘extractfeat’, ‘seqret’, ‘transeq’, ‘trimseq’: from EMBOSS package v. 6.6.0.0, http://emboss.sourceforge.net/

    ‘translate mRNA’: Transdecoder v. 5.3.0, http://transdecoder.github.io/

    ‘gffread’: v.0.12.3 https://github.com/gpertea/gffread

    ‘predict genes’: EukMetaSanity https://github.com/cjneely10/EukMetaSanity (cloned on 21 September, 2021)

    All parameter values were default, unless otherwise specified.

    Data_Source_Type: the type of the source data (possible types: EST, transcriptome, single-cell transcriptome, genome, single-cell genome).

    Notes: additional information on the data set (including why it is replaced by/is replacing another data set, or why it was not included).

    Columns_Modified_Since_Previous_Version: column(s) in this file modified for the data set since the previous release. Not listed: modifications to the Notes column or to new columns added in this version.

    Alternative_Strain_Names: non-exhaustive list of alternative names for the sequenced strain for this data set.

    18S_Sequence_GenBank_ID: GenBank identifier for the strain sequenced in the data set. When multiple strains were sequenced, identifiers are separated with a comma, in the same order as the Strain column. Ranges of identifiers for the same strain are separated by a hyphen. ‘N/A’ indicates either that there is no GenBank sequence for the strain or that all available sequences are not full-length (< 1,500 bp).

    18S_Sequence: 18S for the strain derived from publicly available sequences associated with the data set, in the case where a GenBank sequence is not available.

    18S_Sequence_Source: the source for the sequence in the 18S_Sequence column, if any.

    18S_Sequence_Other_Strain_GenBank_ID: GenBank identifier for 18S sequence(s) from other strains of the same species as the data set.

    18S_Sequence_Other_Strain_Name: strain name(s) for the sequences in the 18S_Sequence_Other_Strain_GenBank_ID column.

    18S_and_Taxonomy_Notes: additional information on the values in the 18S_Sequence columns.
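
    The table conventions described above (tab-delimited rows, comma-delimited multi-entry cells, "N/A" for missing data) can be read with a few lines of Python. This is a minimal sketch, not the authors' own tooling: the function name `read_eukprot_table` and the default choice of multi-valued columns are assumptions made for illustration, and the sample row is invented.

```python
import csv
import io


def read_eukprot_table(handle, multi_value_columns=("Strain", "Data_Source_URL", "Paper_DOI")):
    """Parse a tab-delimited EukProt metadata table into a list of dicts.

    Assumptions (not stated by the source): which columns hold several
    comma-delimited entries is chosen by the caller via `multi_value_columns`.
    "N/A" cells become None; multi-valued cells become lists of strings.
    """
    # QUOTE_NONE keeps quote characters in taxonomy strings as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        parsed = {}
        for column, value in row.items():
            if value is None or value == "N/A":
                parsed[column] = None
            elif column in multi_value_columns:
                parsed[column] = [part.strip() for part in value.split(",")]
            else:
                parsed[column] = value
        rows.append(parsed)
    return rows


# Invented two-line table, used only to show the call.
sample = (
    "EukProt_ID\tName_to_Use\tStrain\tPaper_DOI\n"
    "EP00001\tExample_species\tA1, B2\tN/A\n"
)
rows = read_eukprot_table(io.StringIO(sample))
```

    For the real files, pass an open file handle for EukProt_included_data_sets.v03.2021_11_22.txt instead of the in-memory sample.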

    Changes since version 2

    There are 324 new data sets included. 57 of these replace data sets from version 2.

    40 newly published data sets were added to the list that are not included in the database (annotated in the Notes column with the reasons they were not included).

    Instead of unannotated genomes (for published genomes lacking protein predictions), we now include predicted proteins and gene annotations (in GFF3 format).

    All sequences within each file are now assigned a standardized, unique identifier based on the data set’s EukProt_ID and on the type of data (protein or transcriptome). Illegal characters are removed from sequences.

    In the UniEuk_Taxonomy field, single quotes are now used instead of double quotes, to be consistent with other UniEuk databases (EukMap, EukRibo).

    Changes to metadata of individual data sets (in the included and not_included tables) with respect to the previous version are now listed in the Columns_Modified_Since_Previous_Version column.

    The Taxogroup_UniEuk column has been split into the Taxogroup1_UniEuk and Taxogroup2_UniEuk columns. This resulted in the Supergroup_UniEuk column changing for Opisthokonta.

    In addition, the following new columns have been added (see our manuscript for details): Alternative_Strain_Names, 18S_Sequence_GenBank_ID, 18S_Sequence, 18S_Sequence_Source, 18S_Sequence_Other_Strain_GenBank_ID, 18S_Sequence_Other_Strain_Name, 18S_and_Taxonomy_Notes.

    EukProt_assembled_transcriptomes.v03.2021_11_22.tgz: assembled transcriptome contigs, for 126 species with publicly available mRNA sequence reads but no publicly available assembly. The proteins predicted from these assemblies are included in the proteins file.

    Sequence names in the proteins and transcriptomes files have standardized, unique identifiers with the following format:

    [EukProt ID]_[Name_to_Use]_[Type abbreviation][Counter] [Previous header contents]

    Type abbreviations are P (protein) and T (transcriptome).
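
    The header format above can be split back into its parts with a regular expression. The sketch below rests on assumptions the description does not spell out: that the EukProt ID contains no underscore, that the counter is a run of digits, and that a single space separates the identifier from the previous header contents. The example header is invented.

```python
import re

# Assumed identifier shape: <ID>_<Name_to_Use>_<P or T><digits>.
HEADER_PATTERN = re.compile(
    r"(?P<eukprot_id>[^_\s]+)_(?P<name>\S+)_(?P<type>[PT])(?P<counter>\d+)"
)


def parse_header(header: str) -> dict:
    """Split a standardized sequence header into its named fields."""
    # Drop a leading FASTA '>' if present, then cut at the first space.
    identifier, _, previous = header.lstrip(">").partition(" ")
    match = HEADER_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"header does not match the expected format: {header!r}")
    fields = match.groupdict()
    fields["previous_header"] = previous
    return fields


# Invented header, for illustration only.
parsed = parse_header(">EP00001_Example_species_P000042 original header text")
```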

    All characters not in the following list are removed from nucleic acid sequences: ACGTNUKSYMWRBDHV

    All characters not in the following list are removed from protein sequences: ABCDEFGHIKLMNPQRSTUVWYZX*
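
    The character filtering described above amounts to keeping only whitelisted characters. A minimal sketch follows, using the two lists exactly as quoted; the function name `strip_illegal` is hypothetical, and treating the match as case-sensitive is an assumption rather than something the description states.

```python
# Legal character sets, copied from the description above.
NUCLEOTIDE_CHARS = frozenset("ACGTNUKSYMWRBDHV")
PROTEIN_CHARS = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*")


def strip_illegal(sequence: str, legal: frozenset) -> str:
    """Return `sequence` with every character outside `legal` removed.

    Assumption: comparison is case-sensitive, so lowercase letters are dropped.
    """
    return "".join(ch for ch in sequence if ch in legal)


clean_dna = strip_illegal("ACGT-NX", NUCLEOTIDE_CHARS)
clean_protein = strip_illegal("MKJ*O", PROTEIN_CHARS)
```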

    Lists of legal characters are from: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=BlastHelp

  6. microbetag : building a thorough database of genome-scale KO annotations

    • zenodo.org
    application/gzip, bin +1
    Updated Jan 21, 2024
    Cite
    Haris Zafeiropoulos (2024). microbetag : building a thorough database of genome-scale KO annotations [Dataset]. http://doi.org/10.5281/zenodo.10537295
    Available download formats: zip, bin, application/gzip
    Dataset updated
    Jan 21, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Haris Zafeiropoulos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this repository we keep internal data for the microbetag microbial co-occurrence network annotator.

    microbetag makes use of 2-column files for each genome, indicating each KO term found and a KEGG module in which this term takes part.
    As a single KO term might participate in more than one KEGG module, the same KO might appear more than once in an annotation file.
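
    Because a KO can appear on several lines, a reader of such a file would typically collect the modules per KO. The sketch below assumes a whitespace-separated layout with the KO term in the first column and the module in the second; that column order, the function name `read_ko_modules`, and the example identifiers are assumptions, not taken from the repository.

```python
from collections import defaultdict


def read_ko_modules(lines):
    """Group KEGG modules by KO term from 2-column annotation lines.

    Assumed layout: "<KO term><whitespace><KEGG module>" on each line.
    Blank or malformed lines are skipped.
    """
    ko_to_modules = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        ko, module = fields[0], fields[1]
        ko_to_modules[ko].add(module)
    return dict(ko_to_modules)


# Illustrative identifiers only; they do not come from the data set.
annotation = ["K00001\tM00001", "K00001\tM00002", "", "K00002\tM00001"]
ko_modules = read_ko_modules(annotation)
```

    The function accepts any iterable of lines, so an open file handle for one genome's annotation file can be passed directly.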

    md5:e3e62b305e64b27da7b80655d7f92f2c

    For all GTDB genomes, the corresponding PATRIC annotations were gathered. Then, using modelseedpy, we constructed their genome-scale metabolic reconstructions.

    md5:cbcc9aa1a28a5bd5f6661f832d27bcbf

    All representative genomes of GTDB (v.202) were parsed and their corresponding `.faa` files were retrieved from the NCBI FTP. Then the kofam_scan tool was used to annotate them, and finally a custom script was used to keep the KOs of each genome per module.

    updated_seedsets_of_interest.pckl
    md5:2a49534b52169daa4f96157b15d1b01c

    A pickle file with the seeds of each GEM included in the gtdb_modelseed_gems.zip file and related to the KEGG MODULES based on the seedId_keggId_module.tsv file you can find on microbetag's GitHub page. Example:

    PATRIC SeedSet
    373.172 [cpd00891, cpd00136, cpd00199, cpd01772, cpd00...
    397278.5 [cpd00891, cpd00136, cpd01772, cpd02698, cpd08...

    updated_non_seedsets_of_interest.pckl
    md5:75c435a69b183ea23aa58d43c1e051ba

    A pickle file with the non seeds of each GEM included in the gtdb_modelseed_gems.zip file and related to the KEGG MODULES based on the seedId_keggId_module.tsv file you can find on microbetag's GitHub page. Example:

    PATRIC NonSeedSet
    64187.548 [cpd00508, cpd00869, cpd00774, cpd03830, cpd00...
    74426.1719 [cpd00204, cpd00447, cpd20171, cpd03470, cpd00...

    md5:9e3f7a84fe7409ef0282ca5424797976
    A list of pickle files with the re-trained classes of phenDB for the prediction of functional traits on a genome.

  7. Data from: Plant Expression Database

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +2 more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Plant Expression Database [Dataset]. https://catalog.data.gov/dataset/plant-expression-database-8ddd3
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    [NOTE: PLEXdb is no longer available online. Oct 2019.] PLEXdb (Plant Expression Database) is a unified gene expression resource for plants and plant pathogens. PLEXdb is a genotype to phenotype, hypothesis building information warehouse, leveraging highly parallel expression data with seamless portals to related genetic, physical, and pathway data. PLEXdb (http://www.plexdb.org), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facilitate the interpretation of structure, function and regulation of genes in economically important plants. A list of Gene Atlas experiments highlights data sets that give responses across different developmental stages, conditions and tissues. Tools at PLEXdb allow users to perform complex analyses quickly and easily. The Model Genome Interrogator (MGI) tool supports mapping gene lists onto corresponding genes from model plant organisms, including rice and Arabidopsis. MGI predicts homologies, displays gene structures and supporting information for annotated genes and full-length cDNAs. The gene list-processing wizard guides users through PLEXdb functions for creating, analyzing, annotating and managing gene lists. Users can upload their own lists or create them from the output of PLEXdb tools, and then apply diverse higher level analyses, such as ANOVA and clustering. PLEXdb also provides methods for users to track how gene expression changes across many different experiments using the Gene OscilloScope. This tool can identify interesting expression patterns, such as up-regulation under diverse conditions or checking any gene’s suitability as a steady-state control. Resources in this dataset:Resource Title: Website Pointer for Plant Expression Database, Iowa State University. 
    File Name: Web Page, url: https://www.bcb.iastate.edu/plant-expression-database [NOTE: PLEXdb is no longer available online. Oct 2019.] Project description for the Plant Expression Database (PLEXdb) and integrated tools.

  8. Genomes To Fields 2014 v.1 - deprecated

    • datacommons.cyverse.org
    Updated 2016
    + more versions
    Cite
    Carolyn Lawrence-Dill (2016). Genomes To Fields 2014 v.1 - deprecated [Dataset]. http://doi.org/10.7946/P25P4P
    Dataset updated
    2016
    Dataset provided by
    CyVerse Data Commons
    Authors
    Carolyn Lawrence-Dill
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Data types in this directory tree are: hybrid and inbred agronomic and performance traits; inbred genotypic data; and environmental data collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

  9. Genomes To Fields 2014

    • datasets.ai
    • agdatacommons.nal.usda.gov
    • +1 more
    21
    Updated Mar 30, 2024
    + more versions
    Cite
    Department of Agriculture (2024). Genomes To Fields 2014 [Dataset]. https://datasets.ai/datasets/genomes-to-fields-2014-d3326
    Available download formats: 21
    Dataset updated
    Mar 30, 2024
    Dataset authored and provided by
    Department of Agriculture
    Description

    Phenotypic, genotypic, and environment data for the 2014 field season: The data is stored in CyVerse.

    Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize (Zea mays) genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.


    Resources in this dataset:

  10. The Cancer Genome Atlas Prostate Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Feb 2, 2014
    Cite
    The Cancer Imaging Archive (2014). The Cancer Genome Atlas Prostate Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.YXOGLM4Y
    Available download formats: dicom, n/a
    Dataset updated
    Feb 2, 2014
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute: http://www.cancer.gov/
    Description

    The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  11. A framework for the estimation of the proportion of true discoveries in...

    • plos.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Nik Tuzov (2023). A framework for the estimation of the proportion of true discoveries in single nucleotide variant detection studies for human data [Dataset]. http://doi.org/10.1371/journal.pone.0196058
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Nik Tuzov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Any single nucleotide variant detection study could benefit from a fast and cheap method of measuring the quality of variant call list. It is advantageous to be able to see how the call list quality is affected by different variant filtering thresholds and other adjustments to the study parameters. Here we look into a possibility of estimating the proportion of true positives in a single nucleotide variant call list for human data. Using whole-exome and whole-genome gold standard data sets for training, we focus on building a generic model that only relies on information available from any variant caller. We assess and compare the performance of different candidate models based on their practical accuracy. We find that the generic model delivers decent accuracy most of the time. Further, we conclude that its performance could be improved substantially by leveraging the variant quality metrics that are specific to each variant calling tool.

  12. Genome indexes for Mus musculus (mm39)

    • zenodo.org
    • data-staging.niaid.nih.gov
    bin
    Updated Apr 28, 2023
    Cite
    Yazgeldi Gamze; Katayama Shintaro (2023). Genome indexes for Mus musculus (mm39) [Dataset]. http://doi.org/10.5281/zenodo.7457660
    Available download formats: bin
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yazgeldi Gamze; Katayama Shintaro
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BUILDING HISAT2 INDEXES IN CSC
    Here is the case for the house mouse genome (mm39). The genome indexing step requires a large amount of memory, so it might not be possible to carry it out on a laptop. Genome indexes for Mus musculus (mm39) were created using HISAT2 v2.2.1 on CSC (IT Center for Science), thanks to CSC-Puhti.

    1. Create a folder for the conda environment, install the required packages into it, and add its bin directory to the PATH.
    mkdir STRTN-env
    conda-containerize new --prefix STRTN-env STRTN-env.yml
    export PATH="

    2. Load the required module.
    module load tykky
    export PATH="

    4. Extract splice sites and exons from a GTF file. Here we used wgEncodeGencodeBasicVM30 as the annotation file. You may additionally run `hisat2_extract_snps_haplotypes_UCSC.py` to extract SNPs and haplotypes from a dbSNP file for human and mouse.
    wget https://hgdownload.soe.ucsc.edu/goldenPath/mm39/database/wgEncodeGencodeBasicVM30.txt.gz
    unpigz -c wgEncodeGencodeBasicVM30.txt.gz | hisat2_extract_splice_sites.py - | grep -v ^chrUn > splice_sites.txt
    unpigz -c wgEncodeGencodeBasicVM30.txt.gz | hisat2_extract_exons.py - | grep -v ^chrUn > exons.txt

    5. Build the HISAT2 index. This outputs a set of files that share a basename and carry numbered suffixes. Here, `mouse_reference.1.ht2`, `mouse_reference.2.ht2`, ..., `mouse_reference.8.ht2` are generated.
    In this case, `mouse_reference` is the basename used for `-i, --index`.

    hisat2-build mouse_reference.fasta --ss splice_sites.txt --exon exons.txt mouse_index/mouse_reference

    6. Create the sequence dictionary for the reference and Spike-in sequences. This is required for the Picard MergeBamAlignment program. Note that the original FASTA file (`mouse_reference.fasta` here) is also required.
    picard CreateSequenceDictionary R=mouse_reference.fasta O=mouse_reference.dict

    7. Put the genome indexes, the genome FASTA file, and the sequence dictionary in the same folder.
    mv mouse_reference.dict mouse_reference
    mv mouse_reference.fasta mouse_reference
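
    After step 5 it can be useful to confirm that all eight index files were written before moving anything. The following is a minimal sketch, not part of HISAT2 or the original instructions: `check_ht2_index` is a hypothetical helper name, and it assumes the `.1.ht2` to `.8.ht2` suffixes listed in step 5.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HISAT2): verify that the eight
# index files <basename>.1.ht2 ... <basename>.8.ht2 exist and are non-empty.
check_ht2_index() {
    local basename="$1"
    local missing=0
    local i
    for i in 1 2 3 4 5 6 7 8; do
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "${basename}.${i}.ht2" ]; then
            echo "missing or empty: ${basename}.${i}.ht2" >&2
            missing=$((missing + 1))
        fi
    done
    # Return success only when every expected file was found.
    [ "$missing" -eq 0 ]
}
```

    With the basename from step 5, a call would look like `check_ht2_index mouse_index/mouse_reference && echo "index complete"`.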

  13. Kyoto Encyclopedia of Genes and Genomes Expression Database

    • scicrunch.org
    • rrid.site
    Updated Jun 10, 2018
    Cite
    (2018). Kyoto Encyclopedia of Genes and Genomes Expression Database [Dataset]. http://identifiers.org/RRID:SCR_001120
    Dataset updated
    Jun 10, 2018
    Description

    Database for mapping gene expression profiles to pathways and genomes. Repository of microarray gene expression profile data for Synechocystis PCC6803 (syn), Bacillus subtilis (bsu), Escherichia coli W3110 (ecj), Anabaena PCC7120 (ana), and other species contributed by the Japanese research community.

  14. Genomes To Fields 2014 v.2 - deprecated

    • datacommons.cyverse.org
    Updated 2016
    Cite
    Carolyn Lawrence-Dill (2016). Genomes To Fields 2014 v.2 - deprecated [Dataset]. http://doi.org/10.7946/P2201Q
    Dataset updated
    2016
    Dataset provided by
    CyVerse Data Commons
    Authors
    Carolyn Lawrence-Dill
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Data types in this directory tree are: hybrid and inbred agronomic and performance traits; inbred genotypic data; and environmental data collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

  15. Genomes To Fields Inbred Ear Imaging 2017

    • datacommons.cyverse.org
    Updated 2017
    Cite
    Edgar Spalding (2017). Genomes To Fields Inbred Ear Imaging 2017 [Dataset]. http://doi.org/10.7946/P2C34P
    Dataset updated
    2017
    Dataset provided by
    CyVerse Data Commons
    Authors
    Edgar Spalding
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

  16. The Cancer Genome Atlas Lung Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 30, 2017
    Cite
    The Cancer Imaging Archive (2017). The Cancer Genome Atlas Lung Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.JGNIHEP5
    Available download formats: n/a, dicom
    Dataset updated
    Jan 30, 2017
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Lung Phenotype Research Group.

  17. Genomes To Fields Inbred Ear Imaging 2017 - Dataset - CyVerse Data Commons

    • ckan.cyverse.rocks
    Updated Jun 23, 2024
    Cite
    ckan.cyverse.rocks (2024). Genomes To Fields Inbred Ear Imaging 2017 - Dataset - CyVerse Data Commons [Dataset]. https://ckan.cyverse.rocks/dataset/genomes-to-fields-inbred-ear-imaging-2017
    Dataset updated
    Jun 23, 2024
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the over-arching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.

  18. Data from: MOESM3 of Designing intracellular metabolism for production of...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Dec 14, 2016
    Cite
    Tomokazu Shirai; Takashi Osanai; Akihiko Kondo (2016). MOESM3 of Designing intracellular metabolism for production of target compounds by introducing a heterologous metabolic reaction based on a Synechosystis sp. 6803 genome-scale model [Dataset]. http://doi.org/10.6084/m9.figshare.c.3602546_D2.v1
    Available download formats: xlsx
    Dataset updated
    Dec 14, 2016
    Dataset provided by
    figshare
    Authors
    Tomokazu Shirai; Takashi Osanai; Akihiko Kondo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file: Table S3. List of metabolites ignored when building the SyHyMeP.

  19. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.

  20. The Cancer Genome Atlas Ovarian Cancer Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated May 29, 2020
    Cite
    The Cancer Imaging Archive (2020). The Cancer Genome Atlas Ovarian Cancer Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.NDO1MDFQ
    Available download formats: n/a, dicom
    Dataset updated
    May 29, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the TCGA Ovarian Phenotype Research Group.


