14 datasets found
  1. LifeDB

    • rrid.site
    • dknet.org
    • +1more
    Updated Feb 9, 2025
    Cite
    (2025). LifeDB [Dataset]. http://identifiers.org/RRID:SCR_006899
    Explore at:
    Dataset updated
    Feb 9, 2025
    Description

    Database that integrates large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. LifeDB integrates data on full-length cDNA clones with data on the expression of the encoded proteins and their subcellular localization in mammalian cell lines. It lets the scientific community systematically search for and select genes, proteins, and cDNAs of interest by specific database identifiers or by gene name, and it visualizes cDNA clones and the subcellular location of proteins. It also links the results to external biological databases to provide broader functional information. LifeDB also provides an annotation pipeline that improves the mapping of clones to known human reference transcripts from the RefSeq and Ensembl databases. An advanced web interface presents the data in a more user-friendly manner. Both search pages, Search genes and cDNA clones and Search sub-cellular locations of human proteins, offer the same search options: by keyword, by gene/transcript identifier, by plate name, by clone name, and by cellular location.
    * The Search genes and cDNA clones results include: Gene Name, Ensembl ID, Genomic Region, Clone name, Plate name, Plate position, Classification class, Synonymous SNPs, Non-synonymous SNPs, Number of ambiguous positions, and Alignment with reference genes.
    * The Search sub-cellular locations of human proteins results include: Subcellular location, Gene Name, Ensembl ID, Clone name, True localization, Images, Start tag and End tag.
    Every result page has an option to download the result data (excluding the microscopy images). Clicking the 'Download results as CSV-file' link on the result page lets the user open or save the result data as a CSV (Comma Separated Values) file, which can then be opened in Excel or OpenOffice.

  2. Data from: Genomes To Fields (G2F) Inbred Ear Imaging Data 2017

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Mar 30, 2024
    Cite
    Agricultural Research Service (2024). Genomes To Fields (G2F) Inbred Ear Imaging Data 2017 [Dataset]. https://catalog.data.gov/dataset/genomes-to-fields-g2f-inbred-ear-imaging-data-2017-079c0
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    Agricultural Research Service
    Description

    A subset of ~30 inbreds was evaluated in 2014 and 2015 to develop an image-based ear phenotyping tool. The data is stored in CyVerse. Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize (Zea mays) genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the overarching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.
    Resources in this dataset:
    Resource Title: CyVerse Genomes To Fields Inbred Ear Imaging 2017 dataset download. File Name: Web Page, url: http://datacommons.cyverse.org/browse/iplant/home/shared/commons_repo/curated/Edgar_Spalding_G2F_Inbred_Ear_Imaging_June_2017 Dataset (csv, tar.gz) and metadata (BibTex/Endnote) downloads. See _readme.txt for file contents.

  3. Genomes To Fields 2014

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Mar 30, 2024
    + more versions
    Cite
    Agricultural Research Service (2024). Genomes To Fields 2014 [Dataset]. https://catalog.data.gov/dataset/genomes-to-fields-2014-d3326
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    Phenotypic, genotypic, and environment data for the 2014 field season. The data is stored in CyVerse. Data types in this directory tree are: dimension and width profile data collected from scanned images of ears, cobs, and kernels collected from the Genomes To Fields (G2F) project cooperators. G2F is an umbrella initiative to support translation of maize (Zea mays) genomic information for the benefit of growers, consumers and society. This public-private partnership is building on publicly funded corn genome sequencing projects to develop approaches to understand the functions of corn genes and specific alleles across environments. Ultimately this information will be used to enable accurate prediction of the phenotypes of corn plants in diverse environments. There are many dimensions to the overarching goal of understanding genotype-by-environment (GxE) interactions, including which genes impact which traits and trait components, how genes interact among themselves (GxG), the relevance of specific genes under different growing conditions, and how these genes influence plant growth during various stages of development.
    Resources in this dataset:
    Resource Title: CyVerse Genomes To Fields 2014 dataset download. File Name: Web Page, url: http://datacommons.cyverse.org/browse/iplant/home/shared/commons_repo/curated/Carolyn_Lawrence_Dill_G2F_Nov_2016_V.3 Dataset (csv, h5, gz) and metadata (BibTex/Endnote) downloads. See _readme.txt for file contents.

  4. MovieLens full 25-million recommendation data 🎬

    • kaggle.com
    Updated Apr 15, 2023
    + more versions
    Cite
    iulia (2023). MovieLens full 25-million recommendation data 🎬 [Dataset]. https://www.kaggle.com/datasets/patriciabrezeanu/movielens-full-25-million-recommendation-data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    iulia
    Description

    Summary: This dataset (ml-25m) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies. These data were created by 162,541 users between January 09, 1995, and November 21, 2019. This dataset was generated on November 21, 2019. Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided. The data are contained in the files genome-scores.csv, genome-tags.csv, links.csv, movies.csv, ratings.csv, and tags.csv. More details about the contents and use of all these files follow. This and other GroupLens data sets are publicly available for download at

  5. Table_4_An integrative analysis of single-cell and bulk transcriptome and...

    • frontiersin.figshare.com
    txt
    Updated Dec 21, 2023
    + more versions
    Cite
    Hong-Kai Cui; Chao-Jie Tang; Yu Gao; Zi-Ang Li; Jian Zhang; Yong-Dong Li (2023). Table_4_An integrative analysis of single-cell and bulk transcriptome and bidirectional mendelian randomization analysis identified C1Q as a novel stimulated risk gene for Atherosclerosis.csv [Dataset]. http://doi.org/10.3389/fimmu.2023.1289223.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Hong-Kai Cui; Chao-Jie Tang; Yu Gao; Zi-Ang Li; Jian Zhang; Yong-Dong Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The role of complement component 1q (C1Q)-related genes in human atherosclerotic plaques (HAP) is poorly understood. Our aim is to establish C1Q-associated hub genes using single-cell RNA sequencing (scRNA-seq) and bulk RNA analysis to diagnose and predict HAP patients more effectively, and to investigate the association between C1Q and HAP (ischemic stroke) using bidirectional Mendelian randomization (MR) analysis.

    Methods: HAP scRNA-seq and bulk RNA data were downloaded from the Gene Expression Omnibus (GEO) database. The C1Q-related hub genes were screened using the GBM, LASSO and XGBoost algorithms. We built machine learning models to diagnose and distinguish between types of atherosclerosis using generalized linear models and receiver operating characteristic (ROC) analyses. Further, we scored the HALLMARK_COMPLEMENT signaling pathway using ssGSEA and confirmed hub gene expression through qRT-PCR in RAW264.7 macrophages and apoE-/- mice. Furthermore, the risk association between C1Q and HAP was assessed through bidirectional MR analysis, with C1Q as exposure and ischemic stroke (IS, large artery atherosclerosis) as outcome. Inverse variance weighting (IVW) was used as the main method.

    Results: We used the scRNA-seq dataset (GSE159677) to identify 24 cell clusters and 12 cell types, and revealed seven C1Q-associated DEGs in both the scRNA-seq and GEO datasets. We then used GBM, LASSO and XGBoost to select C1QA and C1QC from the seven DEGs. Our findings indicated that both training and validation cohorts had satisfactory diagnostic accuracy for identifying patients with HAP. Additionally, we confirmed SPI1 as a potential TF responsible for regulating the two hub genes in HAP. Our analysis further revealed that the HALLMARK_COMPLEMENT signaling pathway was correlated with and activated alongside C1QA and C1QC. We confirmed high expression levels of C1QA, C1QC and SPI1 in ox-LDL-treated RAW264.7 macrophages and apoE-/- mice using qPCR. The results of MR indicated that there was a positive association between the genetic risk of C1Q and IS, as evidenced by an odds ratio (OR) of 1.118 (95% CI: 1.013–1.234, P = 0.027).

    Conclusion: The authors have developed and validated a novel two-gene diagnostic signature for HAP, while MR analysis has provided evidence supporting a positive association between C1Q and IS.
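
The reported odds ratio, confidence interval and P value can be cross-checked against one another. The sketch below is not the authors' analysis: it assumes the interval is a symmetric 95% Wald interval on the log-odds scale, recovers the standard error from its width, and derives the two-sided P value from a normal approximation. The function name `p_from_or_ci` is made up for this illustration.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes a symmetric Wald interval on the log scale:
        log(CI) = log(OR) +/- z_crit * SE
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values quoted in the abstract above.
se, z, p = p_from_or_ci(1.118, 1.013, 1.234)
print(f"SE(log OR) = {se:.4f}, z = {z:.3f}, p = {p:.3f}")
```

Under these assumptions the recovered P value comes out close to the reported 0.027, so the three quoted numbers are mutually consistent.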

  6. GENCODE mouse & human transcript to gene files for txImport

    • zenodo.org
    • explore.openaire.eu
    bin, csv
    Updated Jan 24, 2020
    Cite
    Stephen Turner (2020). GENCODE mouse & human transcript to gene files for txImport [Dataset]. http://doi.org/10.5281/zenodo.1324497
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stephen Turner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENCODE data downloaded for human (GRCh38p12 v28) and mouse (GRCm38p6 vM18). For each organism (human=v28, mouse=vM18), there is an R script with the commands necessary to generate the tx2gene CSV that can be read in and used with the tximport Bioconductor package for summarizing transcript abundances to the gene level.

    gencode.v28 = human

    gencode.vM18 = mouse
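
As a rough illustration of what such a tx2gene table holds, the sketch below builds transcript-to-gene pairs from GTF-formatted annotation lines in Python. It is not the R script shipped with this dataset; the function names, the `TXNAME`/`GENEID` header, and the toy records (which are not real GENCODE entries) are all assumptions made for illustration.

```python
import csv
import io
import re

# Matches quoted GTF attributes of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)";')


def tx2gene_rows(gtf_lines):
    """Yield (transcript_id, gene_id) for every 'transcript' feature."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            yield attrs["transcript_id"], attrs["gene_id"]


def write_tx2gene(gtf_lines, out):
    """Write a two-column CSV: transcript ID first, gene ID second."""
    writer = csv.writer(out)
    writer.writerow(["TXNAME", "GENEID"])  # hypothetical header names
    writer.writerows(tx2gene_rows(gtf_lines))


# Toy records in GTF layout (made up; not real GENCODE entries).
SAMPLE_GTF = [
    "##description: toy example\n",
    'chr1\tTOY\tgene\t1\t100\t.\t+\t.\tgene_id "G1.1"; gene_name "A";\n',
    'chr1\tTOY\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1.1"; transcript_id "T1.1";\n',
    'chr1\tTOY\texon\t1\t50\t.\t+\t.\tgene_id "G1.1"; transcript_id "T1.1";\n',
    'chr1\tTOY\ttranscript\t20\t100\t.\t+\t.\tgene_id "G1.1"; transcript_id "T2.1";\n',
]

buffer = io.StringIO()
write_tx2gene(SAMPLE_GTF, buffer)
print(buffer.getvalue())
```

Only `transcript` features are read, so each transcript appears once even though its ID is repeated on every exon line.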

  7. Supporting Data for "Prediction of Causal Genes at GWAS Loci with...

    • search-demo.dataone.org
    • dataverse.no
    • +2more
    Updated Sep 27, 2024
    + more versions
    Cite
    Khan, Mariyam (2024). Supporting Data for "Prediction of Causal Genes at GWAS Loci with Pleiotropic Gene Regulatory Effects Using Correlated Instrumental Variable Sets". [Dataset]. http://doi.org/10.18710/VM0WKQ
    Explore at:
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    DataverseNO
    Authors
    Khan, Mariyam
    Description

    This repository contains datasets associated with the manuscript titled "Prediction of Causal Genes at GWAS Loci with Pleiotropic Gene Regulatory Effects Using Correlated Instrumental Variable Sets." These datasets support the development and validation of a Multivariable Mendelian Randomization (MVMR) method, a statistical technique that uses sets of genetic instruments (SNPs) to estimate the direct causal effects of multiple exposures (genes) on Coronary Artery Disease (CAD). The method is validated using summary statistics from Genome-Wide Association Studies (GWAS) on CAD and expression Quantitative Trait Loci (eQTL) analyses of gene expression data. The primary goal is to understand the genetic basis of CAD through pleiotropic gene regulatory effects. All files in this dataset have been generated from the following GWAS summary statistics and gene expression studies.

    GWAS Summary Data (ebi-a-GCST003116):
    - Trait: Coronary Artery Disease (CAD)
    - Association Analysis: Instruments (SNPs) to Outcome (CAD)
    - Year: 2015
    - Population: European
    - Source: TwoSampleMR Package

    A GWAS (Genome-Wide Association Study) summary data file for a trait like Coronary Artery Disease (CAD) typically contains information about genetic variants across the entire genome and their associations with the trait of interest. Common components of such a file include a SNP ID, a unique identifier for each genetic variant (usually a Single Nucleotide Polymorphism); the chromosome and position (genomic location of the variant on a specific chromosome); the alleles associated with each variant; the effect size (Beta or Odds Ratio); the standard error of the effect size; the p-value; the minor allele frequency (MAF); and the sample size.

    eQTL Analysis Summary Data:
    - Source: STARNET/GTEx
    - Association Analysis: Instruments (SNPs) to Exposures (Genes)
    - Validation Data: GTEx
    - Population: European-American subjects
    - Validation Data Download Link: GTEx Portal

    Nature and Scope: The eQTL Analysis Summary Data contains information in the same format as the GWAS data file, focusing on the expression levels of genes across different tissues (atherosclerotic aortic root (Aor), blood, atherosclerotic-lesion-free internal mammary artery (Mam), subcutaneous fat (Sf), visceral abdominal fat (Vaf), skeletal muscle (Sklm), and liver (Liv)). Each entry includes the association between a genetic variant (SNP) and a gene identified using a Gene ID (e.g., Ensembl ID), with the effect size representing the magnitude and direction of the association. The dataset encompasses various files, including .csv files that provide information on SNPs, gene expression, and outcome effects, and that present the results of causal analyses. The dataset's primary focus lies in understanding the genetic basis of CAD through pleiotropic gene regulatory effects. A detailed overview of the files is given in 0_ReadMe.txt.

  8. Pan-cancer Aberrant Pathway Activity Analysis (PAPAA)

    • zenodo.org
    • explore.openaire.eu
    application/gzip, csv +1
    Updated Dec 5, 2020
    Cite
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI (2020). Pan-cancer Aberrant Pathway Activity Analysis (PAPAA) [Dataset]. http://doi.org/10.5281/zenodo.3630647
    Explore at:
    Available download formats: application/gzip, tsv, csv
    Dataset updated
    Dec 5, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    DANIEL BLANKENBERG; VIJAY NAGAMPALLI
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information about the dataset files:

    1) pancan_rnaseq_freeze.tsv.gz: Publicly available gene expression data for the TCGA Pan-cancer dataset. File: PanCanAtlas EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611] [https://doi.org/10.1016/j.celrep.2018.03.046]

    2) pancan_mutation_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset. File: mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    3) pancan_GISTIC_threshold.tsv.gz: Publicly available gene-level copy number information of the TCGA Pan-cancer dataset. This file is processed using script process_copynumber.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. The files copy_number_loss_status.tsv.gz and copy_number_gain_status.tsv.gz generated from this data are used as inputs in our Galaxy pipeline. [https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443] [https://doi.org/10.1016/j.celrep.2018.03.046]

    4) mutation_burden_freeze.tsv.gz: Publicly available Mutational information for TCGA Pan-cancer dataset mc3.v0.2.8.PUBLIC.maf.gz was processed using script process_sample_freeze.py by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/][http://api.gdc.cancer.gov/data/1c8cfe5f-e52d-41ba-94da-f15ea1337efc] [https://doi.org/10.1016/j.celrep.2018.03.046]

    5) sample_freeze.tsv or sample_freeze_version4_modify.tsv: The file lists the frozen samples as determined by the TCGA PanCancer Atlas consortium along with raw RNAseq and mutation data. These were previously determined and included for all downstream analysis. All other datasets were processed and subset according to the frozen samples. [https://github.com/greenelab/pancancer/]

    6) vogelstein_cancergenes.tsv: compendium of OG and TSG used for the analysis. [https://github.com/greenelab/pancancer/]

    7) CCLE_DepMap_18Q1_maf_20180207.txt.gz Publicly available Mutational data for CCLE cell lines from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2FCCLE_DepMap_18Q1_maf_20180207.txt]

    8) ccle_rnaseq_genes_rpkm_20180929.gct.gz: Publicly available Expression data for 1019 cell lines (RPKM) from Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://depmap.org/portal/download/api/download/external?file_name=ccle%2Fccle_2019%2FCCLE_RNAseq_genes_rpkm_20180929.gct.gz]

    9) CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct: Publicly available merged Mutational and copy number alterations that include gene amplifications and deletions for the CCLE cell lines. This data is represented in the binary format and provided by the Broad Institute Cancer Cell Line Encyclopedia (CCLE) / DepMap Portal. [https://data.broadinstitute.org/ccle_legacy_data/binary_calls_for_copy_number_and_mutation_data/CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct]

    10) GDSC_cell_lines_EXP_CCLE_names.csv.gz Publicly available RMA normalized expression data for Genomics of Drug Sensitivity in Cancer(GDSC) cell-lines. File gdsc_cell_line_RMA_proc_basalExp.csv was downloaded. This data was subsetted to 389 cell lines that are common among CCLE and GDSC. All the GDSC cell line names were replaced with CCLE cell line names for further processing. [https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Data/preprocessed/Cell_line_RMA_proc_basalExp.txt.zip]

    11) GDSC_CCLE_common_mut_cnv_binary.csv.gz: A subset of merged Mutational and copy number alterations that include gene amplifications and deletions for common cell lines between GDSC and CCLE. This file is generated using CCLE_MUT_CNA_AMP_DEL_binary_Revealer.gct and a list of common cell lines.

    12) gdsc1_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC1 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC1_fitted_dose_response_15Oct19.xlsx]

    13) gdsc2_ccle_pharm_fitted_dose_data.txt.gz: Pharmacological data for GDSC2 cell lines. [ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/current_release/GDSC2_fitted_dose_response_15Oct19.xlsx]

    14) compounds.csv: list of pharmacological compounds tested for our analysis

    15) tcga_dictonary.tsv: list of cancer types used in the analysis.

    16) seg_based_scores.tsv: Measurement of total copy number burden, Percent of genome altered by copy number alterations. This file was used as part of the Pancancer analysis by Gregory Way et al as described in https://github.com/greenelab/pancancer/ data processing and initialization steps. [https://github.com/greenelab/pancancer/]

    17) sign.csv: file with original values assigned for tumor [1] or normal [-1] for given external samples (GSE69822)

    18) vlog_trans.csv: variance-stabilized, log-transformed expression values for given external samples (GSE69822)

    19) path_genes.csv: file with list of ERK/RAS/PI3K pathway genes used in the analysis.

  9. Zebrafish Pathway Metabolite MetFrag Local CSV (Beta)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 25, 2021
    Cite
    Marek Ostaszewski (2021). Zebrafish Pathway Metabolite MetFrag Local CSV (Beta) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3457553
    Explore at:
    Dataset updated
    Feb 25, 2021
    Dataset provided by
    Marek Ostaszewski
    Schymanski, Emma
    Egon Willighagen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a local CSV file of Zebrafish metabolites extracted from WikiPathways, KEGG, and the literature (currently DOI: 10.1371/journal.pone.0213661) for use with MetFrag (https://msbi.ipb-halle.de/MetFrag/).

    All metabolites associated with wikipathways entries were extracted by Egon Willighagen using the following query:

    http://sparql.wikipathways.org/sparql?default-graph-uri=&query=PREFIX+gpml%3A++++%3Chttp%3A%2F%2Fvocabularies.wikipathways.org%2Fgpml%23%3E%0D%0APREFIX+dcterms%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+dc%3A++++++%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0APREFIX+rdf%3A+++++%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+%0D%0A%0D%0Aselect+distinct+%3Fmetabolite+%28str%28%3FtitleLit%29+as+%3Ftitle%29+where+%7B%0D%0A++%3Fmetabolite+a+wp%3AMetabolite+%3B%0D%0A++++dcterms%3AisPartOf+%3Fpw+.%0D%0A++%3Fpw+dc%3Atitle+%3FtitleLit+%3B%0D%0A++++wp%3AorganismName+%22Danio+rerio%22%5E%5Exsd%3Astring+.%0D%0A%7D&format=text%2Fhtml&timeout=0&debug=on
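
The query in the URL above is percent-encoded, which makes it hard to read. As a small sketch, Python's standard `urllib.parse.unquote_plus` turns such fragments back into plain SPARQL text; the two fragments below are copied from the URL, and the variable names are chosen only for this illustration.

```python
from urllib.parse import unquote_plus

# Two fragments copied verbatim from the query string in the URL above.
select_clause = (
    "select+distinct+%3Fmetabolite+%28str%28%3FtitleLit%29+as+%3Ftitle%29+where+%7B"
)
organism_filter = "wp%3AorganismName+%22Danio+rerio%22%5E%5Exsd%3Astring+."

# unquote_plus decodes %XX escapes and turns '+' into spaces.
print(unquote_plus(select_clause))
print(unquote_plus(organism_filter))
```

Decoded, the first fragment reads `select distinct ?metabolite (str(?titleLit) as ?title) where {` and the second restricts pathways to the organism name "Danio rerio".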

    Open pathway data from the KEGG Zebrafish pathway (https://www.genome.jp/kegg-bin/show_pathway?dre01100) was extracted by Marek Ostaszewski.

    The resulting identifiers were mapped to chemical structures using the Chemical Translation Service (https://cts.fiehnlab.ucdavis.edu/batch), the CompTox Chemistry Dashboard batch search (https://comptox.epa.gov/dashboard/dsstoxdb/batch_search) and individual resources (KEGG, ChEBI, HMDB) where applicable. Conversions between structural formats were performed using Open Babel (http://openbabel.org/wiki/Main_Page). 13 Nov update: DOI: 10.1371/journal.pone.0213661 was added and mapped up using PubChem services (https://pubchem.ncbi.nlm.nih.gov/); PubChem CIDs and two additional columns "annothits" and "annothitcnt" from the PubChem download files were added.

    This file is a BETA version that starts a collection of Zebrafish metabolites for identification using MetFrag CL workflows (offline). It will be integrated into MetFrag online; please use the file in the dropdown menu rather than uploading this one.

    TODO: to add PubChem "Zebrafish" query: https://pubchem.ncbi.nlm.nih.gov/#query=zebrafish&tab=pathway&selected_id_type=cid

  10. Data from: Unveiling the impacts of land use on the phylogeography of...

    • search.dataone.org
    • datadryad.org
    Updated Feb 24, 2024
    + more versions
    Cite
    Gabriel Ernesto García Peña; André Víctor Rubio (2024). Unveiling the impacts of land use on the phylogeography of zoonotic New World Hantaviruses [Dataset]. http://doi.org/10.5061/dryad.rv15dv4fq
    Explore at:
    Dataset updated
    Feb 24, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Gabriel Ernesto García Peña; André Víctor Rubio
    Time period covered
    Jan 1, 2024
    Description

    Billions of genomic sequences are stored in public repositories (NCBI), as are records of species occurrence (GBIF). By implementing analytical tools from different scientific disciplines, data mining on these databases can be a source of information to aid in the global surveillance of zoonotic pathogens that circulate among wildlife. We illustrate this by investigating the hantavirus-rodent system in the Americas, i.e. New World Hantaviruses (NWH). First, we trace the circulation of pathogenic NWH among rodents by inferring the phylogenetic links among 278 genomic samples of the S segment (N protein) of NWH found in 55 species of Cricetidae rodents. Second, machine learning was used to assess the impact of land use on the probability of presence of the rodent species linked with reservoirs of pathogenic hantaviruses. Our results show that hosts are widely present across the Americas. Some hosts are present in the primary forest and agricultural land, but not in the secondary forest;...

    Data analysis follows 4 main steps:

    Data Collection and Curation. GenBank accession numbers of Hantavirus sequences were obtained from a BLAST query, metadata was collected, taxonomic names were homogenized, and sequences found in wild animals were selected.

    Genetic Sequence Alignment and Phylogenetic Inference. Genetic data was aligned and used to infer the phylogenetic relationships among the samples.

    Phylogenetic Network analysis on the genetic links of Hantaviruses among hosts. A phylogenetic network was built from the phylogenetic tree of New World Hantavirus.

    Geographic analysis on the habitat suitability of hosts linked in the phylogenetic network. Habitat suitability within the distribution areas of each species was modeled with classification trees. Historical records on the species presence were used to assess the land use change at the time of sampling, and to train a model to predict the presence of the species based on 12 land use variables. These models were used to predict the a...

    ## Unveiling the Impacts of Land Use on the Phylogeography of Zoonotic New World Hantaviruses

    Gabriel E García-Peña and André V. Rubio. Ecography 2024. DOI: 10.1111/ecog.06996

    Supplementary Material

    Description of the data and file structure

    Analysis presented in the main article was performed in R (R Core Team 2022); MAFFT (Katoh 2005) and JModelTest2 (Darriba et al. 2012), following 4 main steps:

    1. Data Collection and Curation.

    BLAST_Nprot.csv: Accession numbers from the BLAST search for Hantavirus. With this list of accession numbers, it is possible to download the genetic sequences in R by using the function read.GenBank() from the library ape. Metadata of these sequences can be accessed with the R code presented in the file fetch.metadata.R (see code section).

    2. Genetic Sequence Alignment and Phylogenetic Inference.

    Nprot_MaxAlign.fas: Fasta file with Multiple sequence alignment of the genetic sequences. Fasta file can be read in R...

  11. Phylo-k-mers databases for SHERPAS

    • zenodo.org
    • datadryad.org
    zip
    Updated Jun 2, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Guillaume Scholz; Benjamin Linard; Nokolai Romashchenko; Eric Rivals; Fabio Pardi (2022). Phylo-k-mers databases for SHERPAS [Dataset]. http://doi.org/10.5061/dryad.r7sqv9s85
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillaume Scholz; Benjamin Linard; Nokolai Romashchenko; Eric Rivals; Fabio Pardi
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    SHERPAS is a new program to identify novel recombinant sequences in a large collection of viral sequences, and to provide a first estimate of their recombinant structure. SHERPAS is much faster than other software for recombination detection; its main feature is the use of a pre-computed database of "phylogenetically-informed k-mers" (or phylo-k-mers). Computing this phylo-k-mer database is computationally heavy, but it only needs to be done once for a given reference alignment.

    A phylo-k-mer database can be built from any reference alignment, and a phylogenetic tree built from that alignment, using RAPPAS2 (https://github.com/phylo42/rappas2). We propose here three ready-to-use databases, for three reference alignments:
    - An alignment of 167 sequences of the pol region of the HIV genome, provided with the program SCUEAL, accessible at https://github.com/spond/SCUEAL/blob/master/data/pol2009.nex
    - An alignment of 339 sequences of the whole HBV genome, provided with the program jpHMM, accessible at http://jphmm.gobics.de/download.html.
    - An alignment of 881 sequences of the whole HIV genome, also provided with jpHMM, accessible at http://jphmm.gobics.de/download.html.

    For each of these alignments, we provide a .zip file containing three files: The phylo-k-mer database (.rps file), the reference phylogenetic tree used to build the database (.tree file), and a table associating each reference sequence to a strain of the virus (.csv file). The details of the construction of the database, the construction of the tree, as well as the origin of the information reported in the table, can be found in the Supplementary Materials associated with the original Bioinformatics publication.

  12. n

    Simulating population divergence of Northern chamois in the Alps based on...

    • cmr.earthdata.nasa.gov
    Updated Sep 27, 2023
    + more versions
    Cite
    (2023). Simulating population divergence of Northern chamois in the Alps based on habitat dynamics [Dataset]. http://doi.org/10.16904/envidat.291
    Dataset updated
    Sep 27, 2023
    Time period covered
    Jan 1, 2022
    Description

    General description

    Genomic data, habitat suitability raster files and scripts to run gen3sis to simulate cumulative divergence over time as an approximation for genetic differentiation. Scripts for basic analysis of the simulations (e.g., creating a distance matrix from sampling locations) are also provided. See the original publication (doi link will be provided after publication) for details. The study area is the European Alps. All data are uploaded as zipped files. Unzip them after download and put all data in one folder. See the linked publications for the correct citation of the data used; use of the data without correct citation is not allowed.

    Corresponding author: Flurin Leugger, email: flurin.leugger@gmail.com

    # Description of the data (content of the different zip folders)

    ## Abiotic data

    ### Glaciers

    Folders with raster stacks of glaciated areas at 0.05° resolution in WGS84 projection from Seguinot et al. (2018).

    Seguinot, J., Ivy-Ochs, S., Jouvet, G., Huss, M., Funk, M., & Preusser, F. (2018). Modelling last glacial cycle ice dynamics in the Alps. The Cryosphere, 12(10), 3265–3285. https://doi.org/10.5194/tc-12-3265-2018

    ### Rivers

    * river_raster_elevation_class.tif: raster file (.tif) at 0.05° resolution and WGS84 projection with large rivers (scenario 2 from publication). Each river cell is classified according to the elevation of the cell. Natural Earth. (2018). Rivers + lake centerlines version 4.1.0. Retrieved January 22, 2020, from https://www.naturalearthdata.com/downloads/50m-physical-vectors/50m-rivers-lake-centerlines
    * river_raster_strahler_class_5km.tif: raster file at 0.05° resolution and WGS84 projection with medium rivers. The rivers are classified according to their Strahler order. Food and Agriculture Organization of the United Nations. (2014). Rivers in Europe (Derived from HydroSHEDS). Retrieved January 29, 2020, from http://www.fao.org/geonetwork/srv/fr/google.kml?uuid=e0243940-e5d9-487c-8102-45180cf1a99f&layers=AQUAMAPS:37253_rivers_europe

    ## Fossil records

    * chamois_fossil_combined_public.xlsx: list with fossil records until 20,000 years BP from Central Europe; see linked references for citation.

    ## Chamois occurrences

    * chamois_occurrence.csv: Chamois presences from all sources used for the publication (see Suppl. mat. Table S1 for detailed information and correct citations of the data), aggregated at 0.05° resolution (~5 km).

    ## Gen3sis

    * config: folders with all configuration files used to run the simulations for the publication (different dispersal divergence parameters).
    * scripts: scripts (and helper functions) to run the gen3sis simulations, including scripts for the beginning of the subsequent analysis.

    ## Genetic

    * populations.snps.light.vcf: vcf file of the sampled Northern chamois (Rupicapra rupicapra). The genomic data encompass 20k SNPs (from ddRAD sequencing).
    * Sequencing_final_without_slovakia.txt: sampling locations of Northern chamois (Rupicapra rupicapra)

    ## HSM

    * habitat_suitability_hindcasting: Aggregated habitat suitability raster files (stacks, .grd files) at 0.05° resolution and WGS84 projection from 20,000 years BP until today in 100-year time steps. There are separate folders for each environmental variable scenario used (different terrain slope variables) and for the different occurrence/pseudo-absence sampling strategies used.
    * ODMAP_LeuggerEtAl2021-10-25.csv_: ODMAP protocol
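
The description states that chamois presences are aggregated at 0.05° resolution (~5 km). The sketch below shows one plausible way to do such an aggregation and to check the quoted cell size. It is an assumption-laden illustration: the grid origin at (0°, 0°), the function names and the example coordinates are hypothetical, and the dataset's own scripts may aggregate differently.

```python
import math

CELL_SIZE_DEG = 0.05  # resolution stated in the dataset description


def cell_index(lon, lat, cell_size=CELL_SIZE_DEG):
    """Return the (column, row) index of the grid cell containing a point.

    Assumes a grid anchored at lon = 0, lat = 0 (not stated in the source).
    """
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def aggregate_presences(points, cell_size=CELL_SIZE_DEG):
    """Collapse (lon, lat) presence points to a sorted list of unique occupied cells."""
    cells = {cell_index(lon, lat, cell_size) for lon, lat in points}
    return sorted(cells)


def cell_height_km(cell_size=CELL_SIZE_DEG, earth_radius_km=6371.0):
    """North-south extent of one cell on a spherical Earth, in kilometres."""
    return math.radians(cell_size) * earth_radius_km
```

On a spherical Earth of radius 6371 km, `cell_height_km()` comes out at roughly 5.6 km, which is consistent with the "~5 km" quoted for a 0.05° cell. The east-west extent is smaller at Alpine latitudes because meridians converge.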

  13. Z

    Data from: Locally adaptive temperature response of vegetative growth in...

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Feb 16, 2022
    Cite
    Gunis, Joanna (2022). Locally adaptive temperature response of vegetative growth in Arabidopsis thaliana [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6076947
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    Nordborg, Magnus
    Jez, Jakub
    Nizhynska, Viktoria
    Gunis, Joanna
    Clauw, Pieter
    Reichardt, Ilka
    Koemeda, Stefanie
    Kerdaffrec, Envel
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We investigated early vegetative growth of natural Arabidopsis thaliana accessions in cold, non-freezing temperatures, similar to temperatures these plants naturally encounter in fall at northern latitudes.

    Dataset includes:

    • Rosette area measurements over 3 weeks in a 16°C and a 6°C treatment. The first phenotyping time point is at 14 days after stratification. Measurements were taken twice per day. These data are in file rawdata_combined_annotation.txt and go together with outliers.csv, which contains outlying datapoints.

    • Seed size measurements. These data are in file seed_size_swedes_lab_updated.csv

    The remaining files are required to rerun the analyses and recreate figures. Scripts to do so can be found at https://github.com/picla/growth_16C_6C/

    1001genomes-accessions.csv: lists all accessions from the 1001genomes project and their respective subpopulations.

    2029_modified_MN_SH_wc2.0_30s_bilinear.csv: contains climate data for each accession, downloaded and processed from www.worldclim.org

    metabolic_distance.csv: contains the metabolic distance as calculated in Weiszmann et al. (https://www.biorxiv.org/content/10.1101/2020.09.24.311092v1)

    RNAseq_samples.txt: sample description of the RNA-seq samples (data is downloadable from http://www.ncbi.nlm.nih.gov/bioproject/807069)

    ZAT12_downregulated_table10.csv, ZAT12_upregulated_table9.csv, CBF_regulon_DOWN_ParkEtAl2015.txt, CBF_regulon_UP_ParkEtAl2015.txt, CBF2_downregulated_table8.csv, CBF2_upregulated_table7.csv, HSFC1_regulon_ParkEtAl2015.txt: these files list genes that are involved in cold acclimation as described by Park et al. (https://onlinelibrary.wiley.com/doi/10.1111/tpj.12796) and Vogel et al. (https://onlinelibrary.wiley.com/doi/10.1111/j.1365-313X.2004.02288.x).

    Material and Methods

    Rosette growth

    Seeds of 249 natural accessions (Suppl. Data 1) of Arabidopsis thaliana described in the 1001 genomes project (1001 Genomes Consortium 2016) were sown on sieved (6 mm) substrate (Einheitserde ED63). Pots were filled with 71.5 g ±1.5 g of soil to ensure homogeneous packing. The prepared pots were all covered with blue mats (Junker et al. 2014) to enable robust performance of the high-throughput image analysis algorithm. Seeds were stratified (4 days at 4°C in darkness), after which they germinated and were left to grow for 2 weeks at 21°C (relative humidity: 55 %; light intensity: 160 µmol m-2 s-1; 14 h light). The temperature treatments were started by transferring the seedlings to either 6 °C or 16 °C. To simulate natural conditions, temperatures fluctuated diurnally between 16-21 °C, 0.5-6 °C and 8-16 °C for the 21 °C initial growth conditions and the 6 °C and 16 °C treatments, respectively (Fig. 2). Light intensity was kept constant at 160 µmol m-2 s-1 throughout the experiment. Relative humidity was set at 55% but in colder temperatures it rose uncontrollably to a maximum of 95%. Daylength was 9h during the 16°C and 6°C treatments.

    Each temperature treatment was repeated in three independent experiments. Five replicate plants were grown for every genotype per experiment. Plants were randomly distributed across the growth chamber with an independent randomisation pattern for each experiment. During the temperature treatments (14 DAS – 35 DAS), plants were photographed twice a day (1 hour after/before lights switched on/off), using an RGB camera (IDS uEye UI-548xRE-C; 5MP) mounted to a robotic arm. At 35 DAS, whole rosettes were harvested, immediately frozen in liquid nitrogen and stored at -80 °C until further analysis. Rosette areas were extracted from the plant images using Lemnatec OS (LemnaTec GmbH, Aachen, Germany) software.

    Seed size

    We used the seeds produced by Kerdaffrec et al. (2016) and limited our measurements to the set of 123 Swedish accessions that overlapped with our growth dataset. After seed stratification for four days at 4°C in darkness, mother plants were grown for 8 weeks at 4°C under long-day conditions (16h light; 8h dark) to ensure proper vernalization. Temperature was raised to 21°C (light) and 16°C (dark) for flowering and seed ripening. Seeds were kept in darkness at 16°C and 30% relative humidity from harvest until seed size measurements. For each genotype, three replicates were pooled and about 200-300 seeds were sprinkled on 12 x 12 cm square, transparent Petri dishes. Image acquisition was performed as described in Exposito-Alonso et al. (2018) by scanning dishes on a cluster of eight Epson V600 scanners. The resulting 1200 dpi .tiff images were analyzed in the Fiji software. Images were converted to 8-bit binary images and thresholded with the setAutoThreshold("Default dark") command, and seed area was measured in squared mm by running the Analyse Particles command (inclusion parameters: size=0.04-0.25 circularity=0.70-1.00).
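
To make the inclusion parameters concrete, the sketch below converts a particle's pixel area to squared mm at 1200 dpi and applies the size and circularity ranges quoted above. It is a stand-alone illustration, not the authors' Fiji macro: the function names are hypothetical, circularity is taken as 4π·area/perimeter² and clamped to 1, and the size range is assumed to be in squared mm because the text reports seed area in that unit.

```python
import math

DPI = 1200          # scan resolution stated in the methods
MM_PER_INCH = 25.4


def pixel_area_mm2(dpi=DPI):
    """Area of one square pixel in mm^2 at the given scan resolution."""
    side_mm = MM_PER_INCH / dpi
    return side_mm ** 2


def circularity(area, perimeter):
    """Shape descriptor 4*pi*area / perimeter^2; 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def is_seed(area_px, perimeter_px, dpi=DPI,
            size_range=(0.04, 0.25), circ_range=(0.70, 1.00)):
    """Apply the size (mm^2, assumed) and circularity filters to one particle."""
    area_mm2 = area_px * pixel_area_mm2(dpi)
    # Pixelated outlines can give values slightly above 1; clamp them (assumption).
    circ = min(circularity(area_px, perimeter_px), 1.0)
    size_ok = size_range[0] <= area_mm2 <= size_range[1]
    circ_ok = circ_range[0] <= circ <= circ_range[1]
    return size_ok and circ_ok
```

At 1200 dpi one pixel covers about 0.000448 mm², so the 0.04-0.25 mm² window corresponds to roughly 90 to 560 pixels per particle. Smaller specks and elongated debris fall outside the filters.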

  14. E

    Data from: A taxonomic, genetic and ecological data resource for the...

    • catalogue.ceh.ac.uk
    • data-search.nerc.ac.uk
    • +1more
    zip
    Updated Sep 20, 2021
    + more versions
    Cite
    M.C. Henniges; R.F. Powell; S. Mian; C.A. Stace; K.J. Walker; R.J. Gornall; M.J.M. Christenhusz; M.R. Brown; A.D. Twyford; P.M. Hollingsworth; L. Jones; N. De Vere; A. Antonelli; A.R. Leitch; I.J. Leitch (2021). A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland [Dataset]. http://doi.org/10.5285/9f097d82-7560-4ed2-af13-604a9110cf6d
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 20, 2021
    Dataset provided by
    NERC EDS Environmental Information Data Centre
    Authors
    M.C. Henniges; R.F. Powell; S. Mian; C.A. Stace; K.J. Walker; R.J. Gornall; M.J.M. Christenhusz; M.R. Brown; A.D. Twyford; P.M. Hollingsworth; L. Jones; N. De Vere; A. Antonelli; A.R. Leitch; I.J. Leitch
    Dataset funded by
    Natural Environment Research Council (https://www.ukri.org/councils/nerc)
    Description

    The dataset contains a current inventory of vascular plant species and their attributes present in the flora of Britain and Ireland. The species list is based on the most recent key to the flora of Britain and Ireland, with taxon names linked to unique Kew taxon identifiers and the World Checklist of Vascular Plants, and includes both native and non-native species. Attribute data stem from a variety of sources to give an overview of the current state of the vascular flora. Attributes include functional traits, distribution and ecologically relevant data (e.g. genome size, chromosome numbers, spatial distribution, growth form, hybridization metrics and native/non-native status). The data include previously unpublished genome size measurements, chromosome counts and CSR life strategy assessments. The database aims to provide an up-to-date starting point for flora-wide analyses.


Cite
(2025). LifeDB [Dataset]. http://identifiers.org/RRID:SCR_006899

LifeDB

RRID:SCR_006899, nif-0000-03081, LifeDB (RRID:SCR_006899), LifeDB

141 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 9, 2025
