5 datasets found
  1. Tools and methods in genomic data analysis: TGAC - Repositive Main Survey...

    • figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Charlotte Whicher; Jessica Jordan (2023). Tools and methods in genomic data analysis: TGAC - Repositive Main Survey Results [Dataset]. http://doi.org/10.6084/m9.figshare.4715302.v1
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Charlotte Whicher; Jessica Jordan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Final results from the preliminary survey found here: https://figshare.com/articles/TGAC_-_Repositive_Preliminary_Survey_Results/3503873
    After that preliminary survey, we added some questions to gain further insights and then opened the survey up to a wider audience. 50 people responded; in the blog post I will discuss our findings from this survey and our final conclusions.

  2. The Cancer Genome Atlas Rectum Adenocarcinoma Collection

    • cancerimagingarchive.net
    dicom, n/a
    Updated Jan 5, 2016
    Cite
    The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
    Available download formats: dicom, n/a
    Dataset updated
    Jan 5, 2016
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    May 29, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data resides in the Genomic Data Commons (GDC) Data Portal while the radiological data is stored on The Cancer Imaging Archive (TCIA).

    Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.

    CIP TCGA Radiology Initiative

    Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource.

  3. Data from: An annotated draft genome of the mountain hare (Lepus timidus)

    • nde-dev.biothings.io
    • datasetcatalog.nlm.nih.gov
    • +4 more
    zip
    Updated Dec 18, 2020
    Cite
    João Pedro Marques; Fernando A. Seixas; Jeffrey M. Good; Liliana Farelo; Colin M. Callahan; W. Ian Montgomery; Neil Reid; Paulo C. Alves; Pierre Boursot; José Melo-Ferreira (2020). An annotated draft genome of the mountain hare (Lepus timidus) [Dataset]. http://doi.org/10.5061/dryad.j0zpc86bk
    Available download formats: zip
    Dataset updated
    Dec 18, 2020
    Dataset provided by
    Queen's University Belfast
    University of Montana
    Universidade do Porto
    Institut des Sciences de l'Evolution de Montpellier
    Authors
    João Pedro Marques; Fernando A. Seixas; Jeffrey M. Good; Liliana Farelo; Colin M. Callahan; W. Ian Montgomery; Neil Reid; Paulo C. Alves; Pierre Boursot; José Melo-Ferreira
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77x coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 194 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein coding genes by combining ab-initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares.
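    The scaffold N50 values quoted above (3.4 Mb before and 194 Mb after re-scaffolding) are a contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative assumptions, not values or code from the dataset.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are taken from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half of the assembly.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not from the hare assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

    Merging scaffolds into fewer, longer ones raises N50 without changing the total assembly length, which is why reference-based re-scaffolding can move the figure from megabases to chromosome scale.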

    Methods

    DNA Sampling, Extraction, and Sequencing

    One female mountain hare (Lepus timidus hibernicus) specimen (NCBI BioSample ID SAMN12621015) was captured from the wild for scientific research purposes by the Irish Coursing Club (ICC) at Borris-in-Ossory, County Laois under National Parks & Wildlife (NPWS) license no. C 337/2012 issued by the Department of Arts, Heritage and the Gaeltacht (dated October 31, 2012). Genomic DNA was extracted from kidney, muscle, and ear tissue using the JETquick Tissue DNA Spin Kit (GENOMED), with RNase and proteinase K to remove RNA and protein contamination. Genomic libraries of different insert lengths were generated following the standard ALLPATHS-LG protocol (Gnerre et al. 2011): one Illumina TruSeq DNA library of 180 bp fragments, sequenced with overlapping paired-end (OPE) reads, and three Illumina TruSeq DNA mate-pair (MP) libraries of 2.5, 4.5, and 8.0 kb insert sizes. Whole-genome sequencing was performed at The Genome Analysis Centre (TGAC, currently Earlham Institute, Norwich, UK) on seven HiSeq2000 lanes (five OPE and two 4.5 kb MP) and at CIBIO’s New-Gen sequencing platform on three HiSeq1500 lanes (2.5 and 8.0 kb MP). Raw sequencing reads were deposited in the Sequence Read Archive.

    Genome Assembly

    De novo assembly was performed using ALLPATHS-LG (Gnerre et al. 2011) with default parameters using OPE and mate-pair reads. The resulting assembly was evaluated with REAPR v1.0.18 (Hunt et al. 2013) to break incorrect scaffolds, by mapping the paired-end and the 4.5 kb mate-pair reads on the assembled genome. Another round of scaffolding was then performed using SSPACE v3.0 (Boetzer et al. 2011), with a minimum overlap of 32 bp and supported by a minimum of 20 reads (CIBIO-ISEM_LeTim1.0_Assembled.fasta.gz).

    Finally, we leveraged the existence of the high-quality assembly of the genome of the European rabbit (Oryctolagus cuniculus—Ensembl OryCun2.0), to improve the contiguity of the assembly using the reference-based scaffolder MeDuSa v.1.6 with five iterations (Bosi et al. 2015) (CIBIO-ISEM_LeTim1.0_re-scaffolded.fasta.gz).

    This re-scaffolding orders and re-orientates scaffolds without affecting intra-scaffold sequence. Quality control of the assembly at different stages was assessed based on metrics obtained with QUAST v.3.2 (Mikheenko et al. 2016). The completeness of the L. timidus re-scaffolded genome was evaluated using BUSCO v.3.0.2 (Simão et al. 2015), based on the presence and absence of core single-copy genes (from mammalia_odb9 database). We then checked consistency of gene content in the larger chromosome-like scaffolds and rabbit chromosomes using blastp from NCBI BLAST v2.7.1+ (Camacho et al. 2009), considering the best hit per gene with similarity above 90% over 500 bp. The 22 rabbit chromosomes were aligned against inferred corresponding L. timidus re-scaffolded scaffolds using D-Genies v. 1.2.0 Mashmap (Cabanettes and Klopp 2018).
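    The best-hit filtering described above can be sketched as follows. This is an illustration only, not the authors' script: it assumes BLAST tabular output in the default 12-column layout (`-outfmt 6`), treats "best" as the highest bit score, and reads the thresholds as identity above 90% and alignment length of at least 500 (in whatever units the alignment length column reports). The function name `best_hits` and the sample rows are hypothetical.

```python
import csv
import io


def best_hits(tsv_text, min_identity=90.0, min_length=500):
    """Return {query: subject} for the best-scoring hit per query.

    Expects BLAST tabular rows (default -outfmt 6 columns):
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        identity, length, bitscore = float(row[2]), int(row[3]), float(row[11])
        # Assumed reading of the thresholds: identity strictly above the
        # cutoff, alignment length at or above the cutoff.
        if identity <= min_identity or length < min_length:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical rows, made up for illustration.
rows = [
    ["g1", "chrA_p1", "95.0", "600", "30", "0", "1", "600", "1", "600", "0.0", "1100"],
    ["g1", "chrB_p9", "99.0", "550", "5", "0", "1", "550", "1", "550", "0.0", "900"],
    ["g2", "chrC_p2", "85.0", "700", "105", "0", "1", "700", "1", "700", "0.0", "1000"],
    ["g3", "chrD_p4", "92.0", "300", "24", "0", "1", "300", "1", "300", "0.0", "500"],
]
sample = "\n".join("\t".join(r) for r in rows)
print(best_hits(sample))
```

    In this toy input only `g1` survives: `g2` fails the identity cutoff and `g3` fails the length cutoff.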

    Genome Annotation

    Repetitive regions were identified using RepeatModeler v.1.0.11 (Smit and Hubley 2008) and masked using RepeatMasker v.4.0.7 (Smit et al. 2013). The masked genome was used as input for gene prediction in MAKER v.3.01.02 (Cantarel et al. 2008), using ab-initio predictions, L. timidus transcriptome data, and rabbit (O. cuniculus) protein annotations (CIBIO-ISEM_LeTim1.0.cdna.abinitio.fa.gz and CIBIO-ISEM_LeTim1.0.pep.abinitio.fa.gz). Functional inference for genes and transcripts was performed using the translated CDS features of each coding transcript. Each predicted protein sequence was searched with blastp against the Uniprot-Swissprot database to retrieve gene name and function, and analysed with InterProscan v5.30-69 (Jones et al. 2014) to retrieve Interpro, Pfam v31.0 (Finn et al. 2016), GO (Mi et al. 2017), KEGG (Kanehisa et al. 2016), and Reactome (Fabregat et al. 2018) information (annotation files: CIBIO-ISEM_LeTim1.0_re-scaffolded.gff.gz and CIBIO-ISEM_LeTim1.0_re-scaffolded.id.map.gz).

  4. Data from: Aequatus: An open-source homology browser

    • ckan.earlham.ac.uk
    Updated Jun 2, 2019
    Cite
    ckan.earlham.ac.uk (2019). Aequatus: An open-source homology browser [Dataset]. https://ckan.earlham.ac.uk/dataset/fc5855ca-bfc7-4f70-82e2-53d5f505f1dc
    Dataset updated
    Jun 2, 2019
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus, available within the Galaxy web platform as a visualisation plugin, which can be used to visualise gene trees generated by the GeneSeqToFamily workflow. Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://usegalaxy.eu.

  5. Spend at The Genome Analysis Centre Norwich

    • data.wu.ac.at
    • data.europa.eu
    csv
    Updated Feb 22, 2014
    Cite
    The Genome Analysis Centre (2014). Spend at The Genome Analysis Centre Norwich [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/OGMzYmRlMTUtMDNiNy00NDI1LTliY2UtMWQyZWVlYjRlZDA5
    Available download formats: csv
    Dataset updated
    Feb 22, 2014
    Dataset provided by
    The Genome Analysis Centre
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    A monthly-updated list of all financial transactions over £500 made by The Genome Analysis Centre, as part of the Government's commitment to transparency in expenditure.
