5 datasets found

Tools and methods in genomic data analysis: TGAC - Repositive Main Survey...
figshare.com
xlsx
Updated Jun 3, 2023
Cite
Charlotte Whicher; Jessica Jordan (2023). Tools and methods in genomic data analysis: TGAC - Repositive Main Survey Results [Dataset]. http://doi.org/10.6084/m9.figshare.4715302.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.4715302.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Charlotte Whicher; Jessica Jordan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Final results from the preliminary survey found here: https://figshare.com/articles/TGAC_-_Repositive_Preliminary_Survey_Results/3503873
After that preliminary survey we added some additional questions to gain further insights and then opened the survey up to a wider audience. 50 people responded, and in the blog post I will discuss our findings from this survey and our final conclusions.
The Cancer Genome Atlas Rectum Adenocarcinoma Collection
cancerimagingarchive.net
dicom, n/a
Updated Jan 5, 2016
Cite
The Cancer Imaging Archive (2016). The Cancer Genome Atlas Rectum Adenocarcinoma Collection [Dataset]. http://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Available download formats: dicom, n/a
Unique identifier
https://doi.org/10.7937/K9/TCIA.2016.F7PPNPNU
Dataset updated
Jan 5, 2016
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 29, 2020
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). Clinical, genetic, and pathological data reside in the Genomic Data Commons (GDC) Data Portal, while the radiological data are stored on The Cancer Imaging Archive (TCIA).
Matched TCGA patient identifiers allow researchers to explore the TCGA/TCIA databases for correlations between tissue genotype, radiological phenotype and patient outcomes. Tissues for TCGA were collected from many sites all over the world in order to reach their accrual targets, usually around 500 specimens per cancer type. For this reason the image data sets are also extremely heterogeneous in terms of scanner modalities, manufacturers and acquisition protocols. In most cases the images were acquired as part of routine care and not as part of a controlled research study or clinical trial.
CIP TCGA Radiology Initiative
Imaging Source Site (ISS) Groups are being populated and governed by participants from institutions that have provided imaging data to the archive for a given cancer type. Modeled after TCGA analysis groups, ISS groups are given the opportunity to publish a marker paper for a given cancer type per the guidelines in the table above. This opportunity will generate increased participation in building these multi-institutional data sets as they become an open community resource. Learn more about the CIP TCGA Radiology Initiative.
Data from: An annotated draft genome of the mountain hare (Lepus timidus)
nde-dev.biothings.io
datasetcatalog.nlm.nih.gov
+4 more
zip
Updated Dec 18, 2020
Cite
João Pedro Marques; Fernando A. Seixas; Jeffrey M. Good; Liliana Farelo; Colin M. Callahan; W. Ian Montgomery; Neil Reid; Paulo C. Alves; Pierre Boursot; José Melo-Ferreira (2020). An annotated draft genome of the mountain hare (Lepus timidus) [Dataset]. http://doi.org/10.5061/dryad.j0zpc86bk
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.j0zpc86bk
Dataset updated
Dec 18, 2020
Dataset provided by
Queen's University Belfast
University of Montana
Universidade do Porto
Institut des Sciences de l'Evolution de Montpellier
Authors
João Pedro Marques; Fernando A. Seixas; Jeffrey M. Good; Liliana Farelo; Colin M. Callahan; W. Ian Montgomery; Neil Reid; Paulo C. Alves; Pierre Boursot; José Melo-Ferreira
License
https://spdx.org/licenses/CC0-1.0.html
Description
Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77x coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 194 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein coding genes by combining ab-initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares.
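For readers unfamiliar with the scaffold N50 statistic quoted in the description, the following is a minimal sketch of how it is commonly computed. The function name and the toy scaffold lengths are illustrative assumptions and are not taken from the dataset; the authors report obtaining their metrics with QUAST, not with this code.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (hypothetical values, in bp).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

Because the statistic depends only on the sorted length distribution, merging scaffolds into fewer, longer ones (as in the re-scaffolding step) raises N50 without changing the total assembly length.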

Methods

DNA Sampling, Extraction, and Sequencing

One female mountain hare (Lepus timidus hibernicus) specimen (NCBI BioSample ID SAMN12621015) was captured from the wild for scientific research purposes by the Irish Coursing Club (ICC) at Borris-in-Ossory, County Laois under National Parks & Wildlife (NPWS) license no. C 337/2012 issued by the Department of Arts, Heritage and the Gaeltacht (dated October 31, 2012). Genomic DNA was extracted from kidney, muscle, and ear tissue using the JETquick Tissue DNA Spin Kit (GENOMED), with RNase and proteinase K to remove RNA and protein contamination. Genomic libraries of different insert lengths were generated following the standard ALLPATHS-LG protocol (Gnerre et al. 2011): one Illumina TruSeq DNA library of 180 bp fragments, sequenced with overlapping paired-end (OPE) reads, and three Illumina TruSeq DNA mate-pair (MP) libraries with 2.5, 4.5, and 8.0 kb insert sizes. Whole-genome sequencing was performed at The Genome Analysis Centre (TGAC, currently Earlham Institute, Norwich, UK) on seven HiSeq2000 lanes (five OPE and two 4.5 kb MP), and on CIBIO’s New-Gen sequencing platform on three HiSeq1500 lanes (2.5 and 8.0 kb MP). Raw sequencing reads were deposited in the Sequence Read Archive.

Genome Assembly

De novo assembly was performed using ALLPATHS-LG (Gnerre et al. 2011) with default parameters using OPE and mate-pair reads. The resulting assembly was evaluated with REAPR v1.0.18 (Hunt et al. 2013) to break incorrect scaffolds, by mapping the paired-end and the 4.5 kb mate-pair reads on the assembled genome. Another round of scaffolding was then performed using SSPACE v3.0 (Boetzer et al. 2011), with a minimum overlap of 32 bp and supported by a minimum of 20 reads (CIBIO-ISEM_LeTim1.0_Assembled.fasta.gz).

Finally, we leveraged the existence of the high-quality assembly of the genome of the European rabbit (Oryctolagus cuniculus—Ensembl OryCun2.0), to improve the contiguity of the assembly using the reference-based scaffolder MeDuSa v.1.6 with five iterations (Bosi et al. 2015) (CIBIO-ISEM_LeTim1.0_re-scaffolded.fasta.gz).

This re-scaffolding orders and re-orientates scaffolds without affecting intra-scaffold sequence. Quality control of the assembly at different stages was assessed based on metrics obtained with QUAST v.3.2 (Mikheenko et al. 2016). The completeness of the L. timidus re-scaffolded genome was evaluated using BUSCO v.3.0.2 (Simão et al. 2015), based on the presence and absence of core single-copy genes (from mammalia_odb9 database). We then checked consistency of gene content in the larger chromosome-like scaffolds and rabbit chromosomes using blastp from NCBI BLAST v2.7.1+ (Camacho et al. 2009), considering the best hit per gene with similarity above 90% over 500 bp. The 22 rabbit chromosomes were aligned against inferred corresponding L. timidus re-scaffolded scaffolds using D-Genies v. 1.2.0 Mashmap (Cabanettes and Klopp 2018).
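The best-hit filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes BLAST tabular output (`-outfmt 6`, default column order), ranks hits by bit score, and reads "over 500 bp" as a minimum value in the alignment-length column. None of these choices is stated in the source, and the gene and subject identifiers are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, min_identity=90.0, min_length=500):
    """Return {query_id: best_row} for hits passing both thresholds.

    Assumption: "best" means highest bit score among the retained hits.
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        # Keep only hits with identity above the threshold and long enough.
        if float(row["pident"]) <= min_identity or int(row["length"]) < min_length:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical three-line BLAST table: geneB fails the identity threshold.
example = io.StringIO(
    "geneA\tchr1_p1\t95.0\t600\t30\t0\t1\t600\t1\t600\t0.0\t1100\n"
    "geneA\tchr2_p7\t92.0\t550\t44\t0\t1\t550\t1\t550\t0.0\t900\n"
    "geneB\tchr3_p2\t85.0\t700\t105\t0\t1\t700\t1\t700\t0.0\t1000\n"
)
hits = best_hits(example)
for gene, row in hits.items():
    print(gene, row["sseqid"], row["pident"], row["length"])
```

Keeping one hit per gene makes it straightforward to tally, for each hare scaffold, which rabbit chromosome most of its genes map to.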

Genome Annotation

Repetitive regions were identified using RepeatModeler v.1.0.11 (Smit and Hubley 2008) and masked using RepeatMasker v.4.0.7 (Smit et al. 2013). The masked genome was used as input for gene prediction in MAKER v.3.01.02 (Cantarel et al. 2008), using ab-initio predictions, L. timidus transcriptome data, and rabbit (O. cuniculus) protein annotations (CIBIO-ISEM_LeTim1.0.cdna.abinitio.fa.gz and CIBIO-ISEM_LeTim1.0.pep.abinitio.fa.gz). Functional inference for genes and transcripts was performed using the translated CDS features of each coding transcript. Each predicted protein sequence was searched with blastp against the UniProt-SwissProt database to retrieve gene name and function, and analysed with InterProScan v5.30-69 (Jones et al. 2014) to retrieve InterPro, Pfam v31.0 (Finn et al. 2016), GO (Mi et al. 2017), KEGG (Kanehisa et al. 2016), and Reactome (Fabregat et al. 2018) information (annotation files: CIBIO-ISEM_LeTim1.0_re-scaffolded.gff.gz and CIBIO-ISEM_LeTim1.0_re-scaffolded.id.map.gz).
Data from: Aequatus: An open-source homology browser
ckan.earlham.ac.uk
Updated Jun 2, 2019
+ more versions
Cite
ckan.earlham.ac.uk (2019). Aequatus: An open-source homology browser [Dataset]. https://ckan.earlham.ac.uk/dataset/fc5855ca-bfc7-4f70-82e2-53d5f505f1dc
Dataset updated
Jun 2, 2019
Dataset provided by
CKAN (https://ckan.org/)
Description
Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterisation enables the identification of syntenic blocks, which can then be visualised with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provide further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. We present Aequatus, a standalone web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualisations. It relies on pre-calculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfils the visualisation aspects of Aequatus, available within the Galaxy web platform as a visualisation plugin, which can be used to visualise gene trees generated by the GeneSeqToFamily workflow. Aequatus is an open-source tool freely available to download under the MIT license at https://github.com/TGAC/Aequatus. A demo server is available at http://aequatus.earlham.ac.uk/. A publicly available instance of the GeneSeqToFamily workflow to generate gene tree information and visualise it using Aequatus is available on the Galaxy EU server at https://usegalaxy.eu.
Spend at The Genome Analysis Centre Norwich
data.wu.ac.at
data.europa.eu
csv
Updated Feb 22, 2014
+ more versions
Cite
The Genome Analysis Centre (2014). Spend at The Genome Analysis Centre Norwich [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/OGMzYmRlMTUtMDNiNy00NDI1LTliY2UtMWQyZWVlYjRlZDA5
Available download formats: csv
Dataset updated
Feb 22, 2014
Dataset provided by
The Genome Analysis Centre
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
A monthly-updated list of all financial transactions over £500 made by The Genome Analysis Centre, as part of the Government's commitment to transparency in expenditure.