100+ datasets found

simulated_experiments_1
figshare.com
zip
Updated Jul 13, 2022
Cite
Yanchi Su (2022). simulated_experiments_1 [Dataset]. http://doi.org/10.6084/m9.figshare.17802935.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.17802935.v1
Dataset updated
Jul 13, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yanchi Su
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
simulated experiments 1
simulated_experiments_2
figshare.com
zip
Updated Jul 13, 2022
Cite
Yanchi Su (2022). simulated_experiments_2 [Dataset]. http://doi.org/10.6084/m9.figshare.17802788.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.17802788.v1
Dataset updated
Jul 13, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yanchi Su
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
simulated experiments 2
expam Benchmarking - Classifier Performance Statistics
researchdata.edu.au
Updated Jun 28, 2022
Cite
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam Benchmarking - Classifier Performance Statistics [Dataset]. http://doi.org/10.26180/19771072.v1
Unique identifier
https://doi.org/10.26180/19771072.v1
Dataset updated
Jun 28, 2022
Dataset provided by
Monash University
Authors
Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Excel document containing precision, recall and F1 scores for metagenomic classifiers used in the benchmarking of expam's performance. Classifiers were tested on 140 simulated metagenomic communities, at different taxonomic ranks.
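For reference, the three statistics named in this description can be computed from counts of true positives, false positives and false negatives. The following is a minimal sketch only: the function name and the counts are hypothetical and are not taken from the dataset or from expam.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw classification counts."""
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one classifier at one taxonomic rank.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```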
[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Datasets, conda environments and software for the course "Population Genomics" of Prof Kasper Munch. This course material is maintained by the health data science sandbox. This webpage shows the latest version of the course material.

Data.tar.gz Contains the datasets and executable files for some of the software.
Unpack it with
tar -zxf Data.tar.gz -C ./
This creates a folder called Data with the uncompressed material inside.

Course_Env.packed.tar.gz Contains the conda environment used for the course. This needs to be unpacked to adjust all the prefixes (Note this environment is created on Ubuntu 22.10). You do this in the command line by

Create the folder Course_Env: mkdir Course_Env

Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env

Activate the environment: conda activate ./Course_Env

Run the unpacking script (this can take quite some time): conda-unpack

Course_Env.unpacked.tar.gz The same environment as above, but will work only if untarred into the folder /usr/Material - so use the version above if you are using it in another folder. This file is mostly to execute the course in our own cloud environment.

environment_with_args.yml The file needed to generate the conda environment. Create and activate the environment with the following commands:

conda env create -f environment_with_args.yml -p ./Course_Env

conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

At the end of the course, participants must be able to:

Identify an experimental platform relevant to a population genomic analysis.

Apply commonly used population genomic methods.

Explain the theory behind common population genomic methods.

Reflect on strengths and limitations of population genomic methods.

Interpret and analyze results of population genomic inference.

Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics from generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Here topics include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:

Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:

Coop chapter 4; Paper: Platypus

Exercise: Read mapping and base calling

Recombination:

Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation

Exercise: Phasing and recombination rate

Population structure and incomplete lineage sorting:

Lecture: Coop chapter 6, Review: Incomplete lineage sorting

Exercise: Working with VCF files

Hidden Markov models:

Lecture: Durbin chapter 3, Paper: population structure

Exercise: Inference of population structure and admixture

Ancestral recombination graphs:

Lecture: Paper: Approximating the ARG, Paper: Tree inference

Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:

Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference

Exercise: Inferring historical populations

Direct and linked selection:

Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:

Lecture: Review: Admixture, Paper: Admixture inference

Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):

Lecture: Coop lecture notes 99-120

Exercise: GWAS quality control

Heritability:

Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)

Exercise: Association testing

Evolution and disease:

Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)

Exercise: Estimating heritability
Raw motif mapping bedfile data and model training set class probabilities
search.dataone.org
data.niaid.nih.gov
Updated May 6, 2025
Cite
Phillip Davis (2025). Raw motif mapping bedfile data and model training set class probabilities [Dataset]. http://doi.org/10.5061/dryad.tdz08kq3w
Unique identifier
https://doi.org/10.5061/dryad.tdz08kq3w
Dataset updated
May 6, 2025
Dataset provided by
Dryad Digital Repository
Authors
Phillip Davis
Time period covered
Jan 1, 2023
Description
Leveraging prior viral genome sequencing data to make predictions on whether an unknown, emergent virus harbors a ‘phenotype-of-concern’ has been a long-sought goal of genomic epidemiology. A predictive phenotype model built from nucleotide-level information alone is challenging with respect to RNA viruses due to the ultra-high intra-sequence variance of their genomes, even within closely related clades. We developed a degenerate k-mer method to accommodate this high intra-sequence variation of RNA virus genomes for modeling frameworks. By leveraging a taxonomy-guided ‘group-shuffle-split’ cross validation paradigm on complete coronavirus assemblies from prior to October 2018, we trained multiple regularized logistic regression classifiers at the nucleotide k-mer level. We demonstrate the feasibility of this method by finding models accurately predicting withheld SARS-CoV-2 genome sequences as human pathogens and accurately predicting withheld Swine Acute Diarrhea Syndrome coronavirus (...
Sustained software development, not number of citations or journal choice,...
figshare.com
xml
Updated May 31, 2023
Cite
Paul Gardner; James Paterson; S R McGimpsey; Fatemeh Ashari Ghomi; Aleksandra Pawlik; Alex Gavryushkin; Mik Black (2023). Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software -- PubMed XML files and scripts [Dataset]. http://doi.org/10.6084/m9.figshare.15121818.v2
Available download formats: xml
Unique identifier
https://doi.org/10.6084/m9.figshare.15121818.v2
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Paul Gardner; James Paterson; S R McGimpsey; Fatemeh Ashari Ghomi; Aleksandra Pawlik; Alex Gavryushkin; Mik Black
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
PubMed XML files for training and scoring likely benchmark papers.
Characterizing the targets of transcription regulators by aggregating...
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Cite
Morin, Alexander (2023). Characterizing the targets of transcription regulators by aggregating ChIP-seq and perturbation expression data sets [Dataset]. http://doi.org/10.5683/SP3/MAFGFL
Unique identifier
https://doi.org/10.5683/SP3/MAFGFL
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Morin, Alexander
Description
There is a growing collection of genomics data sets generated for identifying the gene targets under control of transcription regulators (TRs). TR ChIP-seq and RNA expression experiments that perturb TR activity are the most common strategies for mapping TRs to genes at a genomic scale. However, the collection, preprocessing, summarization, and integration of these data sets requires a non-trivial degree of bioinformatics experience. In this study, we set out a framework to accomplish these tasks. We focus on eight TRs in both mouse and human, encompassing nearly 500 experiments, with two main objectives. The first is a detailed examination of the properties of the contributing experiments, to better learn of potential biases and pitfalls when aggregating diverse data sets. The second is to provide summarized, transparent, and convenient TR-target rankings based upon these genomic data sets for community use. Our work thus catalogues the state of the literature for a subset of important mammalian TRs, prioritizes gene targets based upon available empirical evidence, and provides a framework for ready expansion to more TR data sets.
Data from: Microarray time-series data classification via multiple alignment...
researchdata.edu.au
Updated May 5, 2022
Cite
Ataul Bari; Luis Rueda; Alioune Ngom (2022). Microarray time-series data classification via multiple alignment of gene expression profiles [Dataset]. http://doi.org/10.4225/03/5a1371a04a06e
Unique identifier
https://doi.org/10.4225/03/5a1371a04a06e
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Ataul Bari; Luis Rueda; Alioune Ngom
Description
Pairwise alignment approaches for time-varying gene expression profiles have been recently developed for the detection of co-expressions in time-series microarray data sets. In this paper, we analyze multiple expression profile alignment (MEPA) methods for classifying microarray time-course data. We apply a nearest centroid classification technique, in which the centroid of each class is computed by means of a MEPA algorithm. MEPA aligns the expression profiles in such a way as to minimize the total area between all aligned profiles. We propose four MEPA approaches whose effectiveness is demonstrated on the well-known budding yeast, S. cerevisiae, data set. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
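
The nearest centroid step mentioned in this description can be illustrated with a short sketch. It is not the authors' method: the centroid here is a plain pointwise mean and the distance is Euclidean, both stand-ins for the MEPA-based centroid and area-based alignment described above. The class labels and expression values are hypothetical.

```python
from math import dist


def centroid(profiles):
    """Pointwise mean of equally long expression profiles (stand-in for MEPA)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def nearest_centroid(train, query):
    """Return the label whose class centroid is closest to the query profile."""
    centroids = {label: centroid(profiles) for label, profiles in train.items()}
    return min(centroids, key=lambda label: dist(centroids[label], query))


# Hypothetical time-course profiles (three time points) for two classes.
train = {
    "early": [[1.0, 0.5, 0.1], [0.9, 0.6, 0.2]],
    "late": [[0.1, 0.5, 1.0], [0.2, 0.4, 0.9]],
}
print(nearest_centroid(train, [0.8, 0.5, 0.3]))
```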

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
DRCAT Resource Catalogue
rrid.site
dknet.org
Updated Jul 14, 2025
Cite
(2025). DRCAT Resource Catalogue [Dataset]. http://identifiers.org/RRID:SCR_005931
Unique identifier
https://identifiers.org/RRID:SCR_005931
Dataset updated
Jul 14, 2025
Description
Data resource catalog that collates metadata on bioinformatics Web-based data resources including databases, ontologies, taxonomies and catalogues. An entry includes information such as resource identifier(s), name, description and URL. "Query" lines are defined for each resource that describe what type(s) of data are available, in what format, how (by what identifier) the data can be retrieved and from where (URL). DRCAT was developed to provide more extensive data integration for EMBOSS, but it has many applications beyond EMBOSS. DRCAT entries (including "Query" lines) are annotated with terms from the EDAM ontology of common bioinformatics concepts.
The data set for testing cellCounts
opal.latrobe.edu.au
bin
Updated Dec 19, 2022
Cite
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi (2022). The data set for testing cellCounts [Dataset]. http://doi.org/10.26181/21588276.v1
Available download formats: bin
Unique identifier
https://doi.org/10.26181/21588276.v1
Dataset updated
Dec 19, 2022
Dataset provided by
La Trobe
Authors
Yang Liao; Dinesh Raghu; Bhupinder Pal; Lisa Mielke; Wei Shi
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The 10x Chromium single-cell RNA sequencing technology is a powerful gene expression profiling platform, capable of profiling the expression of thousands of genes in tens of thousands of cells simultaneously. This platform can produce hundreds of millions of reads in a single experiment, which makes quantifying gene expression levels in individual cells very challenging because of the massive data volume. Here we present cellCounts, a new tool for efficient and accurate quantification of 10x Chromium data. cellCounts employs the seed-and-vote strategy to align reads to a reference genome, collapses reads to UMIs (Unique Molecular Identifiers) and then assigns UMIs to genes based on the featureCounts program. Using multiple real datasets, we showed that cellCounts is ~3 times faster than cellRanger, a popular quantification program developed by 10x. Using simulation and real datasets with built-in ground truth, we demonstrated that cellCounts is markedly more accurate than cellRanger. cellCounts is implemented in R, making it easily integrated with other R programs for analysing Chromium data.
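
The idea of collapsing reads to UMIs before counting can be sketched as follows. This is a simplified illustration, not the cellCounts implementation: it assumes each read has already been aligned and carries a hypothetical (cell barcode, UMI, gene) triple, and it counts distinct UMIs per cell and gene so that duplicate reads of the same molecule are counted once.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples; this input
    layout is an assumption made for illustration only.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        # Reads sharing cell, gene and UMI collapse into one set entry.
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads; the second one duplicates the first.
reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "GGATC", "GeneA"),
    ("CCGT", "TTGCA", "GeneB"),
]
counts = count_umis(reads)
print(counts)
```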
Data from: Supplemental data
data.mendeley.com
Updated Aug 22, 2023
Cite
Sudha Acharya (2023). Supplemental data [Dataset]. http://doi.org/10.17632/66r9pkckjz.1
Unique identifier
https://doi.org/10.17632/66r9pkckjz.1
Dataset updated
Aug 22, 2023
Authors
Sudha Acharya
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
There are 7 supplemental data sets.
Data from: UnFATE: A comprehensive probe set and bioinformatics pipeline for...
data.niaid.nih.gov
zip
Updated Jan 23, 2025
Cite
Claudio Gennaro Ametrano (2025). UnFATE: A comprehensive probe set and bioinformatics pipeline for phylogeny reconstruction and multilocus barcoding of filamentous ascomycetes (Ascomycota, Pezizomycotina) [Dataset]. http://doi.org/10.5061/dryad.tht76hf1x
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.tht76hf1x
Dataset updated
Jan 23, 2025
Dataset provided by
University of Trieste
Authors
Claudio Gennaro Ametrano
License
https://spdx.org/licenses/CC0-1.0.html
Description
The subphylum Pezizomycotina (filamentous ascomycetes) is the largest clade within Ascomycota. Despite the importance of this group of fungi, our understanding of their evolution is still limited due to insufficient taxon sampling. Although next-generation sequencing technology allows us to obtain complete genomes for phylogenetic analyses, generating complete genomes of fungal species can be challenging, especially when fungi occur in symbiotic relationships or when the DNA of rare herbarium specimens is degraded or contaminated. Additionally, assembly, annotation, and gene extraction of whole-genome sequencing data require bioinformatics skills and computational power, resulting in a substantial data burden. To overcome these obstacles, we designed a universal target enrichment probe set to reconstruct the phylogenetic relationships of filamentous ascomycetes at different phylogenetic levels. From a pool of single-copy orthologous genes extracted from available Pezizomycotina genomes, we identified the smallest subset of genetic markers that can reliably reconstruct a robust phylogeny. We used a clustering approach to identify a sequence set that could provide an optimal trade-off between potential missing data and probe set cost. We incorporated this probe set into a user-friendly wrapper script named UnFATE (https://github.com/claudioametrano/UnFATE) that allows phylogenomic inferences without requiring expert bioinformatics knowledge. In addition to phylogenetic results, the software provides a powerful multilocus alternative to ITS-based barcoding. Phylogeny and barcoding approaches can be complemented by an integrated, pre-processed, and periodically updated database of all publicly available Pezizomycotina genomes. The UnFATE pipeline, using the 195 selected marker genes, consistently performed well across various phylogenetic depths, generating trees consistent with the reference phylogenomic inferences. 
The topological distance between the reference trees from literature and the best tree produced by UnFATE ranged between 0.10 and 0.14 (nRF) for phylogenies from family to subphylum level. We also tested the in vitro success of the universal baits set in a target capture approach on 25 herbarium specimens from ten representative classes in Pezizomycotina, which recovered a topology mostly congruent with recent phylogenomic inferences for this group of fungi. The discriminating power of our gene set was also assessed by the multilocus barcoding approach, which outperformed the barcoding approach based on ITS. With these tools, we aim to provide a framework for a collaborative approach to build robust, conclusive phylogenies of this important fungal clade.
Data from: Benchmarking tools for transcription factor prioritization
zenodo.org
application/gzip
Updated Apr 23, 2024
Cite
Sebastian Steinhauser; Leonor Schubert Santana; Gaulis Swann (2024). Benchmarking tools for transcription factor prioritization [Dataset]. http://doi.org/10.5281/zenodo.10990183
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10990183
Dataset updated
Apr 23, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sebastian Steinhauser; Leonor Schubert Santana; Gaulis Swann
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Apr 19, 2024
Description
Abstract:

Spatiotemporal regulation of gene expression is controlled by transcription factor (TF) binding to regulatory elements, resulting in a plethora of cell types and cell states from the same genetic information. Due to the importance of regulatory elements, various sequencing methods have been developed to localise them in genomes, for example using ChIP-seq profiling of the histone mark H3K27ac that marks active regulatory regions. Moreover, multiple tools have been developed to predict TF binding to these regulatory elements based on DNA sequence. As altered gene expression is a hallmark of disease phenotypes, identifying TFs driving such gene expression programs is critical for the identification of novel drug targets. In this study, we curated 84 chromatin profiling experiments (H3K27ac ChIP-seq) where TFs were perturbed through e.g., genetic knockout or overexpression. We ran nine published tools to prioritize TFs using these real-world data sets and evaluated the performance of the methods in identifying the perturbed TFs. This allowed the nomination of three frontrunner tools, namely RcisTarget, MEIRLOP and monaLisa. Our analyses revealed opportunities and commonalities of tools that will help to guide further improvements and developments in the field.

Dataset description:

tf_tool_benchmark_atacseq_diffPeaks.tar.gz - Archive containing differential peak statistics, tool diff peak input files (fore- and background) for all curated ATAC-seq datasets.

tf_tool_benchmark_h3K27ac_chipseq_diffPeaks.tar.gz - Archive containing differential peak statistics, tool diff peak input files (fore- and background) for all curated H3K27ac ChIP-seq datasets.

tf_tool_benchmark_atacseq_results.tar.gz - Archive containing the raw tool results for each ATAC-seq dataset.

tf_tool_benchmark_chipseq_results.tar.gz - Archive containing the raw tool results for each H3K27ac ChIP-seq dataset.

tf_tool_benchmark_results.tar.gz - Archive containing tool results summary for plotting (rds files).

Contact: Sebastian Steinhauser - sebastian.steinhauser@novartis.com
Data from: The new bioinformatics: integrating ecological data from the gene...
datadryad.org
data.niaid.nih.gov
zip
Updated Jul 16, 2012
Cite
Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers (2012). The new bioinformatics: integrating ecological data from the gene to the biosphere [Dataset]. http://doi.org/10.5061/dryad.qb0d6
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.qb0d6
Dataset updated
Jul 16, 2012
Dataset provided by
Dryad
Authors
Matthew B. Jones; Mark P. Schildhauer; O. J. Reichman; Shawn Bowers
Time period covered
2012
Description
Cumulative number of data packages in the Knowledge Network for Biocomplexity until 2007-06-21
This data set records the cumulative number of data packages in the Knowledge Network for Biocomplexity (KNB) data repository through 2007-06-21. A data package represents a set of data files and metadata files that together make a coherent, citable unit for some particular scientific activity. Each data package in the KNB is described by a scientific metadata document and can be composed of one or more data files that contain various segments of the data in question.
cumdatasets-20070622.csv
Inter-residue distances surrounding the ligand data sets using MANORAA
data.mendeley.com
narcis.nl
Updated Sep 22, 2021
Cite
Duangrudee Tanramluk (2021). Inter-residue distances surrounding the ligand data sets using MANORAA [Dataset]. http://doi.org/10.17632/4z4mypck9b.3
Unique identifier
https://doi.org/10.17632/4z4mypck9b.3
Dataset updated
Sep 22, 2021
Authors
Duangrudee Tanramluk
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Distances measured between distinctive parts of amino acid residues surrounding the ligand.
Simulation files and results without missing data
search.datacite.org
figshare.com
Updated Jan 19, 2016
Cite
April Wright (2016). Simulation files and results without missing data [Dataset]. http://doi.org/10.6084/m9.figshare.1160601.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.1160601.v1
Dataset updated
Jan 19, 2016
Dataset provided by
DataCite (https://www.datacite.org/)
Figshare (http://figshare.com/)
Authors
April Wright
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Publication: Wright AM and Hillis DM (2014). Bayesian analysis using a simple likelihood model outperforms parsimony for estimation of phylogeny from discrete morphological data. PLOS ONE. Contents: Data sets without missing data, and the phylogenetic trees estimated from these sets. Details: These data sets were simulated along the tree in Fig. 1 of the paper. No missing data distribution was imposed on these data sets.
Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...
zenodo.org
pdf, zip
Updated Jul 12, 2024
Cite
Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.7875495
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.7875495
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description

Please also see the latest version of the repository:
https://doi.org/10.5281/zenodo.6374011 and
our website: https://ilandavis.com/jcb2023-yfp

The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) -link, a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP traps reveals common discordance between mRNA and protein across the nervous system (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
Minimal siRNA set cover heuristic for gene family knockdown
researchdata.edu.au
Updated May 5, 2022
Cite
Xiaoguang Li; Alioune Ngom; Luis Rueda (2022). Minimal siRNA set cover heuristic for gene family knockdown [Dataset]. http://doi.org/10.4225/03/5a1371b7e405f
Unique identifier
https://doi.org/10.4225/03/5a1371b7e405f
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Xiaoguang Li; Alioune Ngom; Luis Rueda
Description
PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Whole genome DNA sequences of Gulf of Mexico invertebrates
search.dataone.org
data.griidc.org
Updated Feb 5, 2025
Cite
Thomas, W. Kelley (2025). Whole genome DNA sequences of Gulf of Mexico invertebrates [Dataset]. http://doi.org/10.7266/n7-pchj-dh15
Unique identifier
https://doi.org/10.7266/n7-pchj-dh15
Dataset updated
Feb 5, 2025
Dataset provided by
GRIIDC
Authors
Thomas, W. Kelley
Area covered
Gulf of Mexico (Gulf of America)
Description
The dataset consists of whole genome DNA sequences, generated from invertebrate species from the Gulf of Mexico during the Benthic Invertebrate Taxonomy, Metagenomics, and Bioinformatics Workshop (BITMaB) in 2017 in Corpus Christi, Texas, USA. All genomic data sets were deposited in and distributed by GenBank (NCBI), the European Nucleotide Archive (ENA)- European Bioinformatics Institute (EMBL-EBI), DNA Data Bank of Japan, NemATOL, the Global Genome Initiative, and Ocean Genome Legacy.
ATGC: Montpellier bioinformatics platform
scicrunch.org
Cite
ATGC: Montpellier bioinformatics platform [Dataset]. http://identifiers.org/RRID:SCR_002917
Unique identifier
https://identifiers.org/RRID:SCR_002917
Area covered
Montpellier
Description
A bioinformatics platform that is a joint project of several South of France laboratories with available services based on their expertise, issued from their research activities which involve phylogenetics, population genetics, molecular evolution, genome dynamics, comparative and functional genomics, and transcriptome analysis. Most of the software and databases on ATGC are (co)authored by researchers from South of France teams. Some are widely used and highly cited. South of France laboratories:
* CRBM (transcriptomes and stem cells).
* IBC (computational biology).
* MiVEGEC (evolution and phylogeny).
* LGDP (plant genomics).
* LIRMM (computer science).
* South Green (plant genomics).