100+ datasets found

Bioinformatics data for paper
catalog.data.gov
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Bioinformatics data for paper [Dataset]. https://catalog.data.gov/dataset/bioinformatics-data-for-paper
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Data for sequence comparison of comammox genomes and genes identified. This dataset is associated with the following publication: Camejo, P., J. Santodomingo, K. McMahon, and D. Noguera. Genome-enabled insights into the ecophysiology of the comammox bacterium Ca. Nitrospira nitrosa. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 2(5): 1-16, (2017).
Paper Data
figshare.com
txt
Updated Jun 8, 2023
Cite
Tong Liu (2023). Paper Data [Dataset]. http://doi.org/10.6084/m9.figshare.21789110.v3
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21789110.v3
Dataset updated
Jun 8, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tong Liu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Paper Data For "Identification of ACHE as the Hub Gene targeting Solasonine Associated with NSCLC Using Integrated Bioinformatics Analysis"
Data_Sheet_1_Resequencing of Microbial Isolates: A Lab Module to Introduce...
frontiersin.figshare.com
pdf
Updated Jun 6, 2023
+ more versions
Cite
Katherine Lynn Petrie; Rujia Xie (2023). Data_Sheet_1_Resequencing of Microbial Isolates: A Lab Module to Introduce Novices to Command-Line Bioinformatics.pdf [Dataset]. http://doi.org/10.3389/fmicb.2021.578859.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmicb.2021.578859.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Katherine Lynn Petrie; Rujia Xie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Familiarity with genome-scale data and the bioinformatic skills to analyze it have become essential for understanding and advancing modern biology and human health, yet many undergraduate biology majors are never exposed to hands-on bioinformatics. This paper presents a module that introduces students to applied bioinformatic analysis within the context of a research-based microbiology lab course. One of the most commonly used genomic analyses in biology is resequencing: determining the sequence of DNA bases in a derived strain of some organism, and comparing it to the known ancestral genome of that organism to better understand the phenotypic differences between them. Many existing CUREs — Course Based Undergraduate Research Experiences — evolve or select new strains of bacteria and compare them phenotypically to ancestral strains. This paper covers standardized strategies and procedures, accessible to undergraduates, for preparing and analyzing microbial whole-genome resequencing data to examine the genotypic differences between such strains. Wet-lab protocols and computational tutorials are provided, along with additional guidelines for educators, providing instructors without a next-generation sequencing or bioinformatics background the necessary information to incorporate whole-genome sequencing and command-line analysis into their class. This module introduces novice students to running software at the command-line, giving them exposure and familiarity with the types of tools that make up the vast majority of open-source scientific software used in contemporary biology. Completion of the module improves student attitudes toward computing, which may make them more likely to pursue further bioinformatics study.
paper
data.mendeley.com
Updated Dec 17, 2024
Cite
Xinyu Zhang (2024). paper [Dataset]. http://doi.org/10.17632/h29hyxwnmc.1
Explore at:
Unique identifier
https://doi.org/10.17632/h29hyxwnmc.1
Dataset updated
Dec 17, 2024
Authors
Xinyu Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
WB raw data
Research data for "Subjective data models in bioinformatics: Do wet-lab and...
figshare.manchester.ac.uk
txt
Updated Jun 1, 2023
Cite
Yochannah Yehudi; Carole Goble; Caroline Jay; Lukas Hughes-Noehrer (2023). Research data for "Subjective data models in bioinformatics: Do wet-lab and computational biologists comprehend data differently?" [Dataset]. http://doi.org/10.48420/20641017.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.48420/20641017.v2
Dataset updated
Jun 1, 2023
Dataset provided by
University of Manchester
Authors
Yochannah Yehudi; Carole Goble; Caroline Jay; Lukas Hughes-Noehrer
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Subjective data models dataset

This dataset is comprised of data collected from study participants, for a study into how people working with biological data perceive data, and whether or not this perception of data aligns with a person's experiential and educational background. We call the concept of what data looks like to an individual a "subjective data model".

Todo: link paper/preprint once published.

Computational python analysis code: https://doi.org/10.5281/zenodo.7022789 and https://github.com/yochannah/subjective-data-models-analysis

Files

Transcripts of the recorded sessions are attached and have been verified by a second researcher. These files are all in plain text .txt format. Note that participant 3 did not agree to sharing the transcript of their interview.

Interview paper files: This folder has digital and photographed versions of the files shown to the participants for the file mapping task. Note that the original files are from the NCBI and from FlyBase.

Videos and stills from the recordings have been deleted in line with the Data Management Plan and Ethical Review.

anonymous_participant_list.csv shows which files have transcripts associated (not all participants agreed to share transcripts), what the order of Tasks A and B were, the date of interview, and what entities participants added to the set provided (if any). See the paper methods for more info about why entities were added to the set.

cards.txt is a full list of the cards presented in the tasks.

background survey and background manual annotations are the select survey data about participant background and manual additions to this where necessary, e.g. to interpret free text.

codes.csv shows the qualitative codes used within the transcripts.

entry_point.csv is a record of participants' identified entry points into the data.

file_mapping_responses shows a record of responses to the file mapping task.
Computational and Structural Biotechnology Journal Impact Factor 2024-2025 -...
researchhelpdesk.org
Updated Feb 23, 2022
Cite
Research Help Desk (2022). Computational and Structural Biotechnology Journal Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/290/computational-and-structural-biotechnology-journal
Explore at:
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Research Help Desk
Description
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal.

Specific areas of interest include, but are not limited to: Structure and function of proteins, nucleic acids and other macromolecules; Structure and function of multi-component complexes; Protein folding, processing and degradation; Enzymology; Computational and structural studies of plant systems; Microbial Informatics; Genomics; Proteomics; Metabolomics; Algorithms and Hypothesis in Bioinformatics; Mathematical and Theoretical Biology; Computational Chemistry and Drug Discovery; Microscopy and Molecular Imaging; Nanotechnology; Systems and Synthetic Biology.

The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence, and enables the rapid publication of papers under the following categories: Research articles; Review articles; Mini Reviews; Highlights; Communications; Software/Web server articles; Methods articles; Database articles; Book Reviews; Meeting Reviews.
Trycycler paper dataset
bridges.monash.edu
researchdata.edu.au
bin
Updated May 31, 2023
Cite
Ryan Wick (2023). Trycycler paper dataset [Dataset]. http://doi.org/10.26180/14890734.v2
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.26180/14890734.v2
Dataset updated
May 31, 2023
Dataset provided by
Monash University
Authors
Ryan Wick
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains the data (references, reads, assemblies) used in the analyses for the Trycycler paper.
Data from: Standards Barriers to Bioinformatics Research
bridges.monash.edu
pdf
Updated Nov 21, 2017
Cite
Fernando, J. I. E. (2017). Standards Barriers to Bioinformatics Research [Dataset]. http://doi.org/10.4225/03/5a137336427b3
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.4225/03/5a137336427b3
Dataset updated
Nov 21, 2017
Dataset provided by
Monash University
Authors
Fernando, J. I. E.
License
http://rightsstatements.org/vocab/InC/1.0/
Description
Although new and emerging information technologies (IT) can enable the analysis of rapidly expanding bioinformatics data, no standards exist. Standards validate a technology or process against a compilation of consolidated best practice specifications. Standards development represents an effective way to retrieve textual evidence, work collaboratively, and integrate bioinformatics with global e-health initiatives. Thus, standards barriers can impede otherwise productive research efforts. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Scientific Data paper
figshare.com
xlsx
Updated Nov 9, 2021
Cite
Yue Zhang (2021). Scientific Data paper [Dataset]. http://doi.org/10.6084/m9.figshare.16960804.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.16960804.v1
Dataset updated
Nov 9, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Yue Zhang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Gene expression matrix of different developmental stages of tissues or individuals in lotus for the Scientific Data paper
Polypolish paper dataset
bridges.monash.edu
researchdata.edu.au
bin
Updated Jun 1, 2023
Cite
Ryan Wick (2023). Polypolish paper dataset [Dataset]. http://doi.org/10.26180/16727680.v2
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.26180/16727680.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Monash University
Authors
Ryan Wick
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record contains the data (references, reads, assemblies) used in the analyses for the Polypolish paper.
Data from: Transcriptomic and bioinformatics analysis of the early...
catalog.data.gov
cloud.csiss.gmu.edu
+1 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Data from: Transcriptomic and bioinformatics analysis of the early time-course of the response to prostaglandin F2 alpha in the bovine corpus luteum [Dataset]. https://catalog.data.gov/dataset/data-from-transcriptomic-and-bioinformatics-analysis-of-the-early-time-course-of-the-respo-cd938
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Description
RNA expression analysis was performed on the corpus luteum tissue at five time points after prostaglandin F2 alpha treatment of midcycle cows using an Affymetrix Bovine Gene v1 Array. The normalized linear microarray data was uploaded to the NCBI GEO repository (GSE94069). Subsequent statistical analysis determined differentially expressed transcripts ± 1.5-fold change from saline control with P ≤ 0.05. Gene ontology of differentially expressed transcripts was annotated by DAVID and Panther. Physiological characteristics of the study animals are presented in a figure. Bioinformatic analysis by Ingenuity Pathway Analysis was curated, compiled, and presented in tables. A dataset comparison with similar microarray analyses was performed and bioinformatics analysis by Ingenuity Pathway Analysis, DAVID, Panther, and String of differentially expressed genes from each dataset as well as the differentially expressed genes common to all three datasets were curated, compiled, and presented in tables. Finally, a table comparing four bioinformatics tools' predictions of functions associated with genes common to all three datasets is presented. These data have been further analyzed and interpreted in the companion article "Early transcriptome responses of the bovine mid-cycle corpus luteum to prostaglandin F2 alpha includes cytokine signaling". Resources in this dataset: Resource Title: Supporting information as Excel spreadsheets and tables. File Name: Web Page, url: http://www.sciencedirect.com/science/article/pii/S2352340917304031?via=ihub#s0070
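The selection rule quoted in this description (at least a 1.5-fold change from saline control, in either direction, with P ≤ 0.05) can be illustrated with a minimal sketch. The function name, gene names, and expression values below are hypothetical and are not taken from the dataset; the sketch assumes linear-scale expression values and p-values that have already been computed elsewhere.

```python
def is_differentially_expressed(treated, control, p_value,
                                fold_threshold=1.5, alpha=0.05):
    """Return True when the linear-scale ratio treated/control is at least
    fold_threshold up or down and the p-value is at most alpha."""
    if treated <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    ratio = treated / control
    # A 1.5-fold decrease corresponds to a ratio of 1/1.5 or lower.
    changed = ratio >= fold_threshold or ratio <= 1.0 / fold_threshold
    return changed and p_value <= alpha


# Hypothetical transcripts: (treated, control, p-value)
transcripts = {
    "gene_a": (300.0, 100.0, 0.01),   # 3-fold up, significant
    "gene_b": (100.0, 180.0, 0.03),   # 1.8-fold down, significant
    "gene_c": (120.0, 100.0, 0.001),  # significant, but only 1.2-fold
    "gene_d": (400.0, 100.0, 0.20),   # 4-fold up, not significant
}

selected = sorted(
    name for name, (t, c, p) in transcripts.items()
    if is_differentially_expressed(t, c, p)
)
print(selected)  # ['gene_a', 'gene_b']
```

Both conditions must hold, so a large but non-significant change and a significant but small change are each excluded.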
Data from: Gene expression analysis for tumor classification using vector...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Edna Márquez; Ana María Espinosa; Jaime Berumen; Christian Lemaitre (2022). Gene expression analysis for tumor classification using vector quantization [Dataset]. http://doi.org/10.4225/03/5a137205bd04a
Explore at:
Unique identifier
https://doi.org/10.4225/03/5a137205bd04a
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Edna Márquez; Ana María Espinosa; Jaime Berumen; Christian Lemaitre
Description
Gene expression analysis is one of the most important tasks in genomic medicine; it makes it possible to classify tumors, which are directly related to the development of cancer. This paper presents a clustering method for tumor classification, vector quantization, using gene expression profiles from microarrays of mRNA with samples of cervical cancer and normal cervix. Vector quantization is used to divide the space into regions, and the centroids of the regions represent patients with tumors or healthy ones. The regions found by the vector quantizer are also used as the basis for classifying other tumors, which could help in the prognosis of the illness or in finding new groups of tumors. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
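As a rough illustration of the vector quantization idea in this description (regions of the expression space represented by centroids, with new samples assigned to the nearest region), here is a minimal sketch. The description does not say how the paper trains its quantizer, so the Lloyd-style refinement, the function names, and the two-gene toy profiles are all assumptions made for illustration.

```python
import math


def nearest(codebook, x):
    """Index of the codebook vector closest to x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(codebook[k], x))


def train_codebook(samples, codebook, iterations=10):
    """Lloyd-style refinement: assign each sample to its nearest centroid,
    then move each centroid to the mean of the samples in its region."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        regions = [[] for _ in codebook]
        for x in samples:
            regions[nearest(codebook, x)].append(x)
        for k, members in enumerate(regions):
            if members:  # an empty region keeps its previous centroid
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Hypothetical two-gene expression profiles forming two groups.
samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
           (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
codebook = train_codebook(samples, [samples[0], samples[3]])

# A new profile is classified by the region (centroid) it falls into.
print(nearest(codebook, (4.5, 5.5)))  # 1
print(nearest(codebook, (0.9, 1.1)))  # 0
```

In practice each region would then be labeled (for example tumor or normal) from the known samples it contains; that labeling step is omitted here.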
Data from: Feature ranking and feature redundancy reduction for prognostic...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Qihua Tan; Mads Thomassen; Kaare Christensen; Torben A. Kruse (2022). Feature ranking and feature redundancy reduction for prognostic microarray study of tumor clinical outcomes [Dataset]. http://doi.org/10.4225/03/5a1372383442b
Explore at:
Unique identifier
https://doi.org/10.4225/03/5a1372383442b
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Qihua Tan; Mads Thomassen; Kaare Christensen; Torben A. Kruse
Description
Different from significant gene expression analysis which looks for all genes that are differentially regulated, feature selection in prognostic gene expression analysis aims at finding a subset of informative marker genes that are discriminative for prediction. Unfortunately feature selection in the literature of microarray study is predominated by the simple heuristic univariate gene filter paradigm that selects differentially expressed genes according to their statistical significance. Since the univariate approach does not take into account the correlated or interactive structure among the genes, classifiers built on genes so selected can be less accurate. More advanced approaches based on multivariate models have to be considered. Here, we introduce a feature ranking method through forward orthogonal search to assist prognostic gene selection. Application to published gene-lists selected by univariate models shows that the feature space can be largely reduced while achieving improved testing performances. Our results indicate that "significant" features selected using the gene-wised approaches can contain irrelevant genes that only serve to complicate model building. Multivariate feature ranking can help to reduce feature redundancy and to select highly informative prognostic marker genes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Data from: Data reuse and the open data citation advantage
zenodo.org
search.dataone.org
+2 more
bin, csv, txt
Updated May 28, 2022
Cite
Heather A. Piwowar; Todd J. Vision (2022). Data from: Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
Explore at:
Available download formats: bin, csv, txt
Unique identifier
https://doi.org/10.5061/dryad.781pv
Dataset updated
May 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Heather A. Piwowar; Todd J. Vision
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
Data from: Classifying microarray cancer datasets using nearest subspace...
bridges.monash.edu
researchdata.edu.au
pdf
Updated Nov 21, 2017
Cite
Cohen, Michael C.; Paliwal, Kuldip K. (2017). Classifying microarray cancer datasets using nearest subspace classification [Dataset]. http://doi.org/10.4225/03/5a13727393276
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.4225/03/5a13727393276
Dataset updated
Nov 21, 2017
Dataset provided by
Monash University
Authors
Cohen, Michael C.; Paliwal, Kuldip K.
License
http://rightsstatements.org/vocab/InC/1.0/
Description
In this paper we implement and test the recently described nearest subspace classifier on a range of microarray cancer datasets. Its classification accuracy is tested against nearest neighbor and nearest centroid algorithms, and is shown to give a significant improvement. This classification system uses class-dependent PCA to construct a subspace for each class. Test vectors are assigned the class label of the nearest subspace, which is defined as the minimum reconstruction error across all subspaces. Furthermore, we demonstrate this distance measure is equivalent to the null-space component of the vector being analyzed. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Data from: Consensus clustering of gene expression microarray data using...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Alexandre Mendes (2022). Consensus clustering of gene expression microarray data using genetic algorithms [Dataset]. http://doi.org/10.4225/03/5a13728358b1d
Explore at:
Unique identifier
https://doi.org/10.4225/03/5a13728358b1d
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Alexandre Mendes
Description
This work presents a new consensus clustering method for gene expression microarray data based on a genetic algorithm. Using two datasets - DA and DB - as input, the genetic algorithm examines putative partitions for the samples in DA, selecting biomarkers that support such partitions. The biomarkers are then used to build a classifier which is used in DB to determine its samples' classes. The genetic algorithm is guided by an objective function that takes into account the accuracy of classification in both datasets, the number of biomarkers that support the partition, and the distribution of the samples across the classes for each dataset. To illustrate the method, two whole-genome breast cancer instances from different sources were used. In this application, the results indicate that the method could be used to find unknown subtypes of diseases supported by biomarkers presenting similar gene expression profiles across platforms. Moreover, even though this initial study was restricted to two datasets and two classes, the method can be easily extended to consider both more datasets and classes. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Data from: What are the most important features contributing to xylanase...
researchdata.edu.au
bridges.monash.edu
Updated May 5, 2022
Cite
Mansour Ebrahimi; Ebrahimi; Esmaeil Ebrahimi; Zahra Zinati; Azar Delavari; M. Mohammadi-dehchesmah (2022). What are the most important features contributing to xylanase thermostability? Applying a feature selection modeling [Dataset]. http://doi.org/10.4225/03/5a137345bcfc5
Explore at:
Unique identifier
https://doi.org/10.4225/03/5a137345bcfc5
Dataset updated
May 5, 2022
Dataset provided by
Monash University
Authors
Mansour Ebrahimi; Ebrahimi; Esmaeil Ebrahimi; Zahra Zinati; Azar Delavari; M. Mohammadi-dehchesmah
Description
Xylan is the main component of hemicellulose, which is present in nature in large amounts and can be degraded by either acid or enzymic catalysis with the advantages of a highly efficient conversion rate and non-corrosive and environmentally friendly conditions. Although the complete breakdown of xylan requires the action of several different enzymes, the depolymerizing endo-1,4-β-xylanase (EC 3.2.1.8) is the key enzyme, with possible applications in waste treatment, fuel and chemical production and paper manufacture. In consequence, the importance of finding or making thermostable xylanases has been highlighted. Therefore, it is essential to understand the features involved in xylanase thermostability. Here, we looked at more than seventy attributes of 30 xylanase proteins (active at different temperatures) by applying a feature selection algorithm which assigns a p value to each attribute based on the asymptotic distribution of a transformation on the Pearson correlation coefficient, and then sorts them according to their p values in order to find those contributing most to the thermostability of the xylanase proteins. The results showed that the count of oxygen, nitrogen, Glu, Lys, Cys, Phe, Trp, the count of positively and negatively charged residues as well as the count of other residues were the most important features with respect to xylanase thermostability, and 12 more properties were recognized to have a marginal effect on this aspect, while the rest were revealed to be unimportant. The importance of "important" and "marginal" features in xylanase thermostability has been discussed in this paper. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
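The feature selection step described above (a p value per attribute derived from a transformation of the Pearson correlation coefficient, then sorting by p value) might be sketched as follows. The description does not name the transformation, so the Fisher z-transform with a normal approximation is an assumption here, and the attribute names and values are hypothetical rather than taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Two-sided p-value for zero correlation using the Fisher z-transform,
    which is approximately normal with standard error 1/sqrt(n - 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def rank_features(features, target):
    """Return (name, p_value) pairs sorted from smallest to largest p-value."""
    n = len(target)
    scored = [(name, fisher_p_value(pearson_r(values, target), n))
              for name, values in features.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical data: optimum temperature of six proteins and two attributes.
temperature = [40, 50, 60, 70, 80, 90]
features = {
    "count_ala": [7, 3, 8, 4, 6, 5],
    "count_glu": [10, 12, 14, 15, 18, 20],
}
ranked = rank_features(features, temperature)
print([name for name, _ in ranked])  # ['count_glu', 'count_ala']
```

Attributes could then be binned by p value, for example into "important", "marginal", and "unimportant" groups; the cut-offs used in the paper are not given in the description.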
Unique Identifcation of research resources in studies in Reproducibility...
search.datacite.org
figshare.com
Updated Apr 4, 2014
+ more versions
Cite
Nicole Vasilevsky; David J Kavanagh; Amy Van Deusen; Melissa Haendel; Elizabeth Iorns (2014). Unique Identifcation of research resources in studies in Reproducibility Project: Cancer Biology [Dataset]. http://doi.org/10.6084/m9.figshare.987129
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.987129
Dataset updated
Apr 4, 2014
Dataset provided by
Figshare (http://figshare.com/)
DataCite (https://www.datacite.org/)
Authors
Nicole Vasilevsky; David J Kavanagh; Amy Van Deusen; Melissa Haendel; Elizabeth Iorns
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reproducibility Project: Cancer Biology (https://osf.io/e81xl/wiki/home/) aims to reproduce the key experiments from 50 landmark papers in cancer research. As a follow-up to the previously published study, which showed a lack of identifiability of research resources in the published biomedical literature (Vasilevsky, et al. 2014, PeerJ 1:e148), we analyzed 6 resource types reported in these papers to determine the identifiability of these resources. The resource types included antibodies, cell lines, constructs, knockdown reagents, model organisms and software. The results showed that an average of 85% of the resources were identifiable, and the ability to identify the resources varied amongst the resource types.
Data for paper submitted to BMC Bioinformatics
figshare.com
zip
Updated Aug 31, 2017
Cite
Marco Grzegorczyk (2017). Data for paper submitted to BMC Bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.5363545.v3
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5363545.v3
Dataset updated
Aug 31, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Marco Grzegorczyk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data stem from:
I. Cantone, L. Marucci, F. Iorio, M.A. Ricci, V. Belcastro, M. Bansal, S. Santini, M. di Bernardo, D. di Bernardo and M.P. Cosma (2009): A Yeast Synthetic Network for In Vivo Assessment of Reverse-Engineering and Modeling Approaches, Cell, 137, 172-181.
Grzegorczyk, M. and Husmeier, D. (2012): A Non-Homogeneous Dynamic Bayesian Network with Sequentially Coupled Interaction Parameters for Applications in Systems and Synthetic Biology, Statistical Applications in Genetics and Molecular Biology (SAGMB), 11(4), Article 7.
Please see those publications for details.
[Dataset] Data for the course "Population Genomics" at Aarhus University
zenodo.org
application/gzip, bin
Updated Jan 8, 2025
+ more versions
Cite
Samuele Soraggi; Kasper Munch (2025). [Dataset] Data for the course "Population Genomics" at Aarhus University [Dataset]. http://doi.org/10.5281/zenodo.7670839
Explore at:
Available download formats: application/gzip, bin
Unique identifier
https://doi.org/10.5281/zenodo.7670839
Dataset updated
Jan 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Samuele Soraggi; Kasper Munch
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets, conda environments and software for the course "Population Genomics" by Prof. Kasper Munch. This course material is maintained by the Health Data Science Sandbox. This webpage shows the latest version of the course material.

Data.tar.gz contains the datasets and executable files for some of the software.
You can unpack it with
tar -zxf Data.tar.gz -C ./
This creates a folder called Data with the uncompressed material inside.
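
As a minimal sketch of the unpacking step above, the helper below first lists an archive's top-level entries and then extracts it. The function name unpack_archive and its arguments are hypothetical and not part of the course material; it assumes a POSIX shell with tar, cut and sort available.

```shell
# Hypothetical helper (not part of the course material):
# print the top-level entries of a gzipped tar archive, then extract it.
unpack_archive() {
    archive="$1"        # path to the .tar.gz file
    dest="${2:-.}"      # destination folder, defaults to the current one
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$dest"
    # -t lists the contents without extracting; cut keeps the first path component.
    tar -tzf "$archive" | cut -d/ -f1 | sort -u
    # Same flags as the command above: gunzip (-z), extract (-x), from file (-f).
    tar -zxf "$archive" -C "$dest"
}
```

Usage, assuming the archive is in the current folder: unpack_archive Data.tar.gz ./
Listing first makes it easy to confirm which folder the archive will create before anything is written to disk.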

Course_Env.packed.tar.gz contains the conda environment used for the course. It needs to be unpacked so that all the prefixes are adjusted (note that this environment was created on Ubuntu 22.10). Do this on the command line as follows:

Create the folder Course_Env: mkdir Course_Env

Untar the file: tar -zxf Course_Env.packed.tar.gz -C Course_Env

Activate the environment: conda activate ./Course_Env

Run the unpacking script (it can take quite some time to get it done): conda-unpack
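
The four steps above can be chained so that a failure stops the sequence early. The following is only a sketch under stated assumptions: the function name setup_course_env is hypothetical, conda is assumed to be installed and initialised for the current shell, and conda-unpack is assumed to be on the PATH once the environment is active.

```shell
# Hypothetical wrapper around the four steps above (not part of the course material).
# Assumes conda is installed and initialised for this shell.
setup_course_env() {
    archive="${1:-Course_Env.packed.tar.gz}"   # packed environment archive
    env_dir="${2:-./Course_Env}"               # folder to unpack into
    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    mkdir -p "$env_dir" &&                     # 1. create the folder
    tar -zxf "$archive" -C "$env_dir" &&       # 2. untar the file
    conda activate "$env_dir" &&               # 3. activate the environment
    conda-unpack                               # 4. run the unpacking script
}
```

Usage, with the archive in the current folder: setup_course_env
The archive check comes before mkdir so that a mistyped file name does not leave an empty Course_Env folder behind.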

Course_Env.unpacked.tar.gz The same environment as above, but it works only if untarred into the folder /usr/Material, so use the version above if you work in another folder. This file is mainly for running the course in our own cloud environment.

environment_with_args.yml The file needed to generate the conda environment. Create and activate the environment with the following commands:

conda env create -f environment_with_args.yml -p ./Course_Env

conda activate ./Course_Env

The data is connected to the following repository: https://github.com/hds-sandbox/Popgen_course_aarhus. The original course material from Prof Kasper Munch is at https://github.com/kaspermunch/PopulationGenomicsCourse.

Description

After the course, participants will have detailed knowledge of the methods and applications required to perform a typical population genomic study.

By the end of the course, participants must be able to:

Identify an experimental platform relevant to a population genomic analysis.

Apply commonly used population genomic methods.

Explain the theory behind common population genomic methods.

Reflect on strengths and limitations of population genomic methods.

Interpret and analyze results of population genomic inference.

Formulate population genetics hypotheses based on data.

The course introduces key concepts in population genomics, from the generation of population genetic data sets to the most common population genetic analyses and association studies. The first part of the course focuses on the generation of population genetic data sets. The second part introduces the most common population genetic analyses and their theoretical background. Topics here include analysis of demography, population structure, recombination and selection. The last part of the course focuses on applications of population genetic data sets for association studies in relation to human health.

Curriculum

The curriculum for each week is listed below. "Coop" refers to a set of lecture notes by Graham Coop that we will use throughout the course.

Course plan

Course intro and overview:

Coop chapters 1, 2, 3, Paper: Genome Diversity Project

Drift and the coalescent:

Coop chapter 4; Paper: Platypus

Exercise: Read mapping and base calling

Recombination:

Lecture: Review: Recombination in eukaryotes, Review: Recombination rate estimation

Exercise: Phasing and recombination rate

Population structure and incomplete lineage sorting:

Lecture: Coop chapter 6, Review: Incomplete lineage sorting

Exercise: Working with VCF files

Hidden Markov models:

Lecture: Durbin chapter 3, Paper: population structure

Exercise: Inference of population structure and admixture

Ancestral recombination graphs:

Lecture: Paper: Approximating the ARG, Paper: Tree inference

Exercise: ARG dashboard exercises + Inference of trees along sequence

Past population demography:

Lecture: Coop chapter 4, Paper: PSMC, revisit Paper: Tree inference

Exercise: Inferring historical populations

Direct and linked selection:

Lecture: Coop chapters 12, 13, revisit Paper: Tree inference

Admixture:

Lecture: Review: Admixture, Paper: Admixture inference

Exercise: Detecting archaic ancestry in modern humans

Genome-wide association study (GWAS):

Lecture: Coop lecture notes 99-120

Exercise: GWAS quality control

Heritability:

Lecture: Coop Lecture notes Sec. 2.2 (p23-36) + Chap. 7 (p119-142)

Exercise: Association testing

Evolution and disease:

Lecture: Coop Lecture notes Sec. 11.0.1 (p217-221)

Exercise: Estimating heritability

