Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but also to have a metric for judging its usefulness. The journal impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue to follow the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers,” and “forks” (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal's impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.
https://www.myvisajobs.com/terms-of-service/
A dataset that explores Green Card sponsorship trends, salary data, and employer insights for biostatistics, bioinformatics, and systems biology in the U.S.
The dataset was collected through whole-transcriptome RNA-Sequencing technologies. The processing method was described in the manuscript.
Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference
Multidimensional scaling (MDS) is a dimensionality reduction technique for microbial ecology data analysis that represents the multivariate structure while preserving pairwise distances between samples. While its improvements have enhanced the ability to reveal data patterns by sample groups, these MDS-based methods require prior assumptions for inference, limiting their application in general microbiome analysis. In this study, we introduce a new MDS-based ordination, “F-informed MDS,” which configures the data distribution based on the F-statistic, the ratio of dispersion between groups sharing common and different characteristics. Using simulated compositional datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and ...
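To make the F-statistic mentioned in the abstract concrete, the sketch below computes a PERMANOVA-style pseudo-F from a pairwise distance matrix and group labels. This is an assumption about the kind of statistic involved, not the authors' implementation; the function name `pseudo_f` and the toy data are hypothetical.

```python
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all sample pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for group in groups:
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    # Ratio of between-group to within-group dispersion, each scaled by its degrees of freedom.
    return (ss_between / (a - 1)) / (ss_within / (n - a))


# Toy example (hypothetical data): four samples on a line, two well-separated groups.
points = [0.0, 1.0, 10.0, 11.0]
labels = ["A", "A", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
f_value = pseudo_f(dist, labels)
```

For Euclidean distances this reduces to the classical one-way ANOVA F, which is a convenient sanity check on the toy data.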
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Cell refractive index (RI) was proposed as a putative cancer biomarker of great potential, being correlated with cell content and morphology, cell division rate and membrane permeability. We used Digital Holographic Microscopy (DHM) to compare RI and dry mass density of two B16 murine melanoma sublines of different metastatic potential. Using statistical methods, the phase-shift distribution within the reconstructed quantitative phase images (QPIs) was analyzed by the method of bimodality coefficients. The observed correlation of RI and bimodality profile with the cells' metastatic potential was validated by a real-time impedance-based assay and clonogenic tests. We suggest that RI and bimodality analysis of QPI histograms be developed as optical biomarkers useful in label-free detection and quantitative evaluation of cell metastatic potential.
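The description does not say which bimodality coefficient was used. As an illustration only, the sketch below assumes Sarle's bimodality coefficient with bias-corrected sample skewness and excess kurtosis; values above 5/9 (the value for a uniform distribution) are conventionally read as a hint of bimodality. The function name and sample data are hypothetical.

```python
import math


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient (assumed variant) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of the sample.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment-based excess kurtosis
    # Bias-corrected sample skewness and excess kurtosis.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return (skew ** 2 + 1.0) / (kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Hypothetical phase-shift samples: one two-peaked, one single-peaked.
two_peaks = [0.0] * 50 + [1.0] * 50
one_peak = [0] * 1 + [1] * 6 + [2] * 15 + [3] * 20 + [4] * 15 + [5] * 6 + [6] * 1
bc_two = bimodality_coefficient(two_peaks)
bc_one = bimodality_coefficient(one_peak)
```

The function needs at least four values and a non-zero variance; a real pipeline would apply it to the pixel values of each quantitative phase image.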
Building prediction models based on complex omics datasets such as transcriptomics, proteomics, metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Portuguese National Registry on low weight newborns between 2013 and 2018, made available for research purposes. Dataset is composed of 3823 unique entries registering birthweight, biological sex of the infant (1-Male; 2-Female), CRIB score (0-21) and survival (0-Survival; 1-Death).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel document containing precision, recall and F1 scores for metagenomic classifiers used in the benchmarking of expam's performance. Classifiers were tested on 140 simulated metagenomic communities, at different taxonomic ranks.
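For readers unfamiliar with the three scores in the spreadsheet, the sketch below shows how they are conventionally derived from true-positive, false-positive, and false-negative counts. The counts used here are made up for illustration and do not come from the benchmark.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from classification counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical counts: 8 taxa correctly reported, 2 spurious, 8 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```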
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of 1763 observations, each representing a unique patient, and 12 different attributes associated with heart disease. This dataset is a critical resource for researchers focusing on predictive analytics in cardiovascular diseases.
Variables Overview:
1. Age: A continuous variable indicating the age of the patient.
2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat', 'Up Sloping', representing the slope of the peak exercise ST segment.
12. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.
Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.
Importance for Research: This dataset covers a range of factors that could potentially be linked to heart disease, making it a useful resource for developing predictive models. By analyzing relationships and patterns within these variables, researchers can identify key predictors of heart disease and enhance the accuracy of diagnostic tools. This could lead to better preventive measures and treatment strategies, ultimately improving patient outcomes in cardiovascular health.
https://www.bco-dmo.org/dataset/813173/license
Supplementary Table 4C: Metatranscriptome data summary for cellular activities presented and statistics on sequencing and removal of potential contaminant sequences: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. Samples taken on board of the R/V JOIDES Resolution between November 30, 2015 and January 30, 2016
access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson
acquisition_description=Rock material was crushed while still frozen in a Progressive Exploration Jaw Crusher (Model 150) whose surfaces were sterilized with 70% ethanol and RNase AWAY (Thermo Fisher Scientific, USA) inside a laminar flow hood. Powdered rock material was returned to the -80°C freezer until extraction.
DNA was extracted from 20, 30, or 40 grams of powdered rock material, depending on the quantity of rock available. A DNeasy PowerMax Soil Kit (Qiagen, USA) was used following the manufacturer’s protocol modified to include three freeze/thaw treatments prior to the addition of Soil Kit solution C1. Each treatment consisted of 1 minute in liquid nitrogen followed by 5 minutes at 65 °C. DNA extracts were concentrated by isopropanol precipitation overnight at 4°C.
The low biomass in our samples required whole genome amplification (WGA) prior to PCR amplification of marker genes. Genomic DNA was amplified by Multiple Displacement Amplification (MDA) using the REPLI-g Single Cell Kit (Qiagen) as directed. MDA bias was minimized by splitting each WGA sample into triplicate 16 μL reactions after 1 hr of amplification and then resuming amplification for the manufacturer-specified 7 hrs (8 hrs total).
DNA was also recovered from samples of drilling mud and drilling fluid (surface water collected during the coring process) for negative controls, as well as two “kit control” samples, in which no sample was added, to account for any contaminants originating from either the DNeasy PowerMax Soil Kit or the REPLI-g Single Cell Kit.
Bacterial SSU rRNA gene fragments were PCR amplified from MDA samples and sequenced at Georgia Genomics and Bioinformatics Core (Univ. of Georgia). The primers used were: Bac515-Y and Bac926R. Dual-indexed libraries were prepared with (HT) iTruS (Kappa Biosystems) chemistry and sequencing was performed on an Illumina MiSeq 2 x 300 bp system with all samples combined equally on a single flow cell.
Raw sequence reads were processed through Trim Galore [http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/], FLASH (ccb.jhu.edu/software/FLASH/) and FASTX Toolkit [http://hannonlab.cshl.edu/fastx_toolkit/] for trimming and removal of low quality/short reads.
Quality filtering included requiring a minimum average quality of 25 and rejection of paired reads less than 250 nucleotides.
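The two thresholds above (minimum average quality of 25, minimum length of 250 nucleotides) can be sketched as a simple per-read check. This is not the Trim Galore, FLASH, or FASTX Toolkit code used in the study; it assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def passes_filter(sequence, quality, min_mean_q=25.0, min_length=250):
    """True if the read meets both the length and average-quality thresholds."""
    if not quality or len(sequence) < min_length:
        return False
    return mean_phred(quality) >= min_mean_q


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
good_read = passes_filter("A" * 250, "I" * 250)
short_read = passes_filter("A" * 249, "I" * 249)
low_quality_read = passes_filter("A" * 250, "#" * 250)
```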
Operational Taxonomic Unit (OTU) clusters were constructed at 99% similarity
with the script pick_otus.py within the Quantitative Insights Into Microbial
Ecology (QIIME) v.1.9.1 software and ‘uclust’. Any OTU that matched
an OTU in one of our control samples (drilling fluids, drilling mud,
extraction and WGA controls) was removed (using filter_otus_from_otu_table.py)
along with any sequences of land plants and human pathogens that may have
survived the control filtering due to clustering at 99%
(filter_taxa_from_otu_table.py). As an additional quality control measure,
genera that are commonly identified as PCR contaminants were removed.
Unclassified OTUs were queried using BLAST against the GenBank nr database and
further information about these OTUs is provided in the Supplementary
Discussion text under the section “Taxonomic diversity information from
iTAGs.” OTUs that could not be assigned to Bacteria or Archaea were
removed from further analysis. For downstream analyses, any OTUs not
representing more than 0.01% of relative abundance of sequences overall were
removed as those are unlikely to contribute significantly to in situ
communities. The OTU data table was transformed to a presence/absence table
and the Jaccard method was used to generate a distance matrix using the
dist.binary() function in the R package ade4.
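The last two steps described above (dropping OTUs at or below 0.01% overall relative abundance, then computing Jaccard distances on presence/absence data) can be sketched in plain Python. This is an illustration, not the R/ade4 workflow used in the study; the table layout and function names are hypothetical. As far as the ade4 documentation describes it, dist.binary() reports distances of the form sqrt(1 - s), so its values would be the square root of the plain 1 - s dissimilarity computed here.

```python
def filter_rare_otus(table, min_fraction=0.0001):
    """Keep OTUs whose share of all sequences exceeds min_fraction (0.01%)."""
    total = sum(sum(counts) for counts in table.values())
    return {otu: counts for otu, counts in table.items()
            if sum(counts) / total > min_fraction}


def presence_absence(table):
    """Convert per-sample counts to 0/1 values."""
    return {otu: [1 if c > 0 else 0 for c in counts] for otu, counts in table.items()}


def jaccard_distance(sample_a, sample_b):
    """Plain Jaccard dissimilarity (1 - shared/union) between two 0/1 vectors."""
    shared = sum(1 for x, y in zip(sample_a, sample_b) if x and y)
    union = sum(1 for x, y in zip(sample_a, sample_b) if x or y)
    return 1.0 - shared / union if union else 0.0


# Hypothetical OTU table: rows are OTUs, columns are three samples.
otu_table = {"otu1": [50000, 0, 30000], "otu2": [0, 20000, 10000],
             "otu3": [40000, 40000, 0], "otu4": [1, 0, 0]}
kept = presence_absence(filter_rare_otus(otu_table))
samples = list(zip(*kept.values()))  # one presence/absence vector per sample
d01 = jaccard_distance(samples[0], samples[1])
```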
awards_0_award_nid=709555
awards_0_award_number=OCE-1658031
awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1658031
awards_0_funder_name=NSF Division of Ocean Sciences
awards_0_funding_acronym=NSF OCE
awards_0_funding_source_nid=355
awards_0_program_manager=David L. Garrison
awards_0_program_manager_nid=50534
cdm_data_type=Other
comment=Supplementary Table 4C: iTAG
PI: Virginia Edgcomb
Data Version 1: 2020-05-28
Conventions=COARDS, CF-1.6, ACDD-1.3
data_source=extract_data_as_tsv version 2.3 19 Dec 2019
dataset_current_state=Final and no updates
defaultDataQuery=&time<now
doi=10.26008/1912/bco-dmo.813173.1
Easternmost_Easting=57.278183
geospatial_lat_max=-32.70567
geospatial_lat_min=-32.70567
geospatial_lat_units=degrees_north
geospatial_lon_max=57.278183
geospatial_lon_min=57.278183
geospatial_lon_units=degrees_east
geospatial_vertical_max=747.7
geospatial_vertical_min=10.7
geospatial_vertical_positive=down
geospatial_vertical_units=m
infoUrl=https://www.bco-dmo.org/dataset/813173
institution=BCO-DMO
instruments_0_acronym=Automated Sequencer
instruments_0_dataset_instrument_description=DNA sequencing performed using the Illumina MiSeq 2 x 300 bp platform (Univ. of Georgia)
instruments_0_dataset_instrument_nid=813183
instruments_0_description=General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.
instruments_0_instrument_name=Automated DNA Sequencer
instruments_0_instrument_nid=649
instruments_0_supplied_name=Illumina MiSeq 2 x 300 bp platform
metadata_source=https://www.bco-dmo.org/api/dataset/813173
Northernmost_Northing=-32.70567
param_mapping={'813173': {'Latitude': 'flag - latitude', 'Depth': 'flag - depth', 'Longitude': 'flag - longitude'}}
parameter_source=https://www.bco-dmo.org/mapserver/dataset/813173/parameters
people_0_affiliation=Woods Hole Oceanographic Institution
people_0_affiliation_acronym=WHOI
people_0_person_name=Virginia P. Edgcomb
people_0_person_nid=51284
people_0_role=Principal Investigator
people_0_role_type=originator
people_1_affiliation=Woods Hole Oceanographic Institution
people_1_affiliation_acronym=WHOI
people_1_person_name=Virginia P. Edgcomb
people_1_person_nid=51284
people_1_role=Contact
people_1_role_type=related
people_2_affiliation=Woods Hole Oceanographic Institution
people_2_affiliation_acronym=WHOI BCO-DMO
people_2_person_name=Karen Soenen
people_2_person_nid=748773
people_2_role=BCO-DMO Data Manager
people_2_role_type=related
project=Subseafloor Lower Crust Microbiology
projects_0_acronym=Subseafloor Lower Crust Microbiology
projects_0_description=NSF abstract:
The lower ocean crust has remained largely unexplored and represents one of the last frontiers for biological exploration on Earth. Preliminary data indicate an active subsurface biosphere in samples of the lower oceanic crust collected from Atlantis Bank in the SW Indian Ocean as deep as 790 m below the seafloor. Even if life exists in only a fraction of the habitable volume where temperatures permit and fluid flow can deliver carbon and energy sources, an active lower oceanic crust biosphere would have implications for deep carbon budgets and yield insights into microbiota that may have existed on early Earth. This is all of great interest to other research disciplines, educators, and students alike. A K-12 education program will capitalize on groundwork laid by outreach collaborator, A. Martinez, a 7th grade teacher in Eagle Pass, TX, who sailed as outreach expert on Drilling Expedition 360. Martinez works at a Title 1 school with ~98% Hispanic and ~2% Native American students and a high number of English Language Learners and migrants. Annual school visits occur during which the project investigators present hands on-activities introducing students to microbiology, and talks on marine microbiology, the project, and how to pursue science related careers. In addition, monthly Skype meetings with students and PIs update them on project progress. Students travel to the University of Texas Marine Science Institute annually, where they get a campus tour and a 3-hour cruise on the R/V Katy, during which they learn about and help with different oceanographic sampling approaches. The project partially supports two graduate students, a Woods Hole undergraduate summer student, the participation of multiple Texas A+M undergraduate students, and 3 principal investigators at two institutions, including one early career researcher who has not previously received NSF support of his own.
Given the dearth of knowledge of the lower oceanic crust, this project is poised to transform our understanding of life in this vast environment. The project assesses metabolic functions within all three domains of life in this crustal biosphere, with a focus on nutrient cycling and evaluation of connections to other deep marine microbial habitats. The lower ocean crust represents a potentially vast biosphere whose microbial constituents and the biogeochemical cycles they mediate are likely linked to deep ocean processes through faulting and subsurface fluid flow. Atlantis Bank represents a tectonic
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive is a database generated using the novel Virus Pop pipeline, which simulates realistic protein sequences and adds new branches to a protein phylogenetic tree. An article describing the pipeline is currently under review.
The database contains simulations of 995 different proteins from 93 virus genera, providing a total of 24,138,277 sequences in both amino acid and nucleotide form.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
Spatiotemporal regulation of gene expression is controlled by transcription factor (TF) binding to regulatory elements, resulting in a plethora of cell types and cell states from the same genetic information. Due to the importance of regulatory elements, various sequencing methods have been developed to localise them in genomes, for example using ChIP-seq profiling of the histone mark H3K27ac that marks active regulatory regions. Moreover, multiple tools have been developed to predict TF binding to these regulatory elements based on DNA sequence. As altered gene expression is a hallmark of disease phenotypes, identifying TFs driving such gene expression programs is critical for the identification of novel drug targets. In this study, we curated 84 chromatin profiling experiments (H3K27ac ChIP-seq) where TFs were perturbed through, e.g., genetic knockout or overexpression. We ran nine published tools to prioritize TFs using these real-world data sets and evaluated the performance of the methods in identifying the perturbed TFs. This allowed the nomination of three frontrunner tools, namely RcisTarget, MEIRLOP and monaLisa. Our analyses revealed opportunities and commonalities of tools that will help to guide further improvements and developments in the field.
Dataset description:
Contact: Sebastian Steinhauser - sebastian.steinhauser@novartis.com
(A) Bioinformatics Summary statistics and (B) Sequence identity matrix between strains. (XLSX)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The amount of data used in phylogenetics has grown explosively in recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
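Several of the summary statistics listed above are easy to illustrate on a toy DNA alignment. The sketch below is not AMAS code and does not use the AMAS API; the function name, the dictionary layout, and the choice to treat every character other than A, C, G, and T as undetermined are simplifying assumptions made for this example.

```python
from collections import Counter


def summarize_alignment(alignment):
    """Summarize a DNA alignment given as {taxon: aligned_sequence}."""
    seqs = [seq.upper() for seq in alignment.values()]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")
    cells = len(seqs) * length
    gc = sum(seq.count("G") + seq.count("C") for seq in seqs)
    at = sum(seq.count("A") + seq.count("T") for seq in seqs)
    undetermined = cells - gc - at  # gaps, N, ?, and ambiguity codes
    variable = informative = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in "ACGT")
        # Variable site: more than one determined base in the column.
        if len(counts) > 1:
            variable += 1
        # Parsimony informative: at least two bases, each seen in at least two taxa.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "undetermined": undetermined,
        "missing_percent": 100.0 * undetermined / cells,
        "gc_content": gc / (gc + at) if gc + at else 0.0,
        "variable_sites": variable,
        "parsimony_informative_sites": informative,
    }


# Hypothetical four-taxon alignment.
example = {"t1": "ACGTA-", "t2": "ACGAAN", "t3": "ACTAAT", "t4": "AGTTAT"}
stats = summarize_alignment(example)
```

Here GC content is reported as a fraction of determined bases only; other tools may use a different denominator.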
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data file from a survey of life scientists' knowledge of and attitudes towards reproducibility in journal articles.
Reference Viral Databases (RVDB-prot and RVDB-prot-HMM) were developed by Thomas Bigot in Marc Eloit’s Pathogen Discovery group in collaboration with the Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI) at Institut Pasteur, for enhancing virus detection using next-generation sequencing (NGS) technologies. They are based on the reference Viral DataBase, courtesy of Arifa Khan’s group at CBER, FDA: https://hive.biochemistry.gwu.edu/rvdb/. They are updated after each new release of the nucleotide database. The version number of the protein databases follows that of the original nucleotide database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the manuscript:
DNMT3A-R882 mutation intrinsically mimics maladaptive myelopoiesis from human haematopoietic stem cells
Giovanna Mantica1*, Aditi Vedi1,2*, Amos Tuval3§, Hector Huerga-Encabo4§, Daniel Hayler1§, Aleksandra Krzywon1,5, Emily Mitchell6, William Dunn1, Tamir Biezuner3, Kendig Sham1, Antonella Santoro1, Joe Lee6, Adi Danin3, Noa Chapal3, Yoni Moskovitz3,7, Andrea Arruda8, Edoardo Fiorillo9, Valeria Orru9, Michele Marongiu9, Eoin McKinney10, Francesco Cucca9,11, Matthew Collin12, Mark Minden8, Peter Campbell6, George S Vassiliou1, Margarete Fabre1, Jyoti Nangalia1,6, Dominique Bonnet4, Liran Shlush3,7,8, Elisa Laurenti1
Affiliations: 1 Department of Haematology and Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK. 2 Department of Paediatric Oncology, Cambridge University Hospitals NHS Foundation Trust 3 Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel. 4 Haematopoietic Stem Cell Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK 5 Department of Biostatistics and Bioinformatics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice Branch, Gliwice, Poland 6 Wellcome Sanger Institute, Hinxton, CB10 1SA, UK 7 Division of Haematology Rambam Healthcare Campus, Haifa 31096, Israel. 8 Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada. 9 Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Lanusei, Italy. 10 Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK 11 Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy 12 Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Pneumococcus serotype co-colonization, caused by the polymorphic bacterium Streptococcus pneumoniae, has been increasingly investigated and reported in recent years. Yet, there is limited information on how co-colonization patterns vary globally, which is critical for understanding the evolution and transmission dynamics of these bacteria. Here we report on a rich dataset of cross-sectional pneumococcal colonization studies collected from the literature, where we quantified patterns of transmission intensity and co-colonization variation in child populations across different epidemiological settings. Fitting these data to an SIS model with co-colonization under the assumption of quasi-neutrality among multiple interacting strains, our analysis reveals strong patterns of negative co-variation between transmission intensity R0 and susceptibility to co-colonization k, in support of the stress-gradient-hypothesis (SGH) in ecology. According to this hypothesis, ecological interactions between organisms shift positively as environmental stress increases. In our model higher environmental stress is represented via lower values of the basic reproduction number R0, and a shift towards positive interactions is represented via higher vulnerability to co-colonization (higher k) between pneumococcus serotypes.
http://rightsstatements.org/vocab/InC/1.0/
In this paper, we aim at using genetic algorithms for gene selection and propose silhouette statistics as a discriminant function to classify breast cancers on microarray data for pattern discovery. In order to see the causality among these genes, we use the Bayesian method to construct a probability network for the pattern discovered. Consequently, we found a set of genes that is effective to discriminate breast cancer subtypes and present their probability dependencies to construct a diagnostic system. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Methylation and Hydroxymethylation data.