61 datasets found
  1. Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Mikhail G. Dozmorov (2023). Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software.PDF [Dataset]. http://doi.org/10.3389/fbioe.2018.00198.s001
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Mikhail G. Dozmorov
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but also to have a metric for judging its usefulness. A journal's impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue to follow the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers,” and “forks” (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal's impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.

  2. 2025 Green Card Report for Biostatistics, Bioinformatics, and Systems...

    • myvisajobs.com
    Updated Jan 16, 2025
    + more versions
    Cite
    MyVisaJobs (2025). 2025 Green Card Report for Biostatistics, Bioinformatics, and Systems Biology [Dataset]. https://www.myvisajobs.com/reports/green-card/major/biostatistics,-bioinformatics,-and-systems-biology
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    MyVisaJobs
    License

    https://www.myvisajobs.com/terms-of-service/

    Variables measured
    Major, Salary, Petitions Filed
    Description

    A dataset that explores Green Card sponsorship trends, salary data, and employer insights for biostatistics, bioinformatics, and systems biology in the U.S.

  3. Two-step mixed model approach to analyzing differential alternative RNA...

    • datadryad.org
    zip
    Updated Sep 28, 2020
    Cite
    Li Luo; Huining Kang; Xichen Li; Scott Ness; Christine Stidley (2020). Two-step mixed model approach to analyzing differential alternative RNA splicing: Datasets and R scripts for analysis of alternative splicing [Dataset]. http://doi.org/10.5061/dryad.66t1g1k0h
    Dataset updated
    Sep 28, 2020
    Dataset provided by
    Dryad
    Authors
    Li Luo; Huining Kang; Xichen Li; Scott Ness; Christine Stidley
    Time period covered
    Sep 26, 2020
    Description

    The dataset was collected through whole-transcriptome RNA-Sequencing technologies. The processing method was described in the manuscript.

  4. Multidimensional scaling informed by F-statistic: Visualizing microbiome for...

    • dataone.org
    • search.dataone.org
    • +1more
    Updated Oct 14, 2025
    Cite
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie (2025). Multidimensional scaling informed by F-statistic: Visualizing microbiome for inference [Dataset]. http://doi.org/10.5061/dryad.vmcvdnd3x
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie
    Description

    Multidimensional scaling (MDS) is a dimensionality reduction technique for microbial ecology data analysis that represents the multivariate structure while preserving pairwise distances between samples. While its improvements have enhanced the ability to reveal data patterns by sample groups, these MDS-based methods require prior assumptions for inference, limiting their application in general microbiome analysis. In this study, we introduce a new MDS-based ordination, “F-informed MDS,” which configures the data distribution based on the F-statistic, the ratio of dispersion between groups sharing common and different characteristics. Using simulated compositional datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and ...

    # Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference

    File: Data.zip

    Description: Raw data used in this study. Includes 3 folders and 1 file (see below).
    1. Folder Simulated contains pairwise distances and ordination results from three simulated datasets. Includes 7 subfolders and 6 files.
      • Six files are the original dataset and its associated labels set. The names are formatted as "sim_<x>-<type>.csv" where <x> is the replicate number and <type> indicates whether the file is the design matrix ("data") or response vector ("Y").
      • Seven subfolders are grouped by the ordination method. Likewise, the file ...,
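
The file naming scheme quoted above ("sim_<x>-<type>.csv") can be parsed with a short regular expression. The following is only an illustrative sketch: the function name `parse_sim_filename` is hypothetical, and it assumes that the replicate number is written as plain digits, which the description does not state explicitly.

```python
import re

# Pattern for the naming scheme quoted in the description: sim_<x>-<type>.csv
# Assumption: <x> is a run of digits; <type> is either "data" or "Y".
SIM_FILE = re.compile(r"^sim_(?P<replicate>\d+)-(?P<kind>data|Y)\.csv$")


def parse_sim_filename(name):
    """Return (replicate, kind) for a matching file name, or None otherwise."""
    match = SIM_FILE.match(name)
    if match is None:
        return None
    return int(match.group("replicate")), match.group("kind")


# Hypothetical file names used only to exercise the parser.
for example in ("sim_1-data.csv", "sim_3-Y.csv", "notes.txt"):
    print(example, parse_sim_filename(example))
```
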
  5. Dataset for: Evaluation of metastatic potential of malignant cells by image...

    • wiley.figshare.com
    application/x-rar
    Updated Jun 1, 2023
    Cite
    Violeta Liuba Calin; Mona Mihailescu; Eugen I Scarlat; Alexandra Valentina Baluta; Daniel Calin; Eugenia Kovacs; Tudor Savopol; Mihaela Georgeta Moisescu (2023). Dataset for: Evaluation of metastatic potential of malignant cells by image processing of digital holographic microscopy data [Dataset]. http://doi.org/10.6084/m9.figshare.5311108.v1
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley https://www.wiley.com/
    Authors
    Violeta Liuba Calin; Mona Mihailescu; Eugen I Scarlat; Alexandra Valentina Baluta; Daniel Calin; Eugenia Kovacs; Tudor Savopol; Mihaela Georgeta Moisescu
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Cell refractive index (RI) was proposed as a putative cancer biomarker of great potential, being correlated with cell content and morphology, cell division rate and membrane permeability. We used Digital Holographic Microscopy (DHM) to compare RI and dry mass density of two B16 murine melanoma sublines of different metastatic potential. Using statistical methods, the distribution of phase shifts within the reconstructed quantitative phase images (QPIs) was analyzed by the method of bimodality coefficients. The observed correlation of RI and bimodality profile with the cells' metastatic potential was validated by a real-time impedance-based assay and clonogenic tests. We suggest that RI and bimodality analysis of QPI histograms be developed as optical biomarkers useful in label-free detection and quantitative evaluation of cell metastatic potential.

  6. Data from: Improving stability of prediction models based on correlated...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 20, 2018
    Cite
    Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar; Tissier, Renaud (2018). Improving stability of prediction models based on correlated omics data by using network approaches [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000673745
    Dataset updated
    Feb 20, 2018
    Authors
    Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar; Tissier, Renaud
    Description

    Building prediction models based on complex omics datasets such as transcriptomics, proteomics, metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.

  7. NeonatalPortugal2018

    • data.mendeley.com
    Updated Dec 7, 2019
    Cite
    Francisco Machado e Costa (2019). NeonatalPortugal2018 [Dataset]. http://doi.org/10.17632/br8tnh3h47.1
    Dataset updated
    Dec 7, 2019
    Authors
    Francisco Machado e Costa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data from the Portuguese National Registry on low-weight newborns between 2013 and 2018, made available for research purposes. The dataset is composed of 3823 unique entries recording birthweight, biological sex of the infant (1 = Male; 2 = Female), CRIB score (0-21), and survival (0 = Survival; 1 = Death).

  8. expam Benchmarking - Classifier Performance Statistics

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Jun 28, 2022
    Cite
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam Benchmarking - Classifier Performance Statistics [Dataset]. http://doi.org/10.26180/19771072.v1
    Dataset updated
    Jun 28, 2022
    Dataset provided by
    Monash University
    Authors
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel document containing precision, recall and F1 scores for metagenomic classifiers used in the benchmarking of expam's performance. Classifiers were tested on 140 simulated metagenomic communities, at different taxonomic ranks.
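
For readers unfamiliar with the three scores named in this description, the sketch below shows how precision, recall and F1 are conventionally computed from classification counts. It is a generic illustration, not code from the expam benchmark; the function name and the example counts are hypothetical.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from raw classification counts."""
    predicted = true_positives + false_positives   # everything the classifier reported
    actual = true_positives + false_negatives      # everything that was really present
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the dataset.
precision, recall, f1 = precision_recall_f1(80, 20, 20)
print(precision, recall, f1)
```
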

  9. Prediction of Heart Attack

    • data.mendeley.com
    Updated Aug 21, 2024
    + more versions
    Cite
    Rakin Sad Aftab (2024). Prediction of Heart Attack [Dataset]. http://doi.org/10.17632/yrwd336rkz.2
    Dataset updated
    Aug 21, 2024
    Authors
    Rakin Sad Aftab
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 1763 observations, each representing a unique patient, and 12 different attributes associated with heart disease. This dataset is a critical resource for researchers focusing on predictive analytics in cardiovascular diseases.

    Variables Overview:
    1. Age: A continuous variable indicating the age of the patient.
    2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
    3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
    4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
    5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
    6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
    7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
    8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
    9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
    10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
    11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat', 'Up Sloping', representing the slope of the peak exercise ST segment.
    12. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.

    Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.

    Importance for Research: This dataset provides a comprehensive view of various factors that could potentially be linked to heart disease, making it an invaluable resource for developing predictive models. By analyzing relationships and patterns within these variables, researchers can identify key predictors of heart disease and enhance the accuracy of diagnostic tools. This could lead to better preventive measures and treatment strategies, ultimately improving patient outcomes in the realm of cardiovascular health.

  10. [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C:...

    • erddap.bco-dmo.org
    Updated Jul 9, 2020
    Cite
    BCO-DMO (2020). [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. (Collaborative Research: Delineating The Microbial Diversity and Cross-domain Interactions in The Uncharted Subseafloor Lower Crust Using Meta-omics and Culturing Approaches) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_813173/index.html
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/813173/license

    Variables measured
    depth, iTAG_OTU, iTAG_Raw, latitude, Sample_ID, longitude, Metatr_Raw, iTAG_Paired_QC, Metatr_Paired_QC, Metatr_Reads_Remaining, and 2 more
    Description

    Supplementary Table 4C: Metatranscriptome data summary for cellular activities presented and statistics on sequencing and removal of potential contaminant sequences: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. Samples taken on board of the R/V JOIDES Resolution between November 30, 2015 and January 30, 2016

    access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson

    acquisition_description=Rock material was crushed while still frozen in a Progressive Exploration Jaw Crusher (Model 150) whose surfaces were sterilized with 70% ethanol and RNase AWAY (Thermo Fisher Scientific, USA) inside a laminar flow hood. Powdered rock material was returned to the -80°C freezer until extraction.

    DNA was extracted from 20, 30, or 40 grams of powdered rock material, depending on the quantity of rock available. A DNeasy PowerMax Soil Kit (Qiagen, USA) was used following the manufacturer’s protocol modified to include three freeze/thaw treatments prior to the addition of Soil Kit solution C1. Each treatment consisted of 1 minute in liquid nitrogen followed by 5 minutes at 65 °C. DNA extracts were concentrated by isopropanol precipitation overnight at 4°C.

    The low biomass in our samples required whole genome amplification (WGA) prior to PCR amplification of marker genes. Genomic DNA was amplified by Multiple Displacement Amplification (MDA) using the REPLI-g Single Cell Kit (Qiagen) as directed. MDA bias was minimized by splitting each WGA sample into triplicate 16 μL reactions after 1 hr of amplification and then resuming amplification for the manufacturer-specified 7 hrs (8 hrs total).

    DNA was also recovered from samples of drilling mud and drilling fluid (surface water collected during the coring process) for negative controls, as well as two “kit control” samples, in which no sample was added, to account for any contaminants originating from either the DNeasy PowerMax Soil Kit or the REPLI-g Single Cell Kit.

    Bacterial SSU rRNA gene fragments were PCR amplified from MDA samples and sequenced at Georgia Genomics and Bioinformatics Core (Univ. of Georgia). The primers used were: Bac515-Y and Bac926R. Dual-indexed libraries were prepared with (HT) iTruS (Kappa Biosystems) chemistry and sequencing was performed on an Illumina MiSeq 2 x 300 bp system with all samples combined equally on a single flow cell.

    Raw sequence reads were processed through Trim Galore [http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/], FLASH (ccb.jhu.edu/software/FLASH/) and FASTX Toolkit [http://hannonlab.cshl.edu/fastx_toolkit/] for trimming and removal of low quality/short reads.

    Quality filtering included requiring a minimum average quality of 25 and rejection of paired reads less than 250 nucleotides.
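
The two thresholds stated above (minimum average quality of 25, minimum length of 250 nucleotides) can be expressed as a simple per-read check. This is a minimal sketch and not the pipeline used in the study, which relied on Trim Galore, FLASH and FASTX Toolkit. It assumes that quality values are already decoded Phred scores and that the length threshold applies to the merged read; the function name is hypothetical.

```python
def passes_quality_filter(sequence, phred_scores, min_mean_quality=25, min_length=250):
    """Return True if a read meets both the length and mean-quality thresholds.

    Assumptions (not stated in the source): `phred_scores` holds one decoded
    Phred score per base, and `sequence` is the merged paired read.
    """
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and quality lists must have the same length")
    if not sequence or len(sequence) < min_length:
        return False
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_mean_quality


# Hypothetical reads used only to exercise the check.
print(passes_quality_filter("A" * 300, [30] * 300))  # long enough, good quality
print(passes_quality_filter("A" * 200, [40] * 200))  # too short
print(passes_quality_filter("A" * 300, [20] * 300))  # quality too low
```
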

    Operational Taxonomic Unit (OTU) clusters were constructed at 99% similarity with the script pick_otus.py within the Quantitative Insights Into Microbial Ecology (QIIME) v.1.9.1 software and ‘uclust’. Any OTU that matched an OTU in one of our control samples (drilling fluids, drilling mud, extraction and WGA controls) was removed (using filter_otus_from_otu_table.py) along with any sequences of land plants and human pathogens that may have survived the control filtering due to clustering at 99% (filter_taxa_from_otu_table.py). As an additional quality control measure, genera that are commonly identified as PCR contaminants were removed. Unclassified OTUs were queried using BLAST against the GenBank nr database and further information about these OTUs is provided in the Supplementary Discussion text under the section “Taxonomic diversity information from iTAGs.” OTUs that could not be assigned to Bacteria or Archaea were removed from further analysis. For downstream analyses, any OTUs not representing more than 0.01% of relative abundance of sequences overall were removed as those are unlikely to contribute significantly to in situ communities. The OTU data table was transformed to a presence/absence table and the Jaccard method was used to generate a distance matrix using the dist.binary() function in the R package ade4.

    awards_0_award_nid=709555 awards_0_award_number=OCE-1658031 awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1658031 awards_0_funder_name=NSF Division of Ocean Sciences awards_0_funding_acronym=NSF OCE awards_0_funding_source_nid=355 awards_0_program_manager=David L. Garrison awards_0_program_manager_nid=50534 cdm_data_type=Other comment=Supplementary Table 4C: iTAG PI: Virginia Edgcomb
    Data Version 1: 2020-05-28 Conventions=COARDS, CF-1.6, ACDD-1.3 data_source=extract_data_as_tsv version 2.3 19 Dec 2019 dataset_current_state=Final and no updates defaultDataQuery=&time<now doi=10.26008/1912/bco-dmo.813173.1 Easternmost_Easting=57.278183 geospatial_lat_max=-32.70567 geospatial_lat_min=-32.70567 geospatial_lat_units=degrees_north geospatial_lon_max=57.278183 geospatial_lon_min=57.278183 geospatial_lon_units=degrees_east geospatial_vertical_max=747.7 geospatial_vertical_min=10.7 geospatial_vertical_positive=down geospatial_vertical_units=m infoUrl=https://www.bco-dmo.org/dataset/813173 institution=BCO-DMO instruments_0_acronym=Automated Sequencer instruments_0_dataset_instrument_description=DNA sequencing performed using the Illumina MiSeq 2 x 300 bp platform (Univ. of Georgia) instruments_0_dataset_instrument_nid=813183 instruments_0_description=General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step. 
instruments_0_instrument_name=Automated DNA Sequencer instruments_0_instrument_nid=649 instruments_0_supplied_name=Illumina MiSeq 2 x 300 bp platform metadata_source=https://www.bco-dmo.org/api/dataset/813173 Northernmost_Northing=-32.70567 param_mapping={'813173': {'Latitude': 'flag - latitude', 'Depth': 'flag - depth', 'Longitude': 'flag - longitude'}} parameter_source=https://www.bco-dmo.org/mapserver/dataset/813173/parameters people_0_affiliation=Woods Hole Oceanographic Institution people_0_affiliation_acronym=WHOI people_0_person_name=Virginia P. Edgcomb people_0_person_nid=51284 people_0_role=Principal Investigator people_0_role_type=originator people_1_affiliation=Woods Hole Oceanographic Institution people_1_affiliation_acronym=WHOI people_1_person_name=Virginia P. Edgcomb people_1_person_nid=51284 people_1_role=Contact people_1_role_type=related people_2_affiliation=Woods Hole Oceanographic Institution people_2_affiliation_acronym=WHOI BCO-DMO people_2_person_name=Karen Soenen people_2_person_nid=748773 people_2_role=BCO-DMO Data Manager people_2_role_type=related project=Subseafloor Lower Crust Microbiology projects_0_acronym=Subseafloor Lower Crust Microbiology projects_0_description=NSF abstract: The lower ocean crust has remained largely unexplored and represents one of the last frontiers for biological exploration on Earth. Preliminary data indicate an active subsurface biosphere in samples of the lower oceanic crust collected from Atlantis Bank in the SW Indian Ocean as deep as 790 m below the seafloor. Even if life exists in only a fraction of the habitable volume where temperatures permit and fluid flow can deliver carbon and energy sources, an active lower oceanic crust biosphere would have implications for deep carbon budgets and yield insights into microbiota that may have existed on early Earth. This is all of great interest to other research disciplines, educators, and students alike. 
A K-12 education program will capitalize on groundwork laid by outreach collaborator, A. Martinez, a 7th grade teacher in Eagle Pass, TX, who sailed as outreach expert on Drilling Expedition 360. Martinez works at a Title 1 school with ~98% Hispanic and ~2% Native American students and a high number of English Language Learners and migrants. Annual school visits occur during which the project investigators present hands on-activities introducing students to microbiology, and talks on marine microbiology, the project, and how to pursue science related careers. In addition, monthly Skype meetings with students and PIs update them on project progress. Students travel to the University of Texas Marine Science Institute annually, where they get a campus tour and a 3-hour cruise on the R/V Katy, during which they learn about and help with different oceanographic sampling approaches. The project partially supports two graduate students, a Woods Hole undergraduate summer student, the participation of multiple Texas A+M undergraduate students, and 3 principal investigators at two institutions, including one early career researcher who has not previously received NSF support of his own. Given the dearth of knowledge of the lower oceanic crust, this project is poised to transform our understanding of life in this vast environment. The project assesses metabolic functions within all three domains of life in this crustal biosphere, with a focus on nutrient cycling and evaluation of connections to other deep marine microbial habitats. The lower ocean crust represents a potentially vast biosphere whose microbial constituents and the biogeochemical cycles they mediate are likely linked to deep ocean processes through faulting and subsurface fluid flow. Atlantis Bank represents a tectonic
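
The processing notes for this dataset mention transforming the OTU table to presence/absence and computing a Jaccard distance matrix. The sketch below illustrates that idea in plain Python. It is not a reimplementation of the ade4 dist.binary() function named in the description, whose output may be scaled differently; here the distance is simply one minus the ratio of shared OTUs to all OTUs observed in either sample. Function names and the example table are hypothetical.

```python
def presence_absence(otu_table):
    """Map each sample to the set of OTUs with a non-zero count."""
    return {
        sample: {otu for otu, count in counts.items() if count > 0}
        for sample, counts in otu_table.items()
    }


def jaccard_distance(a, b):
    """Plain Jaccard dissimilarity: 1 - |A and B| / |A or B|."""
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(otu_table):
    """Return sorted sample names and the pairwise Jaccard distance matrix."""
    present = presence_absence(otu_table)
    samples = sorted(present)
    matrix = [[jaccard_distance(present[s], present[t]) for t in samples] for s in samples]
    return samples, matrix


# Hypothetical abundance counts, not taken from the dataset.
table = {
    "S1": {"otu1": 5, "otu2": 0, "otu3": 2},
    "S2": {"otu1": 1, "otu2": 3, "otu3": 0},
}
samples, matrix = distance_matrix(table)
print(samples, matrix)
```
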

  11. Virus Pop Database V1

    • data.niaid.nih.gov
    Updated Apr 26, 2023
    Cite
    Kende, Julia; Bigot, Thomas (2023). Virus Pop Database V1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7867258
    Dataset updated
    Apr 26, 2023
    Dataset provided by
    Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
    Authors
    Kende, Julia; Bigot, Thomas
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This archive is a database generated using the novel Virus Pop pipeline, which simulates realistic protein sequences and adds new branches to a protein phylogenetic tree. An article describing the pipeline is currently under review.

    The database contains simulations of 995 different proteins from 93 virus genera, providing a total of 24,138,277 sequences, both in amino acid and nucleotide.

  12. Data from: Benchmarking tools for transcription factor prioritization

    • zenodo.org
    application/gzip
    Updated Apr 23, 2024
    Cite
    Sebastian Steinhauser; Leonor Schubert Santana; Gaulis Swann (2024). Benchmarking tools for transcription factor prioritization [Dataset]. http://doi.org/10.5281/zenodo.10990183
    Dataset updated
    Apr 23, 2024
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Sebastian Steinhauser; Leonor Schubert Santana; Gaulis Swann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 19, 2024
    Description

    Abstract:

    Spatiotemporal regulation of gene expression is controlled by transcription factor (TF) binding to regulatory elements, resulting in a plethora of cell types and cell states from the same genetic information. Due to the importance of regulatory elements, various sequencing methods have been developed to localise them in genomes, for example using ChIP-seq profiling of the histone mark H3K27ac that marks active regulatory regions. Moreover, multiple tools have been developed to predict TF binding to these regulatory elements based on DNA sequence. As altered gene expression is a hallmark of disease phenotypes, identifying TFs driving such gene expression programs is critical for the identification of novel drug targets. In this study, we curated 84 chromatin profiling experiments (H3K27ac ChIP-seq) where TFs were perturbed through e.g., genetic knockout or overexpression. We ran nine published tools to prioritize TFs using these real-world data sets and evaluated the performance of the methods in identifying the perturbed TFs. This allowed the nomination of three frontrunner tools, namely RcisTarget, MEIRLOP and monaLisa. Our analyses revealed opportunities and commonalities of tools that will help to guide further improvements and developments in the field.

    Dataset description:

    • tf_tool_benchmark_atacseq_diffPeaks.tar.gz - Archive containing differential peak statistics, tool diff peak input files (fore- and background) for all curated ATAC-seq datasets.
    • tf_tool_benchmark_h3K27ac_chipseq_diffPeaks.tar.gz - Archive containing differential peak statistics, tool diff peak input files (fore- and background) for all curated H3K27ac ChIP-seq datasets.
    • tf_tool_benchmark_atacseq_results.tar.gz - Archive containing the raw tool results for each ATAC-seq dataset.
    • tf_tool_benchmark_chipseq_results.tar.gz - Archive containing the raw tool results for each H3K27ac ChIP-seq dataset.
    • tf_tool_benchmark_results.tar.gz - Archive containing tool results summary for plotting (rds files).

    Contact: Sebastian Steinhauser - sebastian.steinhauser@novartis.com

  13. Bioinformatics Summary statistics together with NCBI accession numbers.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 1, 2020
    Cite
    Tapia, Sebastián M.; Saenz-Agudelo, Pablo; Nespolo, Roberto F.; Villarroel, Carlos A.; Thompson, Dawn; Mikhalev, Ekaterina; Liti, Gianni; De Chiara, Matteo; Cubillos, Francisco A.; Urbina, Kamila; Mozzachiodi, Simone; Larrondo, Luis F.; Vega-Macaya, Franco; Oporto, Christian I. (2020). Bioinformatics Summary statistics together with NCBI accession numbers. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000455946
    Explore at:
    Dataset updated
    May 1, 2020
    Authors
    Tapia, Sebastián M.; Saenz-Agudelo, Pablo; Nespolo, Roberto F.; Villarroel, Carlos A.; Thompson, Dawn; Mikhalev, Ekaterina; Liti, Gianni; De Chiara, Matteo; Cubillos, Francisco A.; Urbina, Kamila; Mozzachiodi, Simone; Larrondo, Luis F.; Vega-Macaya, Franco; Oporto, Christian I.
    Description

    (A) Bioinformatics Summary statistics and (B) Sequence identity matrix between strains. (XLSX)

  14. Data from: AMAS: a fast tool for large alignment manipulation and computing...

    • search.datacite.org
    • data.niaid.nih.gov
    • +2 more
    Updated 2017
    Cite
    Marek L. Borowiec (2017). Data from: AMAS: a fast tool for large alignment manipulation and computing of summary statistics [Dataset]. http://doi.org/10.5061/dryad.p2q52
    Explore at:
    Dataset updated
    2017
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Dryad
    Authors
    Marek L. Borowiec
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The amount of data used in phylogenetics has grown explosively in the recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python’s core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
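
To make the listed statistics concrete, here is a minimal sketch that computes several of them for a toy DNA alignment. It is not AMAS's implementation and does not use the AMAS package; the function name `alignment_summary`, the choice to treat `-`, `?` and `N` as undetermined, and the choice to compute AT and GC content over determined A/C/G/T characters only are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: gaps, "?" and "N" count as undetermined characters.
UNDETERMINED = {"-", "?", "N"}


def alignment_summary(alignment):
    """Compute basic summary statistics for a DNA alignment given as {taxon: sequence}."""
    seqs = [seq.upper() for seq in alignment.values()]
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same length")

    counts = Counter("".join(seqs))
    cells = len(seqs) * length
    undetermined = sum(counts[c] for c in UNDETERMINED)
    acgt = sum(counts[c] for c in "ACGT")

    variable = 0
    informative = 0
    for column in zip(*seqs):
        states = Counter(c for c in column if c not in UNDETERMINED)
        # Variable site: at least two different determined states.
        if len(states) > 1:
            variable += 1
        # Parsimony informative site: at least two states, each seen in
        # at least two sequences.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1

    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "undetermined": undetermined,
        "percent_missing": 100.0 * undetermined / cells if cells else 0.0,
        "at_content": (counts["A"] + counts["T"]) / acgt if acgt else 0.0,
        "gc_content": (counts["G"] + counts["C"]) / acgt if acgt else 0.0,
        "variable_sites": variable,
        "parsimony_informative_sites": informative,
    }


if __name__ == "__main__":
    toy = {"t1": "ACGTAC", "t2": "ACGTTC", "t3": "ACGAT-", "t4": "ACCAT?"}
    for key, value in alignment_summary(toy).items():
        print(f"{key}: {value}")
```

The sketch iterates over alignment columns with `zip(*seqs)`, which is simple but holds the whole alignment in memory; a tool aimed at very large alignments would need a more careful design.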

  15. Knowledge and attitudes among life scientists towards reproducibility within...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Aug 11, 2020
    Cite
    Evanthia Kaimaklioti Samota (2020). Knowledge and attitudes among life scientists towards reproducibility within journal articles_survey_datafile_raw_data [Dataset]. http://doi.org/10.6084/m9.figshare.7855592.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Aug 11, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    figshare
    Authors
    Evanthia Kaimaklioti Samota
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Raw datafile of the survey data collected from the survey distributed to collect knowledge and attitudes among life scientists towards reproducibility within journal articles.

  16. U-RVDBv15.1

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Feb 21, 2019
    Cite
    Bigot, Thomas; Eloit, Marc; Temmam, Sarah; Pérot, Philippe (2019). U-RVDBv15.1 [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000098034
    Explore at:
    Dataset updated
    Feb 21, 2019
    Authors
    Bigot, Thomas; Eloit, Marc; Temmam, Sarah; Pérot, Philippe
    Description

    Reference Viral Databases (RVDB-prot and RVDB-prot-HMM) were developed by Thomas Bigot in Marc Eloit’s Pathogen Discovery group in collaboration with Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI) at Institut Pasteur, for enhancing virus detection using next-generation sequencing (NGS) technologies. They are based on the reference Viral DataBase, courtesy of Arifa Khan’s group at CBER, FDA: https://hive.biochemistry.gwu.edu/rvdb/. They are updated after each new release of the nucleotidic database. The version number of the protein databases follows the one of the original nucleic database.

  17. Dataset for: DNMT3A-R882 mutation intrinsically mimics maladaptive...

    • data.mendeley.com
    Updated Sep 18, 2025
    Cite
    giovanna mantica (2025). Dataset for: DNMT3A-R882 mutation intrinsically mimics maladaptive myelopoiesis from human haematopoietic stem cells [Dataset]. http://doi.org/10.17632/rcv6tkvbfy.1
    Explore at:
    Dataset updated
    Sep 18, 2025
    Authors
    giovanna mantica
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset supports the manuscript:

    DNMT3A-R882 mutation intrinsically mimics maladaptive myelopoiesis from human haematopoietic stem cells

    Giovanna Mantica1*, Aditi Vedi1,2*, Amos Tuval3§, Hector Huerga-Encabo4§, Daniel Hayler1§, Aleksandra Krzywon1,5, Emily Mitchell6, William Dunn1, Tamir Biezuner3, Kendig Sham1, Antonella Santoro1, Joe Lee6, Adi Danin3, Noa Chapal3, Yoni Moskovitz3,7, Andrea Arruda8, Edoardo Fiorillo9, Valeria Orru9, Michele Marongiu9, Eoin McKinney10, Francesco Cucca9,11, Matthew Collin12, Mark Minden8, Peter Campbell6, George S Vassiliou1, Margarete Fabre1, Jyoti Nangalia1,6, Dominique Bonnet4, Liran Shlush3,7,8, Elisa Laurenti1

    * These authors contributed equally. § These authors contributed equally.

    Affiliations:
    1 Department of Haematology and Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK.
    2 Department of Paediatric Oncology, Cambridge University Hospitals NHS Foundation Trust
    3 Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel.
    4 Haematopoietic Stem Cell Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
    5 Department of Biostatistics and Bioinformatics, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice Branch, Gliwice, Poland
    6 Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
    7 Division of Haematology Rambam Healthcare Campus, Haifa 31096, Israel.
    8 Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada.
    9 Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche, Lanusei, Italy.
    10 Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK
    11 Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
    12 Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK

  18. Data and code for: Pneumococcus co-colonization and the...

    • zenodo.org
    • search.dataone.org
    • +1 more
    bin, csv
    Updated Mar 18, 2024
    Cite
    Ermanda Dekaj; Erida Gjini; Erida Gjini; Ermanda Dekaj (2024). Data and code for: Pneumococcus co-colonization and the stress-gradient-hypothesis [Dataset]. http://doi.org/10.5061/dryad.hqbzkh1p0
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ermanda Dekaj; Erida Gjini; Erida Gjini; Ermanda Dekaj
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Measurement technique
    These data have been synthesized from studies that report Streptococcus pneumoniae colonization and co-colonization in children populations worldwide. We provide primary data files (metadata and extracted epidemiological variables as well as serotype compositions), processed data files, and some auxiliary R codes for analysis. The main purpose of our initial analyses was to investigate the stress-gradient-hypothesis in pneumococcus, and to link the mathematical modeling framework in previous papers (Gjini and Madec, 2021; Madec and Gjini 2021) to a concrete epidemiological context.

    • Gjini, Erida, and Sten Madec. "The ratio of single to co-colonization is key to complexity in interacting systems with multiple strains." Ecology and Evolution 11.13 (2021): 8456-8474. https://doi.org/10.1002/ece3.7259
    • Madec, Sten, and Erida Gjini. "Predicting N-strain coexistence from co-colonization interactions: epidemiology meets ecology and the replicator equation." Bulletin of Mathematical Biology 82.11 (2020): 142. https://doi.org/10.1007/s11538-020-00816-w

    Description

    Pneumococcus serotype co-colonization, caused by the polymorphic bacteria Streptococcus pneumoniae, has been increasingly investigated and reported in recent years. Yet, there is limited information on how co-colonization patterns vary globally, critical for understanding the evolution and transmission dynamics of these bacteria. Here we report on a rich dataset of cross-sectional pneumococcal colonization studies collected from the literature, where we quantified patterns of transmission intensity and co-colonization variation in children populations across different epidemiological settings. Fitting these data to an SIS model with co-colonization under the assumption of quasi-neutrality among multiple interacting strains, our analysis reveals strong patterns of negative co-variation between transmission intensity R0 and susceptibility to co-colonization k, in support of the stress-gradient-hypothesis (SGH) in ecology. According to this hypothesis, ecological interactions between organisms shift positively as environmental stress increases. In our model higher environmental stress is represented via lower values of the basic reproduction number R0, and a shift towards positive interactions is represented via higher vulnerability to co-colonization (higher k) between pneumococcus serotypes.

  19. Data from: From pattern to causality: using linear discriminant analysis and...

    • bridges.monash.edu
    • datasetcatalog.nlm.nih.gov
    • +1 more
    pdf
    Updated Nov 21, 2017
    Cite
    Lin, Tsun-Chen; Liu, Ru-Sheng; Chen, Chien-Yu; Chao, Ya-Ting; Chen, Shu-Yuan (2017). From pattern to causality: using linear discriminant analysis and Bayesian network on microarray data of breast cancers [Dataset]. http://doi.org/10.4225/03/5a13729325a4e
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    Monash University
    Authors
    Lin, Tsun-Chen; Liu, Ru-Sheng; Chen, Chien-Yu; Chao, Ya-Ting; Chen, Shu-Yuan
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    In this paper, we aim at using genetic algorithms for gene selection and propose silhouette statistics as a discriminant function to classify breast cancers on microarray data for pattern discovery. In order to see the causality among these genes, we use the Bayesian method to construct a probability network for the pattern discovered. Consequently, we found a set of genes that is effective to discriminate breast cancer subtypes and present their probability dependencies to construct a diagnostic system. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  20. Dataset: Profiling Neuronal Methylome and Hydroxymethylome of Opioid Use...

    • data.niaid.nih.gov
    Updated Jul 11, 2023
    Cite
    Gregory Rompala; Sheila T. Nagamatsu; José Jaime Martínez-Magaña; Diana L. Nuñez-Ríos; Jiawei Wang; Matthew J. Girgenti; John H. Krystal; Joel Gelernter; Traumatic Stress Brain Research Group; Yasmin L. Hurd; Janitza L. Montalvo-Ortiz (2023). Dataset: Profiling Neuronal Methylome and Hydroxymethylome of Opioid Use Disorder in the Human Orbitofrontal Cortex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7958289
    Explore at:
    Dataset updated
    Jul 11, 2023
    Dataset provided by
    Department of Psychiatry, Yale University School of Medicine, New Haven, CT; VA Connecticut Healthcare System, West Haven, CT; U.S. Department of Veterans Affairs National Center for Posttraumatic Stress Disorder, Clinical Neurosciences Division, West Haven, CT
    Department of Psychiatry, Yale University School of Medicine, New Haven, CT; U.S. Department of Veterans Affairs National Center for Posttraumatic Stress Disorder, Clinical Neurosciences Division, West Haven, CT
    Department of Psychiatry, Yale University School of Medicine, New Haven, CT; VA Connecticut Healthcare System, West Haven, CT
    Computational Biology and Bioinformatics Program, Yale University, New Haven, CT; Department of Biostatistics, Yale School of Public Health, New Haven, CT
    Icahn School of Medicine at Mount Sinai, New York, NY
    Authors
    Gregory Rompala; Sheila T. Nagamatsu; José Jaime Martínez-Magaña; Diana L. Nuñez-Ríos; Jiawei Wang; Matthew J. Girgenti; John H. Krystal; Joel Gelernter; Traumatic Stress Brain Research Group; Yasmin L. Hurd; Janitza L. Montalvo-Ortiz
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Methylation and Hydroxymethylation data.
