100+ datasets found
  1. Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Mikhail G. Dozmorov (2023). Data_Sheet_1_GitHub Statistics as a Measure of the Impact of Open-Source Bioinformatics Software.PDF [Dataset]. http://doi.org/10.3389/fbioe.2018.00198.s001
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Mikhail G. Dozmorov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern research is increasingly data-driven and reliant on bioinformatics software. Publication is a common way of introducing new software, but not all bioinformatics tools get published. Given that there are competing tools, it is important not merely to find the appropriate software, but also to have a metric for judging its usefulness. The journal impact factor has been shown to be a poor predictor of software popularity; consequently, focusing on publications in high-impact journals limits users' choices in finding useful bioinformatics tools. Free and open source software repositories on popular code sharing platforms such as GitHub provide another venue to follow the latest bioinformatics trends. The open source component of GitHub allows users to bookmark and copy repositories that are most useful to them. This Perspective aims to demonstrate the utility of GitHub “stars,” “watchers,” and “forks” (GitHub statistics) as a measure of software impact. We compiled lists of impactful bioinformatics software and analyzed commonly used impact metrics and GitHub statistics of 50 genomics-oriented bioinformatics tools. We present examples of community-selected best bioinformatics resources and show that GitHub statistics are distinct from the journal's impact factor (JIF), citation counts, and alternative metrics (Altmetrics, CiteScore) in capturing the level of community attention. We suggest the use of GitHub statistics as an unbiased measure of the usability of bioinformatics software complementing the traditional impact metrics.

  2. 2025 Green Card Report for Biostatistics, Bioinformatics, and Systems...

    • myvisajobs.com
    Updated Jan 16, 2025
    + more versions
    Cite
    MyVisaJobs (2025). 2025 Green Card Report for Biostatistics, Bioinformatics, and Systems Biology [Dataset]. https://www.myvisajobs.com/reports/green-card/major/biostatistics,-bioinformatics,-and-systems-biology
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    MyVisaJobs.com
    Authors
    MyVisaJobs
    License

    https://www.myvisajobs.com/terms-of-service/

    Variables measured
    Major, Salary, Petitions Filed
    Description

    A dataset that explores Green Card sponsorship trends, salary data, and employer insights for biostatistics, bioinformatics, and systems biology in the U.S.

  3. Two-step mixed model approach to analyzing differential alternative RNA...

    • datadryad.org
    • zenodo.org
    zip
    Updated Sep 28, 2020
    Cite
    Li Luo; Huining Kang; Xichen Li; Scott Ness; Christine Stidley (2020). Two-step mixed model approach to analyzing differential alternative RNA splicing: Datasets and R scripts for analysis of alternative splicing [Dataset]. http://doi.org/10.5061/dryad.66t1g1k0h
    Available download formats: zip
    Dataset updated
    Sep 28, 2020
    Dataset provided by
    Dryad
    Authors
    Li Luo; Huining Kang; Xichen Li; Scott Ness; Christine Stidley
    Time period covered
    2020
    Description

    The dataset was collected through whole-transcriptome RNA-Sequencing technologies. The processing method was described in the manuscript.

  4. Bioinformatics Training Resources

    • figshare.com
    html
    Updated May 31, 2023
    Cite
    Stephen Turner (2023). Bioinformatics Training Resources [Dataset]. http://doi.org/10.6084/m9.figshare.773083.v3
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Stephen Turner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Markdown source, PDF, and HTML rendering of bioinformatics training resources from http://stephenturner.us/p/edu.

  5. NeonatalPortugal2018

    • data.mendeley.com
    Updated Dec 7, 2019
    Cite
    Francisco Machado e Costa (2019). NeonatalPortugal2018 [Dataset]. http://doi.org/10.17632/br8tnh3h47.1
    Dataset updated
    Dec 7, 2019
    Authors
    Francisco Machado e Costa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Portuguese National Registry on low weight newborns between 2013 and 2018, made available for research purposes. Dataset is composed of 3823 unique entries registering birthweight, biological sex of the infant (1-Male; 2-Female), CRIB score (0-21) and survival (0-Survival; 1-Death).
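
The coded fields in this description can be decoded as in the minimal Python sketch below. The class name, field names, and function name are hypothetical; the description does not give the file's column names or the unit of birthweight, so only the code-to-label mappings and the CRIB range come from the text above.

```python
from dataclasses import dataclass

# Code-to-label mappings taken from the dataset description.
SEX_LABELS = {1: "Male", 2: "Female"}
OUTCOME_LABELS = {0: "Survival", 1: "Death"}


@dataclass(frozen=True)
class NeonatalRecord:
    """Hypothetical decoded record; field names are assumptions."""
    birthweight: float  # unit is not stated in the description
    sex: str
    crib: int
    outcome: str


def decode_record(birthweight, sex_code, crib, survival_code):
    """Validate the coded values and return a labelled record."""
    if sex_code not in SEX_LABELS:
        raise ValueError(f"unexpected sex code: {sex_code!r}")
    if not 0 <= crib <= 21:
        raise ValueError(f"CRIB score out of range 0-21: {crib!r}")
    if survival_code not in OUTCOME_LABELS:
        raise ValueError(f"unexpected survival code: {survival_code!r}")
    return NeonatalRecord(
        birthweight=float(birthweight),
        sex=SEX_LABELS[sex_code],
        crib=int(crib),
        outcome=OUTCOME_LABELS[survival_code],
    )
```

Rejecting out-of-range codes early, rather than mapping them to a default, keeps data-entry errors visible before any analysis.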

  6. Datasets associated with the publication of the "satuRn" R package

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 13, 2022
    Cite
    Jeroen Gilis; Kristoffer Vitting-Seerup; Koen Van den Berge; Lieven Clement (2022). Datasets associated with the publication of the "satuRn" R package [Dataset]. http://doi.org/10.5281/zenodo.4438474
    Available download formats: zip
    Dataset updated
    Jul 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeroen Gilis; Kristoffer Vitting-Seerup; Koen Van den Berge; Lieven Clement
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    On this Zenodo link, we share the data that is required to reproduce all the analyses from our publication "satuRn: Scalable Analysis of differential Transcript Usage for bulk and single-cell RNA-sequencing applications".

    This repository includes input transcript-level expression matrices and metadata for all datasets, as well as intermediate results and final outputs of the respective DTU analyses. For a more elaborate description of the data, we refer to the companion GitHub repository for our publication: https://github.com/statOmics/satuRnPaper. Note that this is version 1.0.0 of the data (uploaded on 2021-01-14). If any changes are made to the datasets in the future, this will also be communicated on our companion GitHub page.

  7. Multidimensional scaling informed by F-statistic: Visualizing microbiome for...

    • dataone.org
    • data.niaid.nih.gov
    Updated May 8, 2025
    Cite
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie (2025). Multidimensional scaling informed by F-statistic: Visualizing microbiome for inference [Dataset]. http://doi.org/10.5061/dryad.vmcvdnd3x
    Dataset updated
    May 8, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Hyungseok Kim; Soobin Kim; Jeff Kimbrel; Megan Morris; Xavier Mayali; Cullen Buie
    Description

    Multidimensional scaling (MDS) is a dimensionality reduction technique for microbial ecology data analysis that represents the multivariate structure while preserving pairwise distances between samples. While its improvements have enhanced the ability to reveal data patterns by sample groups, these MDS-based methods require prior assumptions for inference, limiting their application in general microbiome analysis. In this study, we introduce a new MDS-based ordination, “F-informed MDS,” which configures the data distribution based on the F-statistic, the ratio of dispersion between groups sharing common and different characteristics. Using simulated compositional datasets, we demonstrate that the proposed method is robust to hyperparameter selection while maintaining statistical significance throughout the ordination process. Various quality metrics for evaluating dimensionality reduction confirm that F-informed MDS is comparable to state-of-the-art methods in preserving both local and ...

    # Multidimensional scaling informed by F-statistic: Visualizing grouped microbiome data with inference

    File: Data.zip

    Description: Raw data used in this study. Includes 3 folders and 1 file (see below).
    1. Folder Simulated contains pairwise distances and ordination results from three simulated datasets. Includes 7 subfolders and 6 files.
      • Six files are the original dataset and its associated labels set. The names are formatted as "sim_<*x*>-<*type*>.*csv*" where <*x*> is the replicate number and <*type*> indicates whether the file is the design matrix ("data") or response vector ("Y").
      • Seven subfolders are grouped by the ordination method. Likewise, the file ...,
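
The file-naming convention described for the Simulated folder can be parsed with a short Python sketch like the one below. It assumes the replicate number is a plain integer and that names look like "sim_1-data.csv" or "sim_1-Y.csv"; the actual file names were not inspected, and the function name is hypothetical.

```python
import re

# Assumed shape of the names: "sim_<x>-<type>.csv", where <x> is an integer
# replicate number and <type> is "data" (design matrix) or "Y" (response vector).
SIM_FILE = re.compile(r"^sim_(?P<replicate>\d+)-(?P<kind>data|Y)\.csv$")


def parse_sim_filename(name):
    """Return (replicate, kind) for a matching name, or None otherwise."""
    match = SIM_FILE.match(name)
    if match is None:
        return None
    return int(match.group("replicate")), match.group("kind")
```

Returning `None` for non-matching names makes it easy to skip unrelated files when walking the folder.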
  8. SARS-CoV-2 GISAID UK-US isolates (2020-09-07) genotyping VCF

    • data.mendeley.com
    Updated Nov 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Necla Koçhan (2020). SARS-CoV-2 GISAID UK-US isolates (2020-09-07) genotyping VCF [Dataset]. http://doi.org/10.17632/5dfj2hhnng.1
    Dataset updated
    Nov 16, 2020
    Authors
    Necla Koçhan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, United Kingdom
    Description

    VCF files containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV and submitted from the UK and the US, separated by individual mutations. The columns correspond to viral genome accession ID, nucleotide position in the genome, mutation ID (left blank in all rows), reference nucleotide, identified mutation, quality, filter, and information columns (all left blank), format (GT in all rows), column corresponding to reference genome (all 0, referring to reference nucleotide column), and columns corresponding to isolate genomes, with each row identifying the nucleotide in the POS column, and whether it is non-mutant (0), or the mutant indicated in the identified mutation column (1). The files are tab delimited, with the UK file having 12696 rows including the names, and 18135 columns, and the US file having 15588 rows including the names, and 16277 columns.
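
The column layout described above can be read row by row with a small Python sketch such as the following. It assumes that "left blank" means an empty field between tabs (some VCF writers use "." instead) and that the first genotype column is the reference genome, as the description states; the function name and the example row are illustrative, not taken from the files.

```python
# Nine fixed VCF columns precede the genotype columns:
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
FIXED_COLUMNS = 9


def parse_site(line):
    """Summarise one tab-delimited data row of the genotyping file."""
    # Strip only the newline so that empty trailing fields are preserved.
    fields = line.rstrip("\n").split("\t")
    pos, ref, alt = fields[1], fields[3], fields[4]
    genotypes = fields[FIXED_COLUMNS:]
    # Per the description, the first genotype column is the reference genome
    # (always 0); the remaining columns are the isolates.
    isolate_genotypes = genotypes[1:]
    mutant = sum(1 for gt in isolate_genotypes if gt == "1")
    return {
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "mutant_isolates": mutant,
        "total_isolates": len(isolate_genotypes),
    }


# Illustrative row: position 23403, A>G, reference column plus three isolates.
example = "NC_045512.2\t23403\t\tA\tG\t\t\t\tGT\t0\t1\t0\t1\n"
summary = parse_site(example)
```

With more than 16,000 columns per row, streaming the file line by line in this way avoids loading the full matrix into memory.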

    The file was generated to test the hypothesis whether the different SARS-CoV-2 genes or protein coding regions are positively or negatively selected differently between 14408C>T / 23403A>G double mutants and double wildtype isolates, using mutation rate models, and whether regional distributions affect the mutation rates. Our findings have shown that the RdRp coding region and the S gene show the highest amount of selection across viral generations, and that different countries can affect the synonymous and nonsynonymous mutation rates for individual genes.

  9. Prediction of Heart Attack

    • data.mendeley.com
    Updated Aug 21, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rakin Sad Aftab (2024). Prediction of Heart Attack [Dataset]. http://doi.org/10.17632/yrwd336rkz.2
    Dataset updated
    Aug 21, 2024
    Authors
    Rakin Sad Aftab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of 1763 observations, each representing a unique patient, and 12 different attributes associated with heart disease. This dataset is a critical resource for researchers focusing on predictive analytics in cardiovascular diseases.

    Variables Overview:
    1. Age: A continuous variable indicating the age of the patient.
    2. Sex: A categorical variable with two levels ('Male', 'Female'), indicating the gender of the patient.
    3. CP (Chest Pain type): A categorical variable describing the type of chest pain experienced by the patient, with categories such as 'Asymptomatic', 'Atypical Angina', 'Typical Angina', and 'Non-Angina'.
    4. TRTBPS (Resting Blood Pressure): A continuous variable indicating the resting blood pressure (in mm Hg) on admission to the hospital.
    5. Chol (Serum Cholesterol): A continuous variable measuring the serum cholesterol in mg/dl.
    6. FBS (Fasting Blood Sugar): A binary variable where 1 represents fasting blood sugar > 120 mg/dl, and 0 otherwise.
    7. Rest ECG (Resting Electrocardiographic Results): Categorizes the resting electrocardiographic results of the patient into 'Normal', 'ST Elevation', and other categories.
    8. Thalachh (Maximum Heart Rate Achieved): A continuous variable indicating the maximum heart rate achieved by the patient.
    9. Exng (Exercise Induced Angina): A binary variable where 1 indicates the presence of exercise-induced angina, and 0 otherwise.
    10. Oldpeak (ST Depression Induced by Exercise Relative to Rest): A continuous variable indicating the ST depression induced by exercise relative to rest.
    11. Slope (Slope of the Peak Exercise ST Segment): A categorical variable with levels such as 'Flat', 'Up Sloping', representing the slope of the peak exercise ST segment.
    14. Target: A binary target variable indicating the presence (1) or absence (0) of heart disease.

    Descriptive Statistics: The patients' age ranges from 29 to 77 years, with a mean age of approximately 54 years. The resting blood pressure spans from 94 to 200 mm Hg, and the average cholesterol level is about 246 mg/dl. The maximum heart rate achieved varies widely among patients, from 71 to 202 beats per minute.

    Importance for Research: This dataset provides a comprehensive view of various factors that could potentially be linked to heart disease, making it an invaluable resource for developing predictive models. By analyzing relationships and patterns within these variables, researchers can identify key predictors of heart disease and enhance the accuracy of diagnostic tools. This could lead to better preventive measures and treatment strategies, ultimately improving patient outcomes in the realm of cardiovascular health.

  10. expam Benchmarking - Classifier Performance Statistics

    • researchdata.edu.au
    • bridges.monash.edu
    Updated Jun 28, 2022
    Cite
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster (2022). expam Benchmarking - Classifier Performance Statistics [Dataset]. http://doi.org/10.26180/19771072.v1
    Dataset updated
    Jun 28, 2022
    Dataset provided by
    Monash University
    Authors
    Sean Solari; Remy Young; Vanessa Marcelino; Sam Forster
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Excel document containing precision, recall and F1 scores for metagenomic classifiers used in the benchmarking of expam's performance. Classifiers were tested on 140 simulated metagenomic communities, at different taxonomic ranks.
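
For readers unfamiliar with the three scores reported in this spreadsheet, the Python sketch below shows their standard definitions from counts of true positives, false positives, and false negatives. It is a generic illustration, not the benchmarking code used for expam; the function name and the zero-denominator convention (returning 0.0) are assumptions.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from raw classification counts."""
    # Precision: fraction of positive calls that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true items that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall, and F1 all equal to 0.8.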

  11. Data and datasets for analysis and plotting

    • figshare.com
    zip
    Updated Apr 10, 2022
    Cite
    José Almeida (2022). Data and datasets for analysis and plotting [Dataset]. http://doi.org/10.6084/m9.figshare.19371008.v1
    Available download formats: zip
    Dataset updated
    Apr 10, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    José Almeida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains most of the necessary files for running the analysis and plotting scripts. Please check analysis-plotting in https://github.com/josegcpa/wbs-prediction for more details.

  12. Data from: Average salary

    • froghire.ai
    Updated Apr 6, 2025
    + more versions
    Cite
    FrogHire.ai (2025). Average salary [Dataset]. https://www.froghire.ai/major/Bioinformatics%20And%20Statistics
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    FrogHire.ai
    Description

    Explore the progression of average salaries for graduates in Bioinformatics And Statistics from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Bioinformatics And Statistics relative to other fields. This data is essential for students assessing the return on investment of their education in Bioinformatics And Statistics, providing a clear picture of financial prospects post-graduation.

  13. Prophage statistics

    • open.flinders.edu.au
    • researchdata.edu.au
    application/gzip
    Updated Mar 19, 2023
    Cite
    Robert Edwards (2023). Prophage statistics [Dataset]. http://doi.org/10.25451/flinders.22268722.v3
    Available download formats: application/gzip
    Dataset updated
    Mar 19, 2023
    Dataset provided by
    Flinders University
    Authors
    Robert Edwards
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The presence of prophages in bacterial genomes.

    This file has these columns:
    0. GENOMEID - Genbank genome assembly accession
    1. Genome Name - Definition of the genome in the genbank file
    2. Contigs > 5kb - Number of contigs longer than 5 kb (only these were used to predict prophages)
    3. Genome Contigs - Total number of contigs in the genome
    4. Number of Coding Sequences - Total number of coding sequences in the genome
    5. Too short - Number of phage predictions that were too short (less than 5 genes in the prediction)
    6. Not enough phage hits - Number of phage predictions that did not have a single HMM match to VOGdb version 99
    7. Kept - Number of high quality prophage predictions
    8. Note - Outcome of the computation. You should read this column, especially if the sum of prophage predictions is zero

  14. Introduction to Bayesian statistics with R

    • explore.openaire.eu
    Updated Dec 6, 2022
    Cite
    Jack Kuipers; Wandrille Duchemin (2022). Introduction to Bayesian statistics with R [Dataset]. http://doi.org/10.5281/zenodo.8070046
    Dataset updated
    Dec 6, 2022
    Authors
    Jack Kuipers; Wandrille Duchemin
    Description

    Content of the Introduction to Bayesian statistics SIB course of May 2023

  15. [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C:...

    • erddap.bco-dmo.org
    Updated Jul 9, 2020
    Cite
    BCO-DMO (2020). [IODP360 - iTAG and metatranscriptome data] - Supplementary Table 4C: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. (Collaborative Research: Delineating The Microbial Diversity and Cross-domain Interactions in The Uncharted Subseafloor Lower Crust Using Meta-omics and Culturing Approaches) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_813173/index.html
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/813173/license

    Variables measured
    depth, iTAG_OTU, iTAG_Raw, latitude, Sample_ID, longitude, Metatr_Raw, iTAG_Paired_QC, Metatr_Paired_QC, Metatr_Reads_Remaining, and 2 more
    Description

    Supplementary Table 4C: Metatranscriptome data summary for cellular activities presented and statistics on sequencing and removal of potential contaminant sequences: Statistics of reads retained through bioinformatic processing of iTAG data for the 11 samples and control samples and metatranscriptome data. Samples taken on board of the R/V JOIDES Resolution between November 30, 2015 and January 30, 2016.

    access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson
    acquisition_description=Rock material was crushed while still frozen in a Progressive Exploration Jaw Crusher (Model 150) whose surfaces were sterilized with 70% ethanol and RNase AWAY (Thermo Fisher Scientific, USA) inside a laminar flow hood. Powdered rock material was returned to the -80°C freezer until extraction.

    DNA was extracted from 20, 30, or 40 grams of powdered rock material, depending on the quantity of rock available. A DNeasy PowerMax Soil Kit (Qiagen, USA) was used following the manufacturer’s protocol modified to include three freeze/thaw treatments prior to the addition of Soil Kit solution C1. Each treatment consisted of 1 minute in liquid nitrogen followed by 5 minutes at 65 °C. DNA extracts were concentrated by isopropanol precipitation overnight at 4°C.

    The low biomass in our samples required whole genome amplification (WGA) prior to PCR amplification of marker genes. Genomic DNA was amplified by Multiple Displacement Amplification (MDA) using the REPLI-g Single Cell Kit (Qiagen) as directed. MDA bias was minimized by splitting each WGA sample into triplicate 16 μL reactions after 1 hr of amplification and then resuming amplification for the manufacturer-specified 7 hrs (8 hrs total).

    DNA was also recovered from samples of drilling mud and drilling fluid (surface water collected during the coring process) for negative controls, as well as two “kit control” samples, in which no sample was added, to account for any contaminants originating from either the DNeasy PowerMax Soil Kit or the REPLI-g Single Cell Kit.

    Bacterial SSU rRNA gene fragments were PCR amplified from MDA samples and sequenced at Georgia Genomics and Bioinformatics Core (Univ. of Georgia). The primers used were: Bac515-Y and Bac926R. Dual-indexed libraries were prepared with (HT) iTruS (Kappa Biosystems) chemistry and sequencing was performed on an Illumina MiSeq 2 x 300 bp system with all samples combined equally on a single flow cell.

    Raw sequence reads were processed through Trim Galore [http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/], FLASH (ccb.jhu.edu/software/FLASH/) and FASTX Toolkit [http://hannonlab.cshl.edu/fastx_toolkit/] for trimming and removal of low quality/short reads.

    Quality filtering included requiring a minimum average quality of 25 and rejection of paired reads less than 250 nucleotides.
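
The two thresholds stated above can be expressed as a simple predicate, sketched here in Python. This is an illustration of the rule only, not the behaviour of Trim Galore, FLASH, or FASTX Toolkit; the function name, the use of Phred scores as a list of integers, and the treatment of the length limit as "at least 250" are assumptions.

```python
def passes_quality_filter(sequence, phred_scores,
                          min_mean_quality=25.0, min_length=250):
    """Apply the two thresholds from the text to one merged read.

    A read is kept when it is at least `min_length` nucleotides long and
    its mean Phred quality is at least `min_mean_quality`.
    """
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and quality lengths differ")
    if not sequence or len(sequence) < min_length:
        return False
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_mean_quality
```

Checking length before quality avoids computing a mean for reads that would be rejected anyway.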

    Operational Taxonomic Unit (OTU) clusters were constructed at 99% similarity with the script pick_otus.py within the Quantitative Insights Into Microbial Ecology (QIIME) v.1.9.1 software and ‘uclust’. Any OTU that matched an OTU in one of our control samples (drilling fluids, drilling mud, extraction and WGA controls) was removed (using filter_otus_from_otu_table.py) along with any sequences of land plants and human pathogens that may have survived the control filtering due to clustering at 99% (filter_taxa_from_otu_table.py). As an additional quality control measure, genera that are commonly identified as PCR contaminants were removed. Unclassified OTUs were queried using BLAST against the GenBank nr database and further information about these OTUs is provided in the Supplementary Discussion text under the section “Taxonomic diversity information from iTAGs.” OTUs that could not be assigned to Bacteria or Archaea were removed from further analysis. For downstream analyses, any OTUs not representing more than 0.01% of relative abundance of sequences overall were removed as those are unlikely to contribute significantly to in situ communities. The OTU data table was transformed to a presence/absence table and the Jaccard method was used to generate a distance matrix using the dist.binary() function in the R package ade4.

    awards_0_award_nid=709555
    awards_0_award_number=OCE-1658031
    awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1658031
    awards_0_funder_name=NSF Division of Ocean Sciences
    awards_0_funding_acronym=NSF OCE
    awards_0_funding_source_nid=355
    awards_0_program_manager=David L. Garrison
    awards_0_program_manager_nid=50534
    cdm_data_type=Other
    comment=Supplementary Table 4C: iTAG PI: Virginia Edgcomb Data Version 1: 2020-05-28
    Conventions=COARDS, CF-1.6, ACDD-1.3
    data_source=extract_data_as_tsv version 2.3 19 Dec 2019
    dataset_current_state=Final and no updates
    defaultDataQuery=&time<now
    doi=10.26008/1912/bco-dmo.813173.1
    Easternmost_Easting=57.278183
    geospatial_lat_max=-32.70567
    geospatial_lat_min=-32.70567
    geospatial_lat_units=degrees_north
    geospatial_lon_max=57.278183
    geospatial_lon_min=57.278183
    geospatial_lon_units=degrees_east
    geospatial_vertical_max=747.7
    geospatial_vertical_min=10.7
    geospatial_vertical_positive=down
    geospatial_vertical_units=m
    infoUrl=https://www.bco-dmo.org/dataset/813173
    institution=BCO-DMO
    instruments_0_acronym=Automated Sequencer
    instruments_0_dataset_instrument_description=DNA sequencing performed using the Illumina MiSeq 2 x 300 bp platform (Univ. of Georgia)
    instruments_0_dataset_instrument_nid=813183
    instruments_0_description=General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.
    instruments_0_instrument_name=Automated DNA Sequencer
    instruments_0_instrument_nid=649
    instruments_0_supplied_name=Illumina MiSeq 2 x 300 bp platform
    metadata_source=https://www.bco-dmo.org/api/dataset/813173
    Northernmost_Northing=-32.70567
    param_mapping={'813173': {'Latitude': 'flag - latitude', 'Depth': 'flag - depth', 'Longitude': 'flag - longitude'}}
    parameter_source=https://www.bco-dmo.org/mapserver/dataset/813173/parameters
    people_0_affiliation=Woods Hole Oceanographic Institution
    people_0_affiliation_acronym=WHOI
    people_0_person_name=Virginia P. Edgcomb
    people_0_person_nid=51284
    people_0_role=Principal Investigator
    people_0_role_type=originator
    people_1_affiliation=Woods Hole Oceanographic Institution
    people_1_affiliation_acronym=WHOI
    people_1_person_name=Virginia P. Edgcomb
    people_1_person_nid=51284
    people_1_role=Contact
    people_1_role_type=related
    people_2_affiliation=Woods Hole Oceanographic Institution
    people_2_affiliation_acronym=WHOI BCO-DMO
    people_2_person_name=Karen Soenen
    people_2_person_nid=748773
    people_2_role=BCO-DMO Data Manager
    people_2_role_type=related
    project=Subseafloor Lower Crust Microbiology
    projects_0_acronym=Subseafloor Lower Crust Microbiology
    projects_0_description=NSF abstract: The lower ocean crust has remained largely unexplored and represents one of the last frontiers for biological exploration on Earth. Preliminary data indicate an active subsurface biosphere in samples of the lower oceanic crust collected from Atlantis Bank in the SW Indian Ocean as deep as 790 m below the seafloor. Even if life exists in only a fraction of the habitable volume where temperatures permit and fluid flow can deliver carbon and energy sources, an active lower oceanic crust biosphere would have implications for deep carbon budgets and yield insights into microbiota that may have existed on early Earth. This is all of great interest to other research disciplines, educators, and students alike. A K-12 education program will capitalize on groundwork laid by outreach collaborator, A. Martinez, a 7th grade teacher in Eagle Pass, TX, who sailed as outreach expert on Drilling Expedition 360. Martinez works at a Title 1 school with ~98% Hispanic and ~2% Native American students and a high number of English Language Learners and migrants. Annual school visits occur during which the project investigators present hands on-activities introducing students to microbiology, and talks on marine microbiology, the project, and how to pursue science related careers. In addition, monthly Skype meetings with students and PIs update them on project progress. Students travel to the University of Texas Marine Science Institute annually, where they get a campus tour and a 3-hour cruise on the R/V Katy, during which they learn about and help with different oceanographic sampling approaches. The project partially supports two graduate students, a Woods Hole undergraduate summer student, the participation of multiple Texas A+M undergraduate students, and 3 principal investigators at two institutions, including one early career researcher who has not previously received NSF support of his own. Given the dearth of knowledge of the lower oceanic crust, this project is poised to transform our understanding of life in this vast environment. The project assesses metabolic functions within all three domains of life in this crustal biosphere, with a focus on nutrient cycling and evaluation of connections to other deep marine microbial habitats. The lower ocean crust represents a potentially vast biosphere whose microbial constituents and the biogeochemical cycles they mediate are likely linked to deep ocean processes through faulting and subsurface fluid flow. Atlantis Bank represents a tectonic
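
The last processing step in this entry's acquisition notes (an OTU table converted to presence/absence, then Jaccard distances between samples) can be sketched in plain Python as below. The source used dist.binary() in the R package ade4; this sketch computes the plain form 1 - |A∩B| / |A∪B|, and ade4 may apply a further transformation (its documentation describes distances of the form sqrt(1 - s)), so values need not match. Function names and the toy counts are hypothetical.

```python
def to_presence_absence(counts):
    """Convert OTU read counts for one sample to 0/1 presence flags."""
    return [1 if count > 0 else 0 for count in counts]


def jaccard_distance(a, b):
    """Plain Jaccard distance between two presence/absence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # convention chosen here for two empty samples
    return 1.0 - shared / either


def distance_matrix(table):
    """Pairwise distances for a {sample_id: counts} table."""
    flags = {name: to_presence_absence(row) for name, row in table.items()}
    return {
        (p, q): jaccard_distance(flags[p], flags[q])
        for p in flags
        for q in flags
    }


# Toy OTU counts for two samples across four OTUs (made-up numbers).
toy = {"s1": [5, 0, 3, 0], "s2": [1, 2, 0, 0]}
distances = distance_matrix(toy)
```

In the toy table the samples share one OTU out of three observed in either, so the distance is 1 - 1/3.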

  16. Statistics of the chloroplast genome sequencing data

    • figshare.unimelb.edu.au
    pdf
    Updated Jul 14, 2020
    Cite
    Chenxi Zhou (2020). Statistics of the chloroplast genome sequencing data [Dataset]. http://doi.org/10.26188/12652067.v1
    Available download formats: pdf
    Dataset updated
    Jul 14, 2020
    Dataset provided by
    The University of Melbourne
    Authors
    Chenxi Zhou
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Statistics of the total DNA sequencing data, the raw cpDNA sequence data extracted from the total DNA sequence data and the processed cpDNA sequence data after trimming (for Illumina reads) or error correction (for Nanopore reads).

  17. f

    fMRI - Lecture 8

    • figshare.com
    • search.datacite.org
    pdf
    Updated Jan 19, 2016
    Cite
    Anders Eklund (2016). fMRI - Lecture 8 [Dataset]. http://doi.org/10.6084/m9.figshare.1461650.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    figshare
    Authors
    Anders Eklund
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lecture about machine learning in fMRI.

  18. r

    Supplementary Files for thesis titled "Visual-analytics-driven...

    • researchdata.edu.au
    Updated 2022
    Cite
    Kaur Sandeep (2022). Supplementary Files for thesis titled "Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data" [Dataset]. https://researchdata.edu.au/supplementary-files-thesis-biomolecular-data/2089386
    Explore at:
    Dataset updated
    2022
    Dataset provided by
    University of New South Wales
    UNSW, Sydney
    Authors
    Kaur Sandeep
    License

    https://www.gnu.org/licenses/gpl-3.0.en.html

    Description

    This data set provides Supplementary files referenced in the thesis titled "Visual-analytics-driven bioinformatics methods for the analysis of biomolecular data".

    In particular, this data set consists of the following files (Details are also provided in an included README.txt file):

    Description of files in this data set:

    1. Supplementary File 4.1. Supplementary File 4.1 - URL and variants schema.pdf. Graphical Backus-Naur schema of the variant syntax recognized by Aquaria.
    2. Supplementary File 4.2. Supplementary File 4.2 - Schema.json. Aquaria feature set schema. This schema can be utilized in conjunction with user-specified JSON files for validation in online tools such as https://www.jsonschemavalidator.net/ (see Section 4.5.5).
    3. Supplementary File 6.1. Supplementary File 6.1 - Illumina and complete genome IDs.xlsx. NCBI SRA accession identifiers of 673 Illumina (short-read length) and 673 PacBio sequenced genomes (long-read length), corresponding to 673 isolates sequenced using two technologies.
    4. Supplementary File 6.2. Supplementary File 6.2 - Distribution of IS in complete genomes.xlsx. ISs in complete genomes.
    5. Supplementary File 6.3. Supplementary File 6.3 - QUAST analysis of assemblies.xlsx. Summary of SPAdes and SKESA assembly quality statistics, generated using QUAST.
    6. Supplementary File 6.4. Supplementary File 6.4 - WiIS performance metrics.xlsx. WiIS performance metrics for each genome.
    7. Supplementary File 6.5. Supplementary File 6.5 - Correlation of performance metrics and assembly statistics.xlsx. Correlation of WiIS performance metrics with SPAdes and SKESA assembly quality statistics.
    8. Supplementary File 6.6. Supplementary File 6.6 - IS insertions found by all tools.xlsx. IS insertions found by all tools for each of the 673 short-read sequenced genomes.
    9. Supplementary File 6.7. Supplementary File 6.7 - IS insertions found by all tools (20 base pair distance threshold).xlsx. IS insertions found by all tools, with a buffer length of 20 base pairs, for each of the 673 short-read sequenced genomes.
    10. Supplementary File 6.8. Supplementary File 6.8 - WiIS SPAdes IS insertions found with respect.xlsx. Summary of IS insertions found by WiIS (SPAdes) with respect to Tohama I (including the counts of insertions identified by WiIS, but not in Tohama I).
    11. Supplementary File 6.9. Wiis.zip. WiIS code.
  19. f

    Data from: Average salary

    • froghire.ai
    Updated Apr 6, 2025
    Cite
    FrogHire.ai (2025). Average salary [Dataset]. https://www.froghire.ai/major/Health%20Science%20In%20Biostatistics%20-%20Bioinformatics%20Track
    Explore at:
    Dataset updated
    Apr 6, 2025
    Dataset provided by
    FrogHire.ai
    Description

    Explore the progression of average salaries for graduates in Health Science In Biostatistics - Bioinformatics Track from 2020 to 2023 through this detailed chart. It compares these figures against the national average for all graduates, offering a comprehensive look at the earning potential of Health Science In Biostatistics - Bioinformatics Track relative to other fields. This data is essential for students assessing the return on investment of their education in Health Science In Biostatistics - Bioinformatics Track, providing a clear picture of financial prospects post-graduation.

  20. f

    Data from: Leveraging two-way probe-level block design for identifying...

    • figshare.com
    zip
    Updated Nov 2, 2021
    Cite
    Yingyao Zhou (2021). Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays [Dataset]. http://doi.org/10.6084/m9.figshare.16917385.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 2, 2021
    Dataset provided by
    figshare
    Authors
    Yingyao Zhou
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the companion web site for the publication: Barrera L, Benner C, Tao YC, Winzeler E, Zhou Y. Leveraging two-way probe-level block design for identifying differential gene expression with high-density oligonucleotide arrays. BMC Bioinformatics. 2004 Apr 20;5:42. doi: 10.1186/1471-2105-5-42. PMID: 15099405; PMCID: PMC411067. Download and unzip the file, then open ProbeStatistics/index.html to browse the self-contained web site.
