100+ datasets found
  1. Introductions to Bioinformatics

    • figshare.com
    pdf
    Updated Jan 18, 2016
    Cite
    Aidan Budd (2016). Introductions to Bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.830401.v1
    Available download formats: pdf
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aidan Budd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of similar but different presentations I've made aimed at introducing bioinformatics to bench biologists.

  2. Dataset for practice session 1 in bioinformatics

    • figshare.com
    txt
    Updated Jul 17, 2016
    Cite
    Elena Sugis (2016). Dataset for practice session 1 in bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.3490211.v3
    Available download formats: txt
    Dataset updated
    Jul 17, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elena Sugis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for practicing data preprocessing and unsupervised learning in the introduction to bioinformatics course.

  3. Bioinformatics: An Interactive Introduction to NCBI

    • qubeshub.org
    Updated Jan 3, 2019
    Cite
    Seth Bordenstein (2019). Bioinformatics: An Interactive Introduction to NCBI [Dataset]. http://doi.org/10.25334/Q4915C
    Dataset updated
    Jan 3, 2019
    Dataset provided by
    QUBES
    Authors
    Seth Bordenstein
    Description

    Modules showing how the NCBI database classifies and organizes information on DNA sequences, evolutionary relationships, and scientific publications, plus a module that identifies a nucleotide sequence from an insect endosymbiont using BLAST.

  4. Data from: “Bioinformatics: Introduction and Methods,” a Bilingual Massive...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Dec 11, 2014
    Cite
    Meng, Yuqi; Wei, Liping; Gao, Ge; Yang, Xiaoxu; He, Yao; Ding, Yang; Liu, Fenglin; Ye, Adam Yongxin; Wang, Meng (2014). “Bioinformatics: Introduction and Methods,” a Bilingual Massive Open Online Course (MOOC) as a New Example for Global Bioinformatics Education [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001209841
    Dataset updated
    Dec 11, 2014
    Authors
    Meng, Yuqi; Wei, Liping; Gao, Ge; Yang, Xiaoxu; He, Yao; Ding, Yang; Liu, Fenglin; Ye, Adam Yongxin; Wang, Meng
    Description

    “Bioinformatics: Introduction and Methods,” a Bilingual Massive Open Online Course (MOOC) as a New Example for Global Bioinformatics Education

  5. Hemoglobin bioinformatics

    • qubeshub.org
    Updated Jun 7, 2021
    Cite
    Keith Johnson (2021). Hemoglobin bioinformatics [Dataset]. http://doi.org/10.25334/MMEY-8321
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    QUBES
    Authors
    Keith Johnson
    Description

    This is an introduction to bioinformatics using hemoglobin as an example. The worksheets introduce students to resources to explore the DNA, RNA and polypeptide linear structure with a brief introduction to the quaternary structure of hemoglobin.
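
    The DNA to RNA to polypeptide progression these worksheets cover can be sketched in a few lines of Python. This is a minimal illustration, not part of the original materials: it assumes a coding-strand DNA input that begins at a start codon, uses the standard genetic code, and takes an illustrative fragment modeled on the start of the beta-globin coding sequence, together with a copy carrying the GAG to GTG change associated with sickle-cell hemoglobin.

```python
# Standard genetic code (NCBI translation table 1), indexed by codon with
# bases ordered U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at a stop codon or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Illustrative fragment modeled on the start of the beta-globin coding sequence.
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAG"  # GAG -> GTG in the sixth codon after Met
print(translate(transcribe(normal)))  # MVHLTPEEK
print(translate(transcribe(sickle)))  # MVHLTPVEK
```

    Introns, alternative reading frames and the template strand are ignored here; the worksheets' NCBI resources remain the place to retrieve real sequences.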

  6. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Available download formats: zip (12,928,905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
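
    The three creation steps can be approximated in plain Python. This is a minimal sketch under stated assumptions, not the author's script: the original computed properties with Biopython, whereas this version uses only the standard library and covers just sequence length, average Kyte-Doolittle hydropathy and polar/nonpolar percentages. The polar/nonpolar residue grouping, the P00000-style IDs and the function names are hypothetical choices.

```python
import random

# Alphabet of the 20 standard amino acids used to build random chains.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy value for each standard amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed polar/nonpolar grouping (textbooks differ, e.g. on G and C).
POLAR = set("STCYNQDEKRH")
NONPOLAR = set("GAVLIMFWP")

CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def random_sequence(rng, min_len=50, max_len=300):
    """Step 1: a random chain whose length lies in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def properties(seq):
    """Step 2 (partial): length, mean hydropathy, polar/nonpolar percentages."""
    n = len(seq)
    return {
        "Sequence_Length": n,
        "Hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        "Polar_Proportion": 100 * sum(aa in POLAR for aa in seq) / n,
        "Nonpolar_Proportion": 100 * sum(aa in NONPOLAR for aa in seq) / n,
    }


def make_record(rng, index):
    """Build one row; the ID format is a hypothetical choice."""
    seq = random_sequence(rng)
    record = {"ID_Protein": f"P{index:05d}", "Sequence": seq}
    record.update(properties(seq))
    record["Class"] = rng.choice(CLASSES)  # Step 3: random class label
    return record


rng = random.Random(42)  # arbitrary seed for reproducibility
records = [make_record(rng, i) for i in range(5)]
```

    Molecular weight, isoelectric point and total charge are left out; Biopython's `ProteinAnalysis` class (`Bio.SeqUtils.ProtParam`) offers `molecular_weight()`, `isoelectric_point()` and `gravy()` if those columns are needed.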

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).
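
    The 16,000/4,000 division corresponds to an 80/20 split of the 20,000 records. Whether the author shuffled, stratified by class or used a fixed seed is not stated, so the following is only a sketch of a shuffled 80/20 hold-out split; `split_train_test` is a hypothetical helper, not scikit-learn's `train_test_split`.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items, then hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 20,000 protein records: just their row indices.
train, test = split_train_test(range(20_000))
print(len(train), len(test))  # 16000 4000
```

    With randomly assigned labels, a stratified split would matter little here, but for real protein data keeping class proportions equal across both subsets is usually worth the extra step.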

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  7. Comparison of the multiple-delivery-mode training model employed by...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 25, 2021
    Cite
    Lennard, Katie; Aron, Shaun; Panji, Sumir; Kennedy, Dane; Mulder, Nicola; Allali, Imane; Fields, Christopher J; Ras, Verena; Mwaikono, Kilaza Samson; Rendon, Gloria; Claassen-Weitz, Shantelle; Holmes, Jessica R.; Botha, Gerrit (2021). Comparison of the multiple-delivery-mode training model employed by H3ABioNet’s Introduction to Bioinformatics (IBT) course and the 16s rRNA Microbiome Intermediate Bioinformatics Training course (16S). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000897705
    Dataset updated
    Feb 25, 2021
    Authors
    Lennard, Katie; Aron, Shaun; Panji, Sumir; Kennedy, Dane; Mulder, Nicola; Allali, Imane; Fields, Christopher J; Ras, Verena; Mwaikono, Kilaza Samson; Rendon, Gloria; Claassen-Weitz, Shantelle; Holmes, Jessica R.; Botha, Gerrit
    Description

    The table provides a short description of the major components of the model employed by each course and highlights the differences between the two; deviations are marked with an asterisk (*).

  8. IBT Linux Session 2 Plasmodium file

    • zivahub.uct.ac.za
    txt
    Updated May 2, 2025
    Cite
    Sumir Panji (2025). IBT Linux Session 2 Plasmodium file [Dataset]. http://doi.org/10.25375/uct.28915670.v1
    Available download formats: txt
    Dataset updated
    May 2, 2025
    Dataset provided by
    University of Cape Town
    Authors
    Sumir Panji
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File used in the Linux practical session of the Introduction to Bioinformatics (IBT) course.

  9. Teaching introductory bioinformatics with Jupyter notebook-based active...

    • qubeshub.org
    Updated Aug 17, 2019
    Cite
    Colin Dewey (2019). Teaching introductory bioinformatics with Jupyter notebook-based active learning [Dataset]. http://doi.org/10.25334/YZJ7-D347
    Dataset updated
    Aug 17, 2019
    Dataset provided by
    QUBES
    Authors
    Colin Dewey
    Description

    Presentation on teaching introductory bioinformatics with Jupyter notebook-based active learning, given at the 2019 Great Lakes Bioinformatics Conference.

  10. exampleVCFfiles

    • kaggle.com
    zip
    Updated Jan 10, 2025
    Cite
    Omer Faruk Isler (2025). exampleVCFfiles [Dataset]. https://www.kaggle.com/omerfarukisler/examplevcffiles
    Available download formats: zip (24,351,800 bytes)
    Dataset updated
    Jan 10, 2025
    Authors
    Omer Faruk Isler
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    All VCF files that I was able to provide for my BLG348 Intro to Bioinformatics course term project. The Mutect variant caller didn't work properly, so I didn't add its output. The NotFiltered VCFs are previous versions of the VCFs that contain records with other filter statuses (not only PASS ones). You can also check my profile to see the plots that I used for my project report and presentation.
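
    The difference between the filtered files and the NotFiltered versions comes down to the FILTER column of the VCF format. A minimal sketch of keeping only PASS records, assuming uncompressed, tab-separated VCF text in which header lines begin with `#` and FILTER is the seventh column, as the VCF specification defines; `pass_only` is a hypothetical helper and the sample records are invented for illustration, not taken from this dataset.

```python
def pass_only(vcf_lines):
    """Yield header lines unchanged and data lines whose FILTER column is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t2000\t.\tC\tT\t12\tLowQual\t.\n",
]
kept = list(pass_only(example))
```

    Because the function accepts any iterable of lines, an open file handle works directly; a bgzipped `.vcf.gz` file would first need opening with `gzip.open(path, "rt")`.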

  11. Data_Sheet_2_Resequencing of Microbial Isolates: A Lab Module to Introduce...

    • frontiersin.figshare.com
    pdf
    Updated Jun 5, 2023
    + more versions
    Cite
    Katherine Lynn Petrie; Rujia Xie (2023). Data_Sheet_2_Resequencing of Microbial Isolates: A Lab Module to Introduce Novices to Command-Line Bioinformatics.PDF [Dataset]. http://doi.org/10.3389/fmicb.2021.578859.s002
    Available download formats: pdf
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    Frontiers
    Authors
    Katherine Lynn Petrie; Rujia Xie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Familiarity with genome-scale data and the bioinformatic skills to analyze it have become essential for understanding and advancing modern biology and human health, yet many undergraduate biology majors are never exposed to hands-on bioinformatics. This paper presents a module that introduces students to applied bioinformatic analysis within the context of a research-based microbiology lab course. One of the most commonly used genomic analyses in biology is resequencing: determining the sequence of DNA bases in a derived strain of some organism, and comparing it to the known ancestral genome of that organism to better understand the phenotypic differences between them. Many existing CUREs — Course Based Undergraduate Research Experiences — evolve or select new strains of bacteria and compare them phenotypically to ancestral strains. This paper covers standardized strategies and procedures, accessible to undergraduates, for preparing and analyzing microbial whole-genome resequencing data to examine the genotypic differences between such strains. Wet-lab protocols and computational tutorials are provided, along with additional guidelines for educators, providing instructors without a next-generation sequencing or bioinformatics background the necessary information to incorporate whole-genome sequencing and command-line analysis into their class. This module introduces novice students to running software at the command-line, giving them exposure and familiarity with the types of tools that make up the vast majority of open-source scientific software used in contemporary biology. Completion of the module improves student attitudes toward computing, which may make them more likely to pursue further bioinformatics study.
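
    At its core, resequencing locates where a derived strain's genome differs from its ancestor's. Real pipelines align sequencing reads to the ancestral reference and call variants with dedicated tools; the toy sketch below skips all of that and, purely for illustration, compares two already-aligned, equal-length sequences to list single-base substitutions with 1-based positions. The function name and the sequences are hypothetical.

```python
def substitutions(ancestor: str, derived: str):
    """Return (position, ancestral_base, derived_base) for each mismatch.

    Assumes both sequences are already aligned and have equal length;
    insertions and deletions are not handled.
    """
    if len(ancestor) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, d)
        for pos, (a, d) in enumerate(zip(ancestor.upper(), derived.upper()), start=1)
        if a != d
    ]


print(substitutions("ACGTACGTAC", "ACGTTCGTAA"))  # [(5, 'A', 'T'), (10, 'C', 'A')]
```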

  12. Introduction to the UCSC Genome Browser

    • figshare.com
    application/cdfv2
    Updated Jun 7, 2023
    Cite
    Mary Mangan (2023). Introduction to the UCSC Genome Browser [Dataset]. http://doi.org/10.6084/m9.figshare.96258.v1
    Available download formats: application/cdfv2
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mary Mangan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introductory slides for the UCSC Genome Browser, part of a set of training materials on the UCSC tools. A video recording of the same material is also available, and exercises for practicing additional skills can be used alongside the training. The full training suite is available at http://openhelix.com/ucsc, and an additional set of materials on more advanced topics is at http://www.openhelix.com/ucscadv. Note: the "notes" area of the slides contains a full script, but it is not visible in the viewer.

  13. Bioinformatics

    • huggingface.co
    Updated May 21, 2025
    + more versions
    Cite
    A Benchmark for Reasoning-Driven Medical Retrieval (2025). Bioinformatics [Dataset]. https://huggingface.co/datasets/R2MED/Bioinformatics
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    A Benchmark for Reasoning-Driven Medical Retrieval
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    🔭 Overview

    R2MED: First Reasoning-Driven Medical Retrieval Benchmark

    R2MED is a high-quality, high-resolution synthetic information retrieval (IR) dataset designed for medical scenarios. It contains 876 queries with three retrieval tasks, five medical scenarios, and twelve body systems.

    Dataset            Q     D       Avg. Pos   Q-Len   D-Len
    Biology            103   57359   3.6        115.2   83.6
    Bioinformatics     77    47473   2.9        273.8   150.5
    Medical Sciences   88    34810   2.8        107.1   122.7
    MedXpertQA-Exam    97…

    See the full description on the dataset page: https://huggingface.co/datasets/R2MED/Bioinformatics.

  14. Data from: A Critical Guide to the PDB

    • qubeshub.org
    Updated Dec 5, 2020
    Cite
    Teresa Attwood; GOBLET Foundation (2020). A Critical Guide to the PDB [Dataset]. http://doi.org/10.25334/EKRH-2C94
    Dataset updated
    Dec 5, 2020
    Dataset provided by
    QUBES
    Authors
    Teresa Attwood; GOBLET Foundation
    Description

    This Critical Guide in the Introduction to Bioinformatics series provides a brief outline of the Protein Data Bank – the PDB – the world’s primary repository of biological macromolecular structures.

  15. Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Sep 13, 2024
    Cite
    Clemens Schmid (2024). Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction to R and the Tidyverse [Dataset]. http://doi.org/10.5281/zenodo.13758879
    Available download formats: application/gzip
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    SPAAM Community
    Authors
    Clemens Schmid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and conda software environment file for the chapter 'Introduction to R and the Tidyverse' of the SPAAM Community's textbook: Introduction to Ancient Metagenomics (https://www.spaam-community.org/intro-to-ancient-metagenomics-book).

  16. Dataset of book subjects that contain Statistical methods in bioinformatics...

    • workwithdata.com
    Updated Nov 7, 2024
    + more versions
    Cite
    Work With Data (2024). Dataset of book subjects that contain Statistical methods in bioinformatics : an introduction [Dataset]. https://www.workwithdata.com/datasets/book-subjects?f=1&fcol0=j0-book&fop0=%3D&fval0=Statistical+methods+in+bioinformatics+:+an+introduction&j=1&j0=books
    Dataset updated
    Nov 7, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about book subjects. It has 1 row, filtered to the book "Statistical methods in bioinformatics: an introduction", and 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.

  17. Bioinformatics / Neuroinformatics

    • qubeshub.org
    Updated Oct 2, 2019
    Cite
    William Grisham (2019). Bioinformatics / Neuroinformatics [Dataset]. http://doi.org/10.25334/Q45B1Q
    Dataset updated
    Oct 2, 2019
    Dataset provided by
    QUBES
    Authors
    William Grisham
    Description

    This module is a computer-based introduction to bioinformatics resources. This easy-to-adopt module weaves together several important bioinformatic tools so students can grasp how each is used in answering research questions. Published in CBE-LSE.

  18. Introduction to bulk RNAseq analysis: supplementary material

    • zenodo.org
    zip
    Updated Nov 28, 2023
    + more versions
    Cite
    Jose Alejandro Romero Herrera (2023). Introduction to bulk RNAseq analysis: supplementary material [Dataset]. http://doi.org/10.5281/zenodo.10211512
    Available download formats: zip
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jose Alejandro Romero Herrera
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Vampirium setup testing

    This archive contains materials (datasets, exercises, slides, etc.) used for the Introduction to bulk RNAseq analysis workshop taught at the University of Copenhagen by the Center for Health Data Science (HeaDS). The course repository is hosted on GitHub.

    Assignments.zip contains exercises for the preprocessing part of the course, such as FastQC and MultiQC examples from bulk RNAseq experiments.

    Data.zip contains count matrices (both traditional counts and salmon pseudocounts), as well as sample metadata (samplesheet.csv) and backup results from the preprocessing pipeline.

    Notes.zip contains supplementary materials such as extra pdfs for more information on bulk RNAseq technology.

    Slides and raw_reads will be released in a later version.

    Slides.zip contains all the slides used in the workshop.

    Raw_reads.zip contains the raw reads from the bulk RNAseq experiment (10.1016/j.celrep.2014.10.054) used in this course.
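
    Before raw count matrices like those in Data.zip are compared across samples, they are usually normalized for sequencing depth. One simple approach is counts per million (CPM): each count multiplied by one million and divided by its sample's total count. The sketch below assumes the matrix has been loaded as a mapping from sample name to per-gene counts in a shared gene order; the actual file layout in Data.zip is not described here, and the function name and counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale each sample's raw counts to counts per million (CPM).

    counts maps sample name -> list of raw counts, one entry per gene,
    with genes in the same order for every sample.
    """
    cpm = {}
    for sample, values in counts.items():
        library_size = sum(values)  # total reads assigned in this sample
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counts")
        cpm[sample] = [v * 1_000_000 / library_size for v in values]
    return cpm


raw = {"sample_A": [10, 30, 60], "sample_B": [200, 300, 500]}
print(counts_per_million(raw))
```

    CPM is mainly useful for exploration and filtering; differential-expression tools apply their own normalization (DESeq2's median-of-ratios, for example).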

  19. Data used in exercises in course Introduction to Data Management Practices

    • figshare.scilifelab.se
    • researchdata.se
    • +1 more
    zip
    Updated Jan 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yvonne Kallberg; Elin Kronander; Niclas Jareborg; Markus Englund; Wolmar Nyberg Åkerström (2025). Data used in exercises in course Introduction to Data Management Practices [Dataset]. http://doi.org/10.17044/scilifelab.14301317.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Uppsala University
    Authors
    Yvonne Kallberg; Elin Kronander; Niclas Jareborg; Markus Englund; Wolmar Nyberg Åkerström
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains the data files used in exercises in the NBIS course "Introduction to Data Management Practices".

  20. Data from: Bioinformatics is a BLAST: Engaging First-Year Biology Students...

    • qubeshub.org
    Updated Oct 4, 2022
    Cite
    Shem Unger*; Mark Rollins (2022). Bioinformatics is a BLAST: Engaging First-Year Biology Students on Campus Biodiversity Using DNA Barcoding [Dataset]. https://qubeshub.org/community/groups/coursesource/publications?id=3520
    Dataset updated
    Oct 4, 2022
    Dataset provided by
    QUBES
    Authors
    Shem Unger*; Mark Rollins
    Description

    To introduce students to the concept of molecular diversity, we developed a short, engaging online lesson using basic bioinformatics techniques. Students were introduced to basic bioinformatics while learning about local on-campus species diversity by 1) identifying species based on a given sequence (performing Basic Local Alignment Search Tool [BLAST] analysis) and 2) researching and documenting the natural history of each species identified in a concise write-up. To assess students’ perceptions of this lesson, we surveyed them using a Likert scale and asked them to elaborate on the activity in a written reflection. Taken together, student responses indicated that 94% of students agreed this lesson helped them understand DNA barcoding and how it is used to identify species. The majority of students, 89.5%, reported that they enjoyed the lesson and mainly provided positive feedback, including “It really opened my eyes to different species on campus by looking at DNA sequences”, “I loved searching information and discovering all this new information from a DNA sequence”, and finally, “the database was fun to navigate and identifying species felt like a cool puzzle.” Our results indicate this lesson both engaged and informed students on the use of DNA barcoding as a tool to identify local species biodiversity.

    Primary Image: DNA Barcoded Specimens. Crane fly, dragonfly, ant, and spider identified using DNA barcoding.

Introductions to Bioinformatics

11 scholarly articles cite this dataset (per Google Scholar).
