100+ datasets found
  1. Introductions to Bioinformatics

    • figshare.com
    pdf
    Updated Jan 18, 2016
    Cite
    Aidan Budd (2016). Introductions to Bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.830401.v1
    Available download formats: pdf
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Aidan Budd
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of similar but different presentations I've made aimed at introducing bioinformatics to bench biologists.

  2. Data from: “Bioinformatics: Introduction and Methods,” a Bilingual Massive...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 11, 2014
    Cite
    Meng, Yuqi; Wei, Liping; Gao, Ge; Yang, Xiaoxu; He, Yao; Ding, Yang; Liu, Fenglin; Ye, Adam Yongxin; Wang, Meng (2014). “Bioinformatics: Introduction and Methods,” a Bilingual Massive Open Online Course (MOOC) as a New Example for Global Bioinformatics Education [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001209841
    Dataset updated
    Dec 11, 2014
    Authors
    Meng, Yuqi; Wei, Liping; Gao, Ge; Yang, Xiaoxu; He, Yao; Ding, Yang; Liu, Fenglin; Ye, Adam Yongxin; Wang, Meng
    Description

    “Bioinformatics: Introduction and Methods,” a Bilingual Massive Open Online Course (MOOC) as a New Example for Global Bioinformatics Education

  3. Bioinformatics Protein Dataset - Simulated

    • kaggle.com
    zip
    Updated Dec 27, 2024
    + more versions
    Cite
    Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
    Available download formats: zip (12928905 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Rafael Gallo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Subtitle

    "Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

    Description

    Introduction

    This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

    Columns Included

    • ID_Protein: Unique identifier for each protein.
    • Sequence: String of amino acids.
    • Molecular_Weight: Molecular weight calculated from the sequence.
    • Isoelectric_Point: Estimated isoelectric point based on the sequence composition.
    • Hydrophobicity: Average hydrophobicity calculated from the sequence.
    • Total_Charge: Sum of the charges of the amino acids in the sequence.
    • Polar_Proportion: Percentage of polar amino acids in the sequence.
    • Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.
    • Sequence_Length: Total number of amino acids in the sequence.
    • Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

    Inspiration and Sources

    While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:

    • UniProt: A comprehensive database of protein sequences and annotations.
    • Kyte-Doolittle Scale: Calculations of hydrophobicity.
    • Biopython: A tool for analyzing biological sequences.

    Proposed Uses

    This dataset is ideal for:

    • Training classification models for proteins.
    • Exploratory analysis of physicochemical properties of proteins.
    • Building machine learning pipelines in bioinformatics.

    How This Dataset Was Created

    1. Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.
    2. Property Calculation: Physicochemical properties were calculated using the Biopython library.
    3. Class Assignment: Classes were randomly assigned for classification purposes.
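
    The three creation steps above can be sketched in plain Python. This is only a minimal illustration and not the author's script: the description says Biopython was used for the property calculation, whereas this sketch swaps in a hand-written Kyte-Doolittle average so that it runs with the standard library alone. The function names, the ID format, and the uniform sampling of residues are assumptions.

```python
import random

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "".join(sorted(KYTE_DOOLITTLE))
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def random_sequence(rng, min_len=50, max_len=300):
    """Step 1: draw a random chain of 50 to 300 residues (uniform sampling is an assumption)."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mean_hydrophobicity(sequence):
    """Step 2 (one property only): average Kyte-Doolittle value over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def make_record(rng, protein_id):
    """Build one row using a subset of the columns listed in the description."""
    sequence = random_sequence(rng)
    return {
        "ID_Protein": protein_id,  # hypothetical ID format
        "Sequence": sequence,
        "Hydrophobicity": mean_hydrophobicity(sequence),
        "Sequence_Length": len(sequence),
        "Class": rng.choice(CLASSES),  # Step 3: class assigned at random
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
records = [make_record(rng, f"P{i:05d}") for i in range(5)]
```

    Because the class is drawn independently of the sequence, a classifier trained on data generated this way should not be expected to beat chance, which is consistent with the limitations listed below.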

    Limitations

    • The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.
    • The functional classes are simulated and do not correspond to actual biological characteristics.

    Data Split

    The dataset is divided into two subsets:

    • Training: 16,000 samples (proteinas_train.csv).
    • Testing: 4,000 samples (proteinas_test.csv).

    Acknowledgment

    This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.

  4. Bioinformatics is a BLAST: Engaging First-Year Biology Students on Campus...

    • qubeshub.org
    Updated Oct 4, 2022
    Cite
    Shem Unger*; Mark Rollins (2022). Bioinformatics is a BLAST: Engaging First-Year Biology Students on Campus Biodiversity Using DNA Barcoding [Dataset]. https://qubeshub.org/community/groups/coursesource/publications?id=3520
    Dataset updated
    Oct 4, 2022
    Dataset provided by
    QUBES
    Authors
    Shem Unger*; Mark Rollins
    Description

    In order to introduce students to the concept of molecular diversity, we developed a short, engaging online lesson using basic bioinformatics techniques. Students were introduced to basic bioinformatics while learning about local on-campus species diversity by 1) identifying species based on a given sequence (performing Basic Local Alignment Search Tool [BLAST] analysis) and 2) researching and documenting the natural history of each species identified in a concise write-up. To assess students’ perceptions of this lesson, we surveyed them using a Likert scale and asked them to elaborate in a written reflection on the activity. When combined, student responses indicated that 94% of students agreed this lesson helped them understand DNA barcoding and how it is used to identify species. The majority of students, 89.5%, reported they enjoyed the lesson and mainly provided positive feedback, including “It really opened my eyes to different species on campus by looking at DNA sequences”, “I loved searching information and discovering all this new information from a DNA sequence”, and finally, “the database was fun to navigate and identifying species felt like a cool puzzle.” Our results indicate this lesson both engaged and informed students on the use of DNA barcoding as a tool to identify local species biodiversity.

    Primary Image: DNA Barcoded Specimens. Crane fly, dragonfly, ant, and spider identified using DNA barcoding.

  5. Dataset for practice session 1 in bioinformatics

    • figshare.com
    txt
    Updated Jul 17, 2016
    Cite
    Elena Sugis (2016). Dataset for practice session 1 in bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.3490211.v3
    Available download formats: txt
    Dataset updated
    Jul 17, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elena Sugis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset for practice in data preprocessing and unsupervised learning in the Introduction to Bioinformatics course.

  6. Introduction to Biodiversity Informatics

    • figshare.com
    pptx
    Updated Feb 5, 2016
    Cite
    Dimitrios Koureas (2016). Introduction to Biodiversity Informatics [Dataset]. http://doi.org/10.6084/m9.figshare.1295382.v3
    Available download formats: pptx
    Dataset updated
    Feb 5, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dimitrios Koureas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A brief introduction to the concept, vision and challenges associated with Biodiversity Informatics.

  7. Data_Sheet_2_Bioinformatics-Based Activities in High School: Fostering...

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Ana Martins; Maria João Fonseca; Marina Lemos; Leonor Lencastre; Fernando Tavares (2023). Data_Sheet_2_Bioinformatics-Based Activities in High School: Fostering Students’ Literacy, Interest, and Attitudes on Gene Regulation, Genomics, and Evolution.pdf [Dataset]. http://doi.org/10.3389/fmicb.2020.578099.s002
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Ana Martins; Maria João Fonseca; Marina Lemos; Leonor Lencastre; Fernando Tavares
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The key role of bioinformatics in explaining biological phenomena calls for the need to rethink didactic approaches at high school aligned with a new scientific reality. Despite several initiatives to introduce bioinformatics in the classroom, there is still a lack of knowledge on their impact on students’ learning gains, engagement, and motivation. In this study, we detail the effects of four bioinformatics laboratories tailored for high school biology classes named “Mining the Genome: Using Bioinformatics Tools in the Classroom to Support Student Discovery of Genes” on the literacy, interest, and attitudes of 387 high school students. By exploring these laboratories, students get acquainted with bioinformatics and acknowledge that many bioinformatics tools can be intuitive for beginners. Furthermore, introducing comparative genomics in their learning practices contributed to a better understanding of curricular contents regarding the identification of genes, their regulation, and how to make evolutionary assumptions. Following the intervention, students were able to pinpoint bioinformatics tools required to identify genes in a genomic sequence, and most importantly, they were able to solve genomics-related misconceptions. Overall, students revealed a positive attitude regarding the integration of bioinformatics-based approaches in their learning practices, reinforcing their added value in educational approaches.

  8. Comparison of the multiple-delivery-mode training model employed by...

    • datasetcatalog.nlm.nih.gov
    Updated Feb 25, 2021
    Cite
    Lennard, Katie; Aron, Shaun; Panji, Sumir; Kennedy, Dane; Mulder, Nicola; Allali, Imane; Fields, Christopher J; Ras, Verena; Mwaikono, Kilaza Samson; Rendon, Gloria; Claassen-Weitz, Shantelle; Holmes, Jessica R.; Botha, Gerrit (2021). Comparison of the multiple-delivery-mode training model employed by H3ABioNet’s Introduction to Bioinformatics (IBT) course and the 16s rRNA Microbiome Intermediate Bioinformatics Training course (16S). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000897705
    Dataset updated
    Feb 25, 2021
    Authors
    Lennard, Katie; Aron, Shaun; Panji, Sumir; Kennedy, Dane; Mulder, Nicola; Allali, Imane; Fields, Christopher J; Ras, Verena; Mwaikono, Kilaza Samson; Rendon, Gloria; Claassen-Weitz, Shantelle; Holmes, Jessica R.; Botha, Gerrit
    Description

    The table provides a short description of the major components of the model employed by each course, highlighting any differences between the two (deviations are indicated by an asterisk (*)).

  9. Data from: Bioinformatics calls the school: Use of smartphones to introduce...

    • datasetcatalog.nlm.nih.gov
    Updated Feb 14, 2019
    Cite
    Rueda, Ana Julia Velez; Benítez, Guillermo I.; Parisi, Gustavo; Fornasari, María Silvina; Hasenahuer, Marcia Anahí; Marchetti, Julia; Palopoli, Nicolas (2019). Bioinformatics calls the school: Use of smartphones to introduce Python for bioinformatics in high schools [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000159463
    Dataset updated
    Feb 14, 2019
    Authors
    Rueda, Ana Julia Velez; Benítez, Guillermo I.; Parisi, Gustavo; Fornasari, María Silvina; Hasenahuer, Marcia Anahí; Marchetti, Julia; Palopoli, Nicolas
    Description

    The dynamic nature of technological developments invites us to rethink the learning spaces. In this context, science education can be enriched by the contribution of new computational resources, making the educational process more up-to-date, challenging, and attractive. Bioinformatics is a key interdisciplinary field, contributing to the understanding of biological processes that is often underrated in secondary schools. As a useful resource in learning activities, bioinformatics could help in engaging students to integrate multiple fields of knowledge (logical-mathematical, biological, computational, etc.) and generate an enriched and long-lasting learning environment. Here, we report our recent project in which high school students learned basic concepts of programming applied to solving biological problems. The students were taught the Python syntax, and they coded simple tools to answer biological questions using resources at hand. Notably, these were built mostly on the students’ own smartphones, which proved to be capable, readily available, and relevant complementary tools for teaching. This project resulted in an empowering and inclusive experience that challenged differences in social background and technological accessibility.

  10. Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Sep 13, 2024
    + more versions
    Cite
    Thiseas C. Lamnidis; Aida Andrades Valtueña; James A. Fellows Yates (2024). Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction to the Command Line [Dataset]. http://doi.org/10.5281/zenodo.13759270
    Available download formats: application/gzip
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    SPAAM Community
    Authors
    Thiseas C. Lamnidis; Aida Andrades Valtueña; James A. Fellows Yates
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and conda software environment file for the chapter 'Introduction to the Command Line' of the SPAAM Community's textbook: Introduction to Ancient Metagenomics (https://www.spaam-community.org/intro-to-ancient-metagenomics-book).

  11. Sequence Similarity: An inquiry based and "under the hood" approach for...

    • qubeshub.org
    Updated Aug 28, 2021
    Cite
    Adam Kleinschmit*; Benita Brink; Steven Roof; Carlos Goller; Sabrina Robertson (2021). Sequence Similarity: An inquiry based and "under the hood" approach for incorporating molecular sequence alignment in introductory undergraduate biology courses [Dataset]. http://doi.org/10.24918/cs.2019.5
    Dataset updated
    Aug 28, 2021
    Dataset provided by
    QUBES
    Authors
    Adam Kleinschmit*; Benita Brink; Steven Roof; Carlos Goller; Sabrina Robertson
    Description

    Introductory bioinformatics exercises often walk students through the use of computational tools, but often provide little understanding of what a computational tool does "under the hood." A solid understanding of how a bioinformatics computational algorithm functions, including its limitations, is key for interpreting the output in a biologically relevant context. This introductory bioinformatics exercise integrates an introduction to web-based sequence alignment algorithms with models to facilitate student reflection and appreciation for how computational tools provide similarity output data. The exercise concludes with a set of inquiry-based questions in which students may apply computational tools to solve a real biological problem.

    In the module, students first define sequence similarity and then investigate how similarity can be quantitatively compared between two similar length proteins using a Blocks Substitution Matrix (BLOSUM) scoring matrix. Students then look for local regions of similarity between a sequence query and subjects within a large database using Basic Local Alignment Search Tool (BLAST). Lastly, students access text-based FASTA-formatted sequence information via National Center for Biotechnology Information (NCBI) databases as they collect sequences for a multiple sequence alignment using Clustal Omega to generate a phylogram and evaluate evolutionary relationships. The combination of diverse, inquiry-based questions, paper models, and web-based computational resources provides students with a solid basis for more advanced bioinformatics topics and an appreciation for the importance of bioinformatics tools across the discipline of biology.
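
    The scoring step described above, comparing two similar-length proteins with a substitution matrix, can be illustrated with a short Python sketch. It is not taken from the lesson itself. The matrix below is a toy excerpt chosen for illustration and should not be read as the published BLOSUM values, and the default score for unlisted pairs is an assumption.

```python
# Toy substitution scores for a handful of residue pairs (illustrative only;
# use a published BLOSUM matrix for real work). Each unordered pair is stored once.
TOY_SCORES = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("K", "K"): 5,
    ("I", "L"): 2,   # similar hydrophobic residues score above zero
    ("K", "L"): -2,  # dissimilar residues are penalized
}


def pair_score(a, b, matrix, default=-1):
    """Look up a pair in either order; unlisted pairs get a hypothetical default."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix.get((b, a), default)


def alignment_score(seq1, seq2, matrix):
    """Sum per-position scores for two ungapped sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length for ungapped scoring")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


identical = alignment_score("ALIK", "ALIK", TOY_SCORES)     # 4 + 4 + 4 + 5 = 17
conservative = alignment_score("ALIK", "ALLK", TOY_SCORES)  # 4 + 4 + 2 + 5 = 15
```

    The conservative substitution (I to L) lowers the score only slightly, which is the intuition a substitution matrix is meant to capture.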

  12. Introduction to single cell RNAseq analysis: supplementary material

    • explore.openaire.eu
    Updated Apr 14, 2023
    Cite
    Jose Alejandro Romero Herrera; Samuele Soraggi (2023). Introduction to single cell RNAseq analysis: supplementary material [Dataset]. http://doi.org/10.5281/zenodo.7920686
    Dataset updated
    Apr 14, 2023
    Authors
    Jose Alejandro Romero Herrera; Samuele Soraggi
    Description

    This archive contains supplementary material used in the workshop "Introduction to single cell RNAseq analysis" taught by the Danish National Sandbox for Health Data Science. The course repo can be found on GitHub.

    • Data.zip contains six 10x runs on spermatogonia development: three from healthy individuals and three from azoospermic individuals. The data has already been preprocessed using cellranger and can be loaded using Seurat (R) or scanpy (Python).
    • Slides.zip contains slides explaining the theory behind single cell RNAseq data analysis.
    • Notebooks.zip contains Rmarkdown files for following the course using R in RStudio (updated version of the notebooks).

  13. Data used in exercises in course Introduction to Data Management Practices

    • figshare.scilifelab.se
    zip
    Updated Jan 15, 2025
    Cite
    Yvonne Kallberg; Elin Kronander; Niclas Jareborg; Markus Englund; Wolmar Nyberg Åkerström (2025). Data used in exercises in course Introduction to Data Management Practices [Dataset]. http://doi.org/10.17044/scilifelab.14301317.v3
    Available download formats: zip
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Uppsala University
    Authors
    Yvonne Kallberg; Elin Kronander; Niclas Jareborg; Markus Englund; Wolmar Nyberg Åkerström
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This record contains the data files used in exercises in the NBIS course "Introduction to Data Management Practices".

  14. Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction...

    • zenodo.org
    application/gzip
    Updated Sep 13, 2024
    Cite
    Kevin Nota; Robin Warner; Maxime Borry (2024). Introduction to Ancient Metagenomics Textbook (Edition 2024): Introduction to Python and Pandas [Dataset]. http://doi.org/10.5281/zenodo.11394586
    Available download formats: application/gzip
    Dataset updated
    Sep 13, 2024
    Dataset provided by
    SPAAM Community
    Authors
    Kevin Nota; Robin Warner; Maxime Borry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and conda software environment file for the chapter 'Introduction to Python and Pandas' of the SPAAM Community's textbook: Introduction to Ancient Metagenomics (https://www.spaam-community.org/intro-to-ancient-metagenomics-book).

  15. Bioinformatics Services Market to Hit US$ 10.7 Billion in Next Decade

    • media.market.us
    Updated Nov 5, 2024
    Cite
    Market.us Media (2024). Bioinformatics Services Market to Hit US$ 10.7 Billion in Next Decade [Dataset]. https://media.market.us/bioinformatics-services-market-news/
    Dataset updated
    Nov 5, 2024
    Dataset authored and provided by
    Market.us Media
    License

    https://media.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    United States
    Description

    Introduction

    The Global Bioinformatics Services Market is poised for substantial growth, projected to increase from USD 2.9 billion in 2023 to USD 10.7 billion by 2033, achieving a compound annual growth rate (CAGR) of 13.9%. This market expansion is fueled by several key factors including technological advancements in genomics and the increasing complexity of biological datasets, which necessitate advanced computational technologies for efficient data management, analysis, and interpretation. These technologies are crucial for advancing medical research and improving patient care, particularly through personalized treatment plans and precision medicine.
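
    As a quick arithmetic check on the growth figures quoted above, the compound annual growth rate can be recomputed from the two endpoints. This sketch assumes a ten-year horizon (2023 to 2033) and the standard CAGR formula; the function name is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# USD 2.9 billion (2023) growing to USD 10.7 billion (2033), as quoted in the text.
rate = cagr(2.9, 10.7, 10)
print(f"Implied CAGR: {rate:.2%}")
```

    Under these assumptions the implied rate comes out just under 14%, in line with the 13.9% stated in the description.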

    Institutions like the Mayo Clinic are significantly contributing to this growth by expanding their bioinformatics services to support translational research and enhance patient care through the integration of large multi-omics data sets. Additionally, prominent educational institutions such as Stanford and Georgetown University are advancing their bioinformatics programs to equip the next generation of professionals with the necessary skills to address complex biomedical challenges using computational and quantitative methods.

    The sector is also witnessing a surge in demand within the healthcare and pharmaceutical industries, where bioinformatics tools are integral to drug discovery and disease diagnosis. This demand drives the development of therapeutic strategies and deepens the understanding of disease mechanisms, further boosting the market growth. Research initiatives and collaborations, such as those at Harvard Medical School’s Department of Biomedical Informatics and Stanford's Biomedical Informatics Research division, are key in transforming biomedical data into actionable insights for precision medicine.

    In terms of recent industry developments, in January 2024, Qiagen announced a significant expansion of investments into its Qiagen Digital Insights (QDI) business. This expansion, fueled by robust sales of approximately $100 million in 2023, is set to enhance QDI's bioinformatics capabilities, including launching at least five new products and broadening the applications of Artificial Intelligence and Natural Language Processing within the sector.

    Furthermore, in January 2023, Agilent Technologies unveiled a major investment of $725 million to double its manufacturing capacity for nucleic acid-based therapeutics, in response to the rapid growth in the therapeutic oligonucleotides market, projected to reach $2.4 billion by 2027. This expansion will introduce two new manufacturing lines to meet the escalating demand for siRNA, antisense, and CRISPR guide RNA molecules, reinforcing Agilent's market presence and capacity in this fast-evolving field.

  16. WORKSHOP: Introduction to Machine Learning in R - from data to knowledge

    • explore.openaire.eu
    Updated Dec 9, 2024
    Cite
    Fotis Psomopoulos; Eden Zhang; Erin Graham; Giorgia Mori; Uwe Winter (2024). WORKSHOP: Introduction to Machine Learning in R - from data to knowledge [Dataset]. http://doi.org/10.5281/zenodo.14545611
    Dataset updated
    Dec 9, 2024
    Authors
    Fotis Psomopoulos; Eden Zhang; Erin Graham; Giorgia Mori; Uwe Winter
    Description

    This record includes training materials associated with the Australian BioCommons workshop ‘Introduction to Machine Learning in R - from data to knowledge’. This workshop took place as one 4-hour session on 09 December 2024.

    Event description

    With the rise in high-throughput sequencing technologies, the volume of omics data has grown exponentially. A major issue is to mine useful knowledge from these heterogeneous collections of data. The analysis of complex high-volume data is not trivial and classical tools cannot be used to explore their full potential. Machine Learning (ML), a discipline in which computers perform automated learning without being programmed explicitly and assist humans to make sense of large and complex data sets, can thus be very useful in mining large omics datasets to uncover new insights that can advance the field of bioinformatics.

    This hands-on workshop will introduce participants to the ML taxonomy and the applications of common ML algorithms to health data. The workshop will cover the foundational concepts and common methods being used to analyse omics data sets by providing a practical context through the use of basic but widely used R libraries. Participants will acquire an understanding of the standard ML processes, as well as the practical skills in applying them on familiar problems and publicly available real-world data sets.

    Materials are shared under a Creative Commons Attribution 4.0 International agreement unless otherwise specified and were current at the time of the event.

    Lead trainers: Dr Fotis Psomopoulos, Senior Researcher, Institute of Applied Biosciences (INAB), Center for Research and Technology Hellas (CERTH)

    Facilitators:
    • Dr Giorgia Mori, Australian BioCommons
    • Dr Eden Zhang, Sydney Informatics Hub
    • Dr Erin Graham, Queensland Cyber Infrastructure Foundation (QCIF)

    Infrastructure provision: Uwe Winter, Australian BioCommons

    Host: Dr Giorgia Mori, Australian BioCommons

    Training materials

    Files and materials included in this record:
    • Event metadata (PDF): information about the event, including description, event URL, learning objectives, prerequisites, technical requirements, etc.
    • Training materials webpage
    • Data and documentation

  17. Transcriptomics in yeast

    • kaggle.com
    zip
    Updated Jan 24, 2017
    Cite
    CostalAether (2017). Transcriptomics in yeast [Dataset]. https://www.kaggle.com/costalaether/yeast-transcriptomics
    Available download formats: zip (4901525 bytes)
    Dataset updated
    Jan 24, 2017
    Authors
    CostalAether
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Disclaimer

    This is a data set of mine that I thought might be enjoyable to the community. It concerns next-generation sequencing and transcriptomics. I used several raw datasets that are public, but the processing to get to this dataset is extensive. This is my first contribution to Kaggle, so be nice, and let me know how I can improve the experience. NGS machines, combined, are the biggest data producer worldwide, so why not add some (more?) to Kaggle.

    A look into Yeast transcriptomics

    Background

    Yeasts (in this case Saccharomyces cerevisiae) are used in the production of beer, wine, bread and a whole lot of biotech applications, such as creating complex pharmaceuticals. They are living eukaryotic organisms (meaning quite complex). All living organisms store information in their DNA, but action within a cell is carried out by specific proteins. The path from DNA to protein (from data to action) is simple: a specific region on the DNA gets transcribed to mRNA, which gets translated to proteins. A common assumption is that the translation step is linear: more mRNA means more protein. Cells actively regulate the amount of protein through the amount of mRNA they create. The expression of each gene depends on the condition the cell is in (starving, stressed, etc.). Modern methods in biology show us all the mRNA that is currently inside a cell. Assuming the linearity of the process, the more of a specific mRNA is available to a cell, the more of the corresponding protein it can make, which makes mRNA an excellent marker for what is actually happening inside a cell. It is important to consider that mRNA is fragile; it is actively replenished only when it is needed. Both mRNA and proteins are expensive for a cell to produce.

    Yeasts are good model organisms for this, since they only have about 6000 genes. They are also single-celled, which makes samples more homogeneous, and they have few advanced features (splice junctions etc.).

    (All of this is heavily simplified; let me know if I should go into more detail.)

    The data

    files

    The following files are provided:

    • SC_expression.csv: expression values for each gene over the available conditions.
    • labels_CC.csv: labels for the individual genes, their status and, where known, intracellular localization (see below).

    Maybe this would be nice as a little competition; I'll see how this one goes before I upload the other label files. Please provide some feedback on the presentation, and whatever else you would want me to share.

    background

    I used 92 samples from various openly available raw datasets and ran them through a modern RNAseq pipeline. They span a range of different conditions (I hid the raw names). The conditions cover stress conditions, temperature and heavy metals, as well as growth media changes and the deletion of specific genes. Originally I had 150 sets; 92 are of good enough quality. Evaluation was done at the gene level. Each gene got its own row; samples are columns (some are in replicates over several columns). Expression levels were normalized by TPM (transcripts per million), a default normalization procedure. Raw counts would have been integers; normalized, they are floats.
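
    For readers unfamiliar with TPM, the normalization mentioned above can be sketched for a single sample as follows. This is a generic illustration of the usual TPM definition, not the pipeline used to build this dataset; the gene lengths and counts in the example are made up, and the function name is hypothetical.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts for one sample to transcripts per million.

    counts     -- raw integer read counts, one per gene
    lengths_kb -- gene (transcript) lengths in kilobases, same order as counts
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Step 1: length-normalize each count (reads per kilobase).
    rates = [count / length for count, length in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no reads")
    # Step 2: rescale so the values in the sample sum to one million.
    return [rate / total * 1_000_000 for rate in rates]


# Made-up example: three genes whose counts scale exactly with their lengths,
# so all three end up with the same TPM value.
example = tpm([100, 200, 300], [1.0, 2.0, 3.0])
```

    Integer counts become floats after this step, and every sample column sums to one million, which makes expression levels comparable across samples.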

    Analysis and labels

    Genes

    The function of individual genes is a matter of dispute. Clearly living cells are complex. The inner machinations of cells are not visible. Gene functionality is commonly inferred indirectly by removing a gene, and test the cells behavior. This is time consuming and not very precise. As you can see in the dataset, there is still much to be done to fully understand even single cell yeasts.

    The provided dataset allows for a different approach to the functional classification of genes. The label files contained in the set map each gene to a specific label. The classification is based on the official Gene Ontology annotations. I simplified the nomenclature. Gene function is usually given in a hierarchical structure [inside cell --> cytoplasm --> associated with complex A ...]. I'm only keeping high-level associations and using readable terms instead of GO terms. I'll extend this if people are interested.

    Labels

    CC labels concern Cellular Component: where the gene's product is found within a cell. The labels go into the details of the associations found. The label 'cellular_component' is synonymous with 'unknown location'. CC is the easiest label to attach to a gene, since it is the one that can be studied most easily. Still, many genes are missing a label.
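Since 'cellular_component' stands for 'unknown location', a model would typically be trained on genes with a real label and then applied to the unknown ones. The sketch below shows that split, assuming the labels have already been loaded into a plain gene-to-label dictionary; the gene names and the label 'nucleus' are made up for illustration, and `split_by_annotation` is a hypothetical helper.

```python
UNKNOWN_LABEL = "cellular_component"  # per the description: means 'unknown location'


def split_by_annotation(labels):
    """Separate genes with a known location from genes without one.

    labels -- dict mapping gene identifier to its CC label
    Returns (known, unknown): a dict of gene -> label for annotated
    genes, and a sorted list of genes labelled as unknown.
    """
    known = {gene: label for gene, label in labels.items() if label != UNKNOWN_LABEL}
    unknown = sorted(gene for gene, label in labels.items() if label == UNKNOWN_LABEL)
    return known, unknown


# Made-up example labels.
example = {"geneA": "nucleus", "geneB": "cellular_component", "geneC": "nucleus"}
known, unknown = split_by_annotation(example)
```

The `known` genes can serve as training data, while `unknown` lists the genes whose location a model would try to predict.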

    MF labels concern Molecular Function: what the gene is doing. [upcoming]
    BP labels concern Biological Process: what the gene is involved in. [upcoming]

    The core interest here is whether it is possible to improve the classification of genes by modeling the data. A common assu...

  18. f

    Table1_Bioinformatics on the Road: Taking Training to Students and...

    • frontiersin.figshare.com
    docx
    Updated Jun 8, 2023
    Marcus Braga; Fabrício Araujo; Edian Franco; Kenny Pinheiro; Jakelyne Silva; Denner Maués; Sebastiao Neto; Lucas Pompeu; Luis Guimaraes; Adriana Carneiro; Igor Hamoy; Rommel Ramos (2023). Table1_Bioinformatics on the Road: Taking Training to Students and Researchers Beyond State Capitals.DOCX [Dataset]. http://doi.org/10.3389/feduc.2021.726930.s001
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    Frontiers
    Authors
    Marcus Braga; Fabrício Araujo; Edian Franco; Kenny Pinheiro; Jakelyne Silva; Denner Maués; Sebastiao Neto; Lucas Pompeu; Luis Guimaraes; Adriana Carneiro; Igor Hamoy; Rommel Ramos
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In Brazil, training capable bioinformaticians is done, mostly, in graduate programs, sometimes with experiences during the undergraduate period. However, this formation tends to be inefficient in attracting students to the area and mainly in attracting professionals to support research projects in research groups. To solve these issues, participation in short courses is important for training students and professionals in the usage of tools for specific areas that use bioinformatics, as well as in ways to develop solutions tailored to the local needs of academic institutions or research groups. In this aim, the project “Bioinformática na Estrada” (Bioinformatics on the Road) proposed improving bioinformaticians’ skills in undergraduate and graduate courses, primarily in the countryside of the State of Pará, in the Amazon region of Brazil. The project scope is practical courses focused on the areas of interest of the place where the courses are occurring to train and encourage students and researchers to work in this field, reducing the existing gap due to the lack of qualified bioinformatics professionals. Theoretical and practical workshops took place, such as Introduction to Bioinformatics, Computer Science Basics, Applications of Computational Intelligence applied to Bioinformatics and Biotechnology, Computational Tools for Bioinformatics, Soil Genomics and Research Perspectives and Horizons in the Amazon Region. In the end, 444 undergraduate and graduate students from higher education institutions in the state of Pará and other Brazilian states attended the events of the Bioinformatics on the Road project.

  19. q

    A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality...

    • qubeshub.org
    Updated Aug 30, 2021
    Rachael †; William †; Sabrina Robertson; Andrew Lonsdale; Caylin Murray; Jason Williams; Ray Enke (2021). A Fun Introductory Command Line Lesson: Next Generation Sequencing Quality Analysis with Emoji! [Dataset]. http://doi.org/10.24918/cs.2021.17
    Explore at:
    Dataset updated
    Aug 30, 2021
    Dataset provided by
    QUBES
    Authors
    Rachael †; William †; Sabrina Robertson; Andrew Lonsdale; Caylin Murray; Jason Williams; Ray Enke
    Description

    Radical innovations in DNA sequencing technology over the past decade have created an increased need for computational bioinformatics analyses in the 21st century STEM workforce. Recent evidence however demonstrates that there are significant barriers to teaching these skills at the undergraduate level including lack of faculty training, lack of student interest in bioinformatics, lack of vetted teaching materials, and overly full curricula. To this end, the James Madison University, Center for Genome & Metagenome Studies (JMU CGEMS) and other PUI collaborators are devoted to developing and disseminating engaging bioinformatics teaching materials specifically designed for streamlined integration into general undergraduate biology curriculum. Here, we have developed and integrated a fun introductory level lesson to command line next generation sequencing (NGS) analysis into a large enrollment core biology course. This one-off activity takes a crucial but mundane aspect of NGS quality control (QC) analysis and incorporates the use of Emoji data outputs using the software FASTQE to pique student interest. This amusing command line analysis is subsequently paired with a more rigorous research-grade software package called FASTP in which students complete sequence QC and filtering using a few simple commands. Collectively, this short lesson provides novice-level faculty and students an engaging entry point to learning basic genomics command line programming skills as a gateway to more complex and elaborated applications of computational bioinformatics analyses.

    Primary image: Undergraduate students learn the basics of command line NGS quality analysis using the FASTQE and FASTP programs.

  20. q

    Making toast: Using analogies to explore concepts in bioinformatics

    • qubeshub.org
    Updated Aug 26, 2021
    Kate Hertweck (2021). Making toast: Using analogies to explore concepts in bioinformatics [Dataset]. http://doi.org/10.24918/cs.2016.11
    Explore at:
    Dataset updated
    Aug 26, 2021
    Dataset provided by
    QUBES
    Authors
    Kate Hertweck
    Description

    Contemporary biology is moving towards heavy reliance on computational methods to manage, find patterns, and derive meaning from large-scale data, such as genomic sequences. Biology teachers are increasingly compelled to prepare students with skills to meet these challenges. However, introducing biology students to more abstract concepts associated with computational thinking remains a major challenge. Analogies have long been used in science classrooms to help students comprehend complex concepts by relating them to familiar processes. Here I present a multi-step procedure for introducing students to large-scale data analysis (bioinformatics workflows) by asking them to describe a common daily task: making toast. First, students describe the main steps associated with this procedure. Next, students are presented with alternative scenarios for materials and equipment and are asked to extend the analogy to accommodate them. Finally, students are led through examples of how the analogy breaks down, or fails to accurately represent, a bioinformatics analysis. This structured approach to student exploration of analogies related to computational biology capitalizes on diverse student experiences to both clarify concepts and ameliorate possible misconceptions. Similar methods can be used to introduce many abstract concepts in both biology and computer science.

Aidan Budd (2016). Introductions to Bioinformatics [Dataset]. http://doi.org/10.6084/m9.figshare.830401.v1

Introductions to Bioinformatics

Explore at:
9 scholarly articles cite this dataset (View in Google Scholar)
pdfAvailable download formats
Dataset updated
Jan 18, 2016
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Aidan Budd
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A collection of similar but different presentations I've made aimed at introducing bioinformatics to bench biologists.
