100+ datasets found

Datasets
figshare.com
zip
Updated May 31, 2023
Cite
Bastian Eichenberger; YinXiu Zhan (2023). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.12958037.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.12958037.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Bastian Eichenberger; YinXiu Zhan
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The benchmarking datasets used for deepBlink. The npz files contain train/valid/test splits and can be used directly. The files belong to the following challenges / classes:
- ISBI Particle tracking challenge: microtubule, vesicle, receptor
- Custom synthetic (based on http://smal.ws): particle
- Custom fixed cell: smfish
- Custom live cell: suntag
The csv files map each image in the test splits to its original image, SNR, and density.
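As a rough illustration of how an npz archive with built-in splits might be inspected, the sketch below lists every array in an archive together with its shape. It builds a small stand-in archive in memory instead of reading the real files, and the key names (`x_train`, `y_train`, `x_valid`, `x_test`) are assumptions that may not match the keys in the deepBlink files.

```python
import io

import numpy as np


def summarize_npz(source):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(source) as archive:
        return {name: archive[name].shape for name in archive.files}


# Stand-in archive built in memory. The key names below are hypothetical
# and may differ from the keys used in the real benchmark files.
buffer = io.BytesIO()
np.savez(
    buffer,
    x_train=np.zeros((4, 8, 8)),
    y_train=np.zeros((4, 2)),
    x_valid=np.zeros((2, 8, 8)),
    x_test=np.zeros((1, 8, 8)),
)
buffer.seek(0)

shapes = summarize_npz(buffer)
for name, shape in shapes.items():
    print(name, shape)
```

For a downloaded file, passing its path to `summarize_npz` in place of the buffer should show which split names the archive actually uses.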
Data from: Development of the InTelligence And Machine LEarning (TAME)...
catalog.data.gov
Updated Oct 31, 2022
Cite
U.S. EPA Office of Research and Development (ORD) (2022). Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research [Dataset]. https://catalog.data.gov/dataset/development-of-the-intelligence-and-machine-learning-tame-toolkit-for-introductory-data-sc
Dataset updated
Oct 31, 2022
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The original contributions presented in the study are included in the article and online through the TAME Toolkit, available at: https://uncsrp.github.io/Data-Analysis-Training-Modules/, with underlying code and datasets available in the parent UNC-SRP GitHub website (https://github.com/UNCSRP). This dataset is associated with the following publication: Roell, K., L. Koval, R. Boyles, G. Patlewicz, C. Ring, C. Rider, C. Ward-Caviness, D. Reif, I. Jaspers, R. Fry, and J. Rager. Development of the InTelligence And Machine LEarning (TAME) Toolkit for Introductory Data Science, Chemical-Biological Analyses, Predictive Modeling, and Database Mining for Environmental Health Research. Frontiers in Toxicology. Frontiers, Lausanne, SWITZERLAND, 4: 893924, (2022).
BioSR: a biological image dataset for super-resolution microscopy
figshare.com
zip
Updated Jan 2, 2024
Cite
Chang Qiao; Di Li (2024). BioSR: a biological image dataset for super-resolution microscopy [Dataset]. http://doi.org/10.6084/m9.figshare.13264793.v9
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.13264793.v9
Dataset updated
Jan 2, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Chang Qiao; Di Li
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
BioSR is a biological image dataset for super-resolution microscopy, currently including more than 2200 pairs of low- and high-resolution images covering four biological structures (CCPs, ER, MTs, F-actin), nine signal levels (15-600 average photon count), and two upscaling factors (linear SIM and non-linear SIM). BioSR is now freely available, aiming to provide a high-quality dataset for the community of single bio-image super-resolution algorithm and advanced SIM reconstruction algorithm developers. For more information about BioSR, please see our Nature Methods manuscript, "Evaluation and development of deep neural networks for image super-resolution in optical microscopy" (DOI: 10.1038/s41592-020-01048-5). Update 2022.10.04: added the file "DataSet of rDL-SRM (Zenodo Link).xlsx", which includes descriptions and Zenodo links of BioSR+ (a data extension of BioSR) and other data used in our Nature Biotechnology paper "Rationalized deep learning super-resolution microscopy for sustained live imaging of rapid subcellular processes" (DOI: 10.1038/s41587-022-01471-3).
Data from: Benchmarking imputation methods for categorical biological data
zenodo.org
data.niaid.nih.gov
zip
Updated Mar 10, 2024
Cite
Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro (2024). Benchmarking imputation methods for categorical biological data [Dataset]. http://doi.org/10.5281/zenodo.10800016
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10800016
Dataset updated
Mar 10, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matthieu Gendre; Torsten Hauffe; Catalina Pimiento; Daniele Silvestro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Mar 9, 2024
Description
Welcome to the Zenodo repository for the publication "Benchmarking imputation methods for categorical biological data". It collects the datasets and scripts used in our study and is intended for researchers who want to explore the empirical and simulated analyses we conducted.

Contents:

empirical_analysis:

Trait Dataset of Elasmobranchs: A collection of trait data for elasmobranch species obtained from FishBase, stored as an RDS file.

Phylogenetic Tree: A phylogenetic tree stored as a TRE file.

Imputations Replicates (Imputation): Replicated imputations of missing data in the trait dataset, stored as RData files.

Error Calculation (Results): Error calculation results derived from imputed datasets, stored as RData files.

Scripts: Collection of R scripts used for the implementation of empirical analysis.

simulation_analysis:

Input Files: Input files used for the simulation analyses, stored as CSV files.

Data Distribution PDFs: PDF files displaying the distribution of simulated data and the missingness.

Output Files: Simulated trait datasets, trait datasets with missing data, and imputed trait datasets with their calculated imputation errors, stored as RData files.

Scripts: Collection of R scripts used for the simulation analysis.

TDIP_package:

Scripts of the TDIP Package: All scripts related to the Trait Data Imputation with Phylogeny (TDIP) R package used in the analyses.

Purpose:

This repository aims to provide transparency and reproducibility to our research findings by making the datasets and scripts publicly accessible. Researchers interested in understanding our methodologies, replicating our analyses, or building upon our work can utilize this repository as a valuable reference.

Citation:

When using the datasets or scripts from this repository, we kindly request that you cite the publication "Benchmarking imputation methods for categorical biological data" and acknowledge the use of this Zenodo repository.

Thank you for your interest in our research, and we hope this repository serves as a valuable resource in your scholarly pursuits.
Digital Biology Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 13, 2025
Cite
Data Insights Market (2025). Digital Biology Report [Dataset]. https://www.datainsightsmarket.com/reports/digital-biology-1501898
Available download formats: pdf, doc, ppt
Dataset updated
Jul 13, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The digital biology market is experiencing robust growth, driven by the convergence of advanced computing, data analytics, and life sciences. The increasing availability of large biological datasets, coupled with advancements in artificial intelligence (AI) and machine learning (ML), is fueling the development of innovative tools and platforms for drug discovery, personalized medicine, and agricultural biotechnology. This market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033, reaching approximately $60 billion by 2033. Key drivers include the rising demand for faster and more efficient drug development processes, the increasing prevalence of chronic diseases necessitating personalized treatments, and the growing adoption of precision agriculture techniques. The market's segmentation encompasses software solutions, hardware infrastructure, and services, with leading players like DUNA Bioinformatics, Precigen, Dassault Systèmes, Genedata AG, and Simulations Plus actively shaping the market landscape through continuous innovation. The North American region currently holds a significant market share due to substantial investments in R&D and the presence of major players, although growth in other regions like Europe and Asia-Pacific is accelerating. While the market's growth trajectory is positive, certain restraints exist. High upfront investment costs for software and hardware, the need for skilled personnel to operate advanced systems, and data security and privacy concerns are some challenges that the industry needs to address. However, ongoing technological advancements are mitigating these limitations. The development of user-friendly interfaces, cloud-based solutions, and improved data security measures are steadily increasing market accessibility and fostering wider adoption. 
Further fueling market expansion are collaborative initiatives between academic institutions, pharmaceutical companies, and technology providers, fostering the creation of innovative and cost-effective solutions. This collaborative approach is crucial for overcoming the challenges and unlocking the immense potential of digital biology in transforming various sectors.
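The growth figures quoted in this description can be checked with the standard compound-growth formula, end = start × (1 + CAGR)^years. The sketch below is only an arithmetic illustration: it assumes annual compounding over the eight years from 2025 to 2033, and the function name is hypothetical.

```python
def project_market_size(start_value, cagr, years):
    """Compound a starting value at a fixed annual growth rate."""
    return start_value * (1.0 + cagr) ** years


# Figures taken from the description: $15 billion in 2025, 18% CAGR, through 2033.
projected_2033 = project_market_size(15.0, 0.18, 2033 - 2025)
print(f"Projected 2033 market size: ${projected_2033:.1f} billion")
```

Under these assumptions the projection comes out at roughly $56 billion, which is consistent with the description's rounded figure of approximately $60 billion.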
Table1_Interpretable machine learning methods for predictions in systems...
frontiersin.figshare.com
pdf
Updated Jun 13, 2023
Cite
David Sidak; Jana Schwarzerová; Wolfram Weckwerth; Steffen Waldherr (2023). Table1_Interpretable machine learning methods for predictions in systems biology from omics data.pdf [Dataset]. http://doi.org/10.3389/fmolb.2022.926623.s002
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fmolb.2022.926623.s002
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers
Authors
David Sidak; Jana Schwarzerová; Wolfram Weckwerth; Steffen Waldherr
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.
Supporting data for the thesis entitled "Interpretable deep learning methods...
datahub.hku.hk
zip
Updated Aug 29, 2025
Cite
Weizhong Zheng (2025). Supporting data for the thesis entitled "Interpretable deep learning methods for biological sequencing data". [Dataset]. http://doi.org/10.25442/hku.24042573.v1
Available download formats: zip
Unique identifier
https://doi.org/10.25442/hku.24042573.v1
Dataset updated
Aug 29, 2025
Dataset provided by
HKU Data Repository
Authors
Weizhong Zheng
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
Description
This datahub includes sequencing data that were used to develop the new methods for Chapter 2, Chapter 3 and Chapter 4 of the PhD thesis "Interpretable deep learning methods for biological sequencing data".

Under the main folder Dataset, there are three sub-folders. They contain the intermediate results and the code to reproduce the figures for each chapter. Each of the folders is named after Dataset(Chapter X) and has the following structure:

Raw: The datasets we used to develop the methods are all publicly available. Under the raw folder, you will find a txt file that directs you to the download link for all the raw data.

Processed: This folder contains the intermediate results that were built on the public dataset, for example the analysis results using our proposed methods. In addition, we also uploaded some source data that can be used to draw the figures.

Script: The Script folder has two kinds of code:
- The source code of the Python package we developed. The package can be installed directly.
- The Jupyter notebooks, which demonstrate how to use our packages and the process of generating figures.
On the nature of mixed-type features in materials datasets
explore.openaire.eu
zenodo.org
Updated Jan 28, 2021
Cite
Duy-Tai Dinh; Duong-Nguyen Nguyen; Hieu-Chi Dam (2021). On the nature of mixed-type features in materials datasets [Dataset]. http://doi.org/10.5281/zenodo.4474847
Unique identifier
https://doi.org/10.5281/zenodo.4474847
Dataset updated
Jan 28, 2021
Authors
Duy-Tai Dinh; Duong-Nguyen Nguyen; Hieu-Chi Dam
Description
We provide four crystalline materials datasets that contain both numerical and categorical features of materials. The Lattice dataset [1,2] contains 1,439 binary AB body-centered cubic crystals, as described by 12 numerical and 9 categorical variables. The lattice constant is considered as the physical property of interest. The TC dataset [2,3] contains 101 binary alloys of transition and rare earth metals, as described by 15 numerical and 17 categorical features. The Curie temperature (TC) is considered as the property of interest. The Octet binary materials dataset [4] contains 82 materials, as described by 11 numerical and 2 categorical features. The difference in LDA energy between RS and ZB (∆E = E(RS) − E(ZB)) in eV is considered as the physical property of interest. The Fm3m dataset [5] contains 239 binary compounds collected from the Materials Project, as described by 12 numerical and 17 categorical variables. The formation energy is considered as the physical property of interest.
References:
[1] K. Takahashi, L. Takahashi, J. D. Baran, and Y. Tanaka, "Descriptors for predicting the lattice constant of body centered cubic crystal", The Journal of Chemical Physics 146, 204104 (2017).
[2] D.-N. Nguyen, T.-L. Pham, V.-C. Nguyen, T.-D. Ho, T. Tran, K. Takahashi, and H.-C. Dam, "Committee machine that votes for similarity between materials", IUCrJ 5, 830-840 (2018).
[3] Y. Xu, M. Yamazaki, and P. Villars, "Inorganic materials database for exploring the nature of material", Japanese Journal of Applied Physics 50, 11RH02 (2011).
[4] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, "Big data of materials science: critical role of the descriptor", Physical Review Letters 114, 105503 (2015).
[5] A. Jain, S. P. Ong, G. Hautier, W. Chen, W. D. Richards, S. Dacek, S. Cholia, D. Gunter, D. Skinner, G. Ceder, et al., "Commentary: The materials project: A materials genome approach to accelerating materials innovation", APL Materials 1, 011002 (2013).
Data from: A machine learning framework for extracting information from...
zenodo.org
Updated Apr 27, 2024
Cite
Mun Su Kwon; Junkyu Lee; Hyun Uk Kim (2024). A machine learning framework for extracting information from biological pathway images in the literature [Dataset]. http://doi.org/10.5281/zenodo.11075692
Unique identifier
https://doi.org/10.5281/zenodo.11075692
Dataset updated
Apr 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Mun Su Kwon; Junkyu Lee; Hyun Uk Kim
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Training and validation datasets_arrow detection.zip:
Training and validation datasets for arrow detection using Faster R-CNN model. A total of 6,471 images have been prepared, including 2,332 images from five different sources and 4,139 augmented images.

Test dataset_arrow detection.zip:
Test dataset for arrow detection using Faster R-CNN model. A total of 100 images have been prepared from 89 papers searched through PubMed Central (PMC).

EBPI outputs.txt:
Reaction information extracted using EBPI from 49,846 biological pathway images across 466 target chemicals.
Biological Gender Detection Dataset
universe.roboflow.com
zip
Updated Jul 3, 2025
Cite
machine learning (2025). Biological Gender Detection Dataset [Dataset]. https://universe.roboflow.com/machine-learning-hgkth/biological-gender-detection
Available download formats: zip
Dataset updated
Jul 3, 2025
Dataset authored and provided by
machine learning
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
1
Description
Biological Gender Detection

Overview: Biological Gender Detection is a dataset for classification tasks. It contains 1 annotation for 318 images.

Getting Started: You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

License: This dataset is available under the CC BY 4.0 license.
Computational Biology Industry Report
datainsightsmarket.com
doc, pdf, ppt
Updated Nov 26, 2024
Cite
Data Insights Market (2024). Computational Biology Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/computational-biology-industry-9558
Available download formats: pdf, ppt, doc
Dataset updated
Nov 26, 2024
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Computational Biology Industry market was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, with an expected CAGR of 13.33% during the forecast period. The computational biology industry is booming, driven by the growing volume of biological data generated by advances in genomics, proteomics, and systems biology. The field takes an interdisciplinary approach that links biology, computer science, and mathematics to analyze complex biological systems and processes, and it is considered indispensable for drug discovery, personalized medicine, and agricultural biotechnology. The rising incidence of chronic diseases calls for targeted therapies and precise diagnostics, making it a key driver of market growth. The tools of computational biology, which include bioinformatics software, machine learning algorithms, and modeling simulations, enable the extraction of meaningful insights from vast datasets, accelerating the pace of scientific discovery. Technological advancements are further enhancing the capabilities of computational biology. The interpretation of biological data is undergoing a fundamental shift as AI and machine learning are increasingly integrated into data analysis. Moreover, cloud computing makes it easier for researchers to share data and collaborate, which helps innovation in this field flourish. Geographically, North America leads, with a strong presence of research institutions and biotechnology firms and substantial funding for life sciences research. Asia-Pacific is emerging, with increased investment in the healthcare and biotechnology sectors and the growing importance of personalized medicine.
Overall, the computational biology industry appears well positioned for sustained expansion, driven by continuing technological advances, the need to make sense of very large datasets, and the growing emphasis on precision health solutions. As biological science continues to advance, computation will unlock new insights and drive innovation across healthcare. Recent developments include: February 2023: The Centre for Development of Advanced Computing (C-DAC) launched two software tools critical for research in life sciences. Integrated Computing Environment, one of the products, is an indigenous cloud-based genomics computational facility for bioinformatics that integrates ICE-cube, a hardware infrastructure, and ICE flakes. This software will help securely store and analyze petascale to exascale genomics data. January 2023: Insilico Medicine, a clinical-stage, end-to-end artificial intelligence (AI)-driven drug discovery company, launched the 6th generation Intelligent Robotics Lab to accelerate its AI-driven drug discovery. The fully automated AI-powered robotics laboratory performs target discovery, compound screening, precision medicine development, and translational research. Key drivers for this market are: Increase in Bioinformatics Research, Increasing Number of Clinical Studies in Pharmacogenomics and Pharmacokinetics; Growth of Drug Designing and Disease Modeling. Potential restraints include: Lack of Trained Professionals. Notable trends are: Industry and Commercials Sub-segment is Expected to hold its Highest Market Share in the End User Segment.
Biological Simulation Analysis Software Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 3, 2025
Cite
Market Report Analytics (2025). Biological Simulation Analysis Software Report [Dataset]. https://www.marketreportanalytics.com/reports/biological-simulation-analysis-software-56642
Available download formats: doc, ppt, pdf
Dataset updated
Apr 3, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global market for Biological Simulation Analysis Software is experiencing robust growth, driven by the increasing adoption of computational biology in drug discovery, personalized medicine, and ecological research. The market's expansion is fueled by several key factors: the rising need for accurate and efficient biological data analysis, the decreasing cost of high-performance computing, and the growing availability of large biological datasets generated through advanced sequencing technologies. The software market is segmented by application (biological, ecological, medical) and operating system (Windows, Mac, Linux). While all segments show promising growth, the medical application segment is anticipated to lead, driven by the escalating demand for personalized medicine and the development of novel therapeutics. The increasing complexity of biological systems and the need for sophisticated simulation models further contribute to the market's growth trajectory. Major players in the market, including Dassault Systèmes, SoftGenetics, CGS, SnapGene, Geneious, Gene Codes Corporation, and PREMIER Biosoft, are continuously innovating and expanding their product offerings to meet the evolving needs of researchers and scientists. Competitive landscape involves a mix of established players and emerging startups focused on niche applications within the field. Geographical growth is diverse; North America and Europe currently hold a significant market share due to established research infrastructure, but Asia-Pacific is witnessing rapid growth, fueled by increased research funding and a growing pool of skilled scientists. Over the forecast period (2025-2033), the market is projected to maintain a steady growth rate, propelled by advancements in artificial intelligence and machine learning, which are enhancing the predictive capabilities of simulation software. 
Integration of cloud computing is further streamlining data management and analysis processes, making the software more accessible and cost-effective. However, challenges such as the high cost of software licenses and the need for specialized expertise to operate the software may act as potential restraints. Nevertheless, ongoing technological advancements and the increasing demand for efficient biological data analysis are expected to overcome these challenges, leading to a sustained expansion of the Biological Simulation Analysis Software market. Future growth will largely depend on continued innovation, accessibility, and user-friendliness of the software.
Predicting reservoir hosts based on early SARS-CoV-2 samples and analyzing...
search.dataone.org
datadryad.org
Updated Apr 21, 2025
Cite
Qian Guo; Mo Li; Chunhui Wang; Jinyuan Guo; Xiaoqing Jiang; Jie Tan; Peihong Wang; Shufang Wu; Tingting Xiao; Man Zhou; Zhencheng Fang; Yonghong Xiao; Huaiqiu Zhu (2025). Predicting reservoir hosts based on early SARS-CoV-2 samples and analyzing later world-wide pandemic [Dataset]. http://doi.org/10.5061/dryad.zgmsbcc8v
Unique identifier
https://doi.org/10.5061/dryad.zgmsbcc8v
Dataset updated
Apr 21, 2025
Dataset provided by
Dryad Digital Repository
Authors
Qian Guo; Mo Li; Chunhui Wang; Jinyuan Guo; Xiaoqing Jiang; Jie Tan; Peihong Wang; Shufang Wu; Tingting Xiao; Man Zhou; Zhencheng Fang; Yonghong Xiao; Huaiqiu Zhu
Time period covered
Jan 1, 2020
Description
The SARS-CoV-2 pandemic has raised concern about reservoir hosts of the virus since the early-stage outbreak. To address this problem, we proposed a deep learning method, DeepHoF, based on extracting viral genomic features, to calculate infection likelihoods and further predict the probable hosts of novel viruses. Overcoming the limitation of sequence similarity-based methods, DeepHoF was applied to the analysis of SARS-CoV-2 in the 2020 pandemic. Using the isolates sequenced in the earliest stage of COVID-19, DeepHoF identified that minks, bats, dogs and cats can be highly susceptible to SARS-CoV-2, while minks might be one of the most noteworthy reservoir hosts. Several genes of SARS-CoV-2 demonstrated their significance in determining the infection likelihood in humans or the host range. With a large-scale genome analysis based on DeepHoF's computation for the later world-wide pandemic, it should not be slighted for the probably bidirectional transmission of SARS-CoV-2 between hu...
Computational Biology Industry Report
marketreportanalytics.com
doc, pdf, ppt
Updated May 4, 2025
Cite
Market Report Analytics (2025). Computational Biology Industry Report [Dataset]. https://www.marketreportanalytics.com/reports/computational-biology-industry-96013
Available download formats: pdf, doc, ppt
Dataset updated
May 4, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The computational biology market is experiencing robust growth, driven by the increasing adoption of advanced technologies like artificial intelligence (AI) and machine learning (ML) in drug discovery and development. The market's Compound Annual Growth Rate (CAGR) of 13.33% from 2019 to 2024 indicates a significant upward trajectory, projected to continue into the forecast period (2025-2033). Key drivers include the rising prevalence of chronic diseases necessitating faster and more efficient drug development processes, the decreasing cost of high-throughput sequencing and data storage, and the increasing availability of large biological datasets fueling advanced computational analyses. The market segmentation reveals strong demand across various applications, including cellular and biological simulations (particularly in genomics and proteomics), drug discovery and disease modeling (with target identification and validation being prominent areas), and preclinical drug development (focused on pharmacokinetics and pharmacodynamics). Clinical trial applications are also significant, spanning Phases I, II, and III. Software tools like databases, analysis software, and specialized infrastructure are critical components, further segmented by service type (in-house vs. contract) and end-user (academic institutions and commercial entities). North America currently holds a significant market share, but Asia-Pacific is projected to witness substantial growth owing to increasing investments in research and development and the rising adoption of computational biology techniques in emerging economies. The competitive landscape is dynamic, with several major players such as Dassault Systèmes SE, Certara, and Schrödinger contributing to innovation. However, the market also includes numerous smaller, specialized companies focusing on niche applications or specific technologies. 
This competitive landscape encourages continuous innovation, driving the development of more sophisticated software, improved algorithms, and enhanced analytical capabilities. While precise market size figures are not available, extrapolating from the provided CAGR and industry reports suggests a current market value exceeding several billion dollars, poised for continued expansion. The focus on precision medicine and personalized therapies further strengthens the long-term growth potential of the computational biology market. Challenges include the complexity of biological systems, the need for robust data validation, and the ethical considerations associated with the use of AI and big data in healthcare.
Recent developments include:
February 2023: The Centre for Development of Advanced Computing (C-DAC) launched two software tools critical for research in life sciences. Integrated Computing Environment, one of the products, is an indigenous cloud-based genomics computational facility for bioinformatics that integrates ICE-cube, a hardware infrastructure, and ICE flakes. This software will help securely store and analyze petascale to exascale genomics data.
January 2023: Insilico Medicine, a clinical-stage, end-to-end artificial intelligence (AI)-driven drug discovery company, launched the 6th generation Intelligent Robotics Lab to accelerate its AI-driven drug discovery. The fully automated AI-powered robotics laboratory performs target discovery, compound screening, precision medicine development, and translational research.
Key drivers for this market are: Increase in Bioinformatics Research; Increasing Number of Clinical Studies in Pharmacogenomics and Pharmacokinetics; Growth of Drug Designing and Disease Modeling.
Potential restraints include: Increase in Bioinformatics Research; Increasing Number of Clinical Studies in Pharmacogenomics and Pharmacokinetics; Growth of Drug Designing and Disease Modeling.
Notable trends are: The Industry and Commercials sub-segment is expected to hold the highest market share in the End User segment.
Large-scale Docking Datasets for Machine Learning
data.niaid.nih.gov
Updated May 22, 2023
Cite
Jens Carlsson (2023). Large-scale Docking Datasets for Machine Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7903160
Dataset updated
May 22, 2023
Dataset provided by
Andreas Luttens
Ulf Norinder
Israel Cabeza de Vaca
Leonard Sparring
Jens Carlsson
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large-scale virtual screening has become a valuable tool for early-phase drug discovery. Recent expansions of commercial chemical space have made it computationally intractable to evaluate all compounds in the libraries. Machine learning is one of the methods that aim to prioritize specific subsets of these vast libraries. In order to put these methods to the test, access to large-scale datasets is beneficial. To help the community benchmark their work, we share the docking scores of several ultralarge virtual screening campaigns.

The datasets we provide contain canonical SMILES, compound identifiers, and docking scores. We docked two different chemical libraries against eight different biological targets with therapeutic relevance. The first dataset contains approximately 15.5 million molecules adhering to the "Rule-of-Four", whereas the second dataset consists of approximately 235 million "lead-like" molecules. The biological targets represent different classes of proteins and binding sites.

More details on the datasets and our methods can be found in our GitHub repository (https://github.com/carlssonlab/conformalpredictor) and in our pre-print (https://doi.org/10.26434/chemrxiv-2023-w3x36).

Please feel free to download and use these datasets for your own research purposes. We only ask that you cite our pre-print and datasets appropriately if you use them in your work. Thank you for your interest in our research!
Molecular Biology Simulation Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 26, 2025
Cite
Data Insights Market (2025). Molecular Biology Simulation Software Report [Dataset]. https://www.datainsightsmarket.com/reports/molecular-biology-simulation-software-1963489
Available download formats: pdf, doc, ppt
Dataset updated
Jul 26, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Molecular Biology Simulation Software market is experiencing robust growth, driven by the increasing adoption of computational biology techniques in drug discovery, personalized medicine, and academic research. The market's expansion is fueled by several key factors: the decreasing cost and increasing power of computing resources, the growing availability of large biological datasets, and the rising demand for efficient and accurate methods for analyzing complex biological systems. Furthermore, advancements in algorithms and software functionalities are enhancing the predictive capabilities and user-friendliness of these simulation tools, broadening their appeal across various research and development settings. The market is segmented by software type (e.g., protein-protein interaction simulation, gene expression analysis, pathway modeling), application (e.g., drug design, genomics research, diagnostics), and end-user (e.g., pharmaceutical companies, academic institutions, biotechnology firms). While the precise market size for 2025 is not provided, considering a reasonable CAGR of 15% and a hypothetical 2024 market size of $500 million, we can estimate a 2025 market size of approximately $575 million. This estimation reflects the observed rapid expansion in the field. Competitive pressures exist among numerous software providers, leading to continuous innovation and improvement. Companies such as Hamilton Thorne, Hermes Medical Solutions, and others mentioned are key players contributing to this growth through the development and deployment of sophisticated simulation solutions. The forecast for 2025-2033 predicts continued expansion of the Molecular Biology Simulation Software market. This sustained growth is underpinned by ongoing technological advancements within the field of bioinformatics and the expanding need for sophisticated simulation tools to handle increasingly complex biological data. 
The emergence of artificial intelligence (AI) and machine learning (ML) algorithms integrated into these software solutions further accelerates the market's trajectory. While potential restraints like the high cost of software licenses and the need for specialized expertise can impact market penetration, the overall growth trajectory remains positive, fueled by the critical role molecular biology simulation plays in accelerating scientific discovery and technological advancements across diverse sectors. Further market segmentation by region would highlight regional variations in adoption rates and growth potentials, providing a more comprehensive understanding of the market dynamics.
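The one-year estimate in the description follows from compounding a base value at the stated growth rate. A minimal sketch in Python, using only the report's own hypothetical inputs ($500 million for 2024 and a 15% CAGR); the function name `project_market_size` is illustrative and not taken from the report:

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base market size forward by `years` annual periods."""
    return base_size * (1 + cagr) ** years


# Hypothetical inputs quoted in the description (values in millions of USD).
base_2024 = 500.0
assumed_cagr = 0.15

estimate_2025 = project_market_size(base_2024, assumed_cagr, years=1)
print(f"Estimated 2025 market size: ${estimate_2025:.0f} million")
```

Because both inputs are described as hypothetical, the result is only as reliable as those assumptions.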
Counted Nodules dataset used in 'RootPainter: Deep Learning Segmentation of...
zenodo.org
csv, zip
Updated Apr 16, 2020
Cite
Abraham George Smith; Eusun Han; Jens Petersen; Niels Alvin Faircloth Olsen; Christian Giese; Miriam Athmann; Dorte Bodin Dresbøll; Kristian Thorup-Kristensen (2020). Counted Nodules dataset used in 'RootPainter: Deep Learning Segmentation of Biological Images with Corrective Annotation' [Dataset]. http://doi.org/10.5281/zenodo.3753603
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.3753603
Dataset updated
Apr 16, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Abraham George Smith; Eusun Han; Jens Petersen; Niels Alvin Faircloth Olsen; Christian Giese; Miriam Athmann; Dorte Bodin Dresbøll; Kristian Thorup-Kristensen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Counted Nodules dataset used in the article: 'RootPainter: Deep Learning Segmentation of Biological Images with Corrective Annotation'
Biological Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Apr 21, 2025
Cite
Data Insights Market (2025). Biological Software Report [Dataset]. https://www.datainsightsmarket.com/reports/biological-software-1444091
Available download formats: ppt, pdf, doc
Dataset updated
Apr 21, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global biological software market is experiencing robust growth, driven by the increasing adoption of advanced technologies in life sciences research and healthcare. The market, estimated at $2.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of approximately 12% from 2025 to 2033, reaching an estimated market value of $7 billion by 2033. This expansion is fueled by several key factors: the escalating demand for high-throughput data analysis in genomics and proteomics, the rising prevalence of chronic diseases necessitating advanced diagnostic tools, and the growing adoption of cloud-based solutions for enhanced collaboration and accessibility. Furthermore, the continuous development of sophisticated algorithms and user-friendly interfaces is making biological software more accessible to a wider range of researchers and clinicians. The segment encompassing experimental design and data analysis software holds a significant market share, reflecting the crucial role of computational tools in optimizing research workflows and extracting meaningful insights from complex biological datasets. North America currently dominates the market, owing to the robust presence of established biotechnology companies and a well-funded research infrastructure. However, Asia-Pacific is expected to witness significant growth in the coming years due to the expanding healthcare sector and increasing government investments in research and development. Market restraints include the high cost of software licenses, the requirement for specialized training to effectively utilize these tools, and the potential challenges associated with data security and integration across different platforms. Nevertheless, the ongoing innovation in software capabilities, coupled with the increasing adoption of subscription-based models and cloud-based solutions, is expected to mitigate these constraints. 
The competitive landscape is characterized by a mix of established players like Thermo Fisher Scientific and DNASTAR, along with smaller specialized companies offering niche solutions. This dynamic competitive environment fosters innovation and drives the development of advanced biological software solutions tailored to the specific needs of diverse research and clinical applications. Future growth will be influenced by factors such as advancements in artificial intelligence and machine learning within the software, integration with laboratory automation systems, and increasing collaboration between software providers and research institutions.
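As a rough consistency check on the figures quoted in this description, the sketch below compounds the 2025 estimate forward and also solves for the growth rate implied by the two endpoint values. It assumes eight annual compounding periods between 2025 and 2033; the helper names are hypothetical and not part of the report.

```python
def compound(base_size: float, cagr: float, years: int) -> float:
    """Project a value forward at a constant annual growth rate."""
    return base_size * (1 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Solve end = start * (1 + r) ** years for the annual rate r."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description (billions of USD).
size_2025 = 2.5
size_2033 = 7.0
stated_cagr = 0.12
periods = 2033 - 2025  # assumption: eight annual compounding periods

projected_2033 = compound(size_2025, stated_cagr, periods)
rate_from_endpoints = implied_cagr(size_2025, size_2033, periods)

print(f"2033 value at 12% CAGR: ${projected_2033:.2f} billion")
print(f"CAGR implied by endpoints: {rate_from_endpoints:.1%}")
```

Under these assumptions, 12% compounding gives roughly $6.2 billion rather than $7 billion, and the endpoints imply a rate closer to 13.7%, so the report's figures are best read as approximate.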
Publicly available high-throughput drug combination screening datasets and...
plos.figshare.com
figshare.com
xls
Updated Jun 21, 2023
Cite
Delora Baptista; Pedro G. Ferreira; Miguel Rocha (2023). Publicly available high-throughput drug combination screening datasets and large-scale cancer cell line genomics and transcriptomics datasets that can be used to develop drug synergy prediction models. [Dataset]. http://doi.org/10.1371/journal.pcbi.1010200.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1010200.t001
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Delora Baptista; Pedro G. Ferreira; Miguel Rocha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The datasets that were used in this work are highlighted in bold.
Electron microscopy images and morphometric data of SARS-CoV-2 variants in...
zenodo.org
data.niaid.nih.gov
Updated Aug 5, 2024
Cite
Tobias Hoffmann; Michael Laue (2024). Electron microscopy images and morphometric data of SARS-CoV-2 variants in ultrathin plastic sections - Dataset 05 (SARS-CoV-2 Delta B.1.617.2) [Dataset]. http://doi.org/10.5281/zenodo.13136809
Unique identifier
https://doi.org/10.5281/zenodo.13136809
Dataset updated
Aug 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tobias Hoffmann; Michael Laue
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset 05 comprises 153 transmission electron microscopy images of extracellular SARS-CoV-2 (isolate Delta B.1.617.2) particles in ultrathin plastic sections (45 nm) through Vero cell cultures. The images were recorded with dimensions of 4112 x 3008 pixels at a pixel size of 0.1641 nm and stored in 16-bit TIF format. An image viewer capable of reading 16-bit images, such as IrfanView, is recommended for visualizing the images. The image files have been size calibrated and can be opened with the correct size calibration using ImageJ or Fiji with the Bio-Formats importer. A PDF document provided with the image files describes the methods used to generate the images. Additionally, an XLSX file is included, offering morphometric particle measurements and the calculated statistical values for their distribution. The dataset was produced as dataset 05 for a comparative morphometric analysis of evolving SARS-CoV-2 variants. Further datasets used for the analysis are available in this repository (see dataset description document).
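The stated image dimensions and pixel size determine the physical field of view of each micrograph. A minimal sketch of that arithmetic in Python, assuming the 0.1641 nm pixel size applies equally along both axes; the variable and function names are illustrative only:

```python
# Values taken from the dataset description.
WIDTH_PX = 4112
HEIGHT_PX = 3008
PIXEL_SIZE_NM = 0.1641  # assumption: square pixels, same size on both axes


def field_of_view_nm(width_px: int, height_px: int, pixel_size_nm: float):
    """Return the physical image extent (width, height) in nanometres."""
    return width_px * pixel_size_nm, height_px * pixel_size_nm


fov_width_nm, fov_height_nm = field_of_view_nm(WIDTH_PX, HEIGHT_PX, PIXEL_SIZE_NM)
print(f"Field of view: {fov_width_nm:.1f} nm x {fov_height_nm:.1f} nm")
```

Each image therefore covers an area of roughly 675 nm by 494 nm, which is useful context when interpreting the morphometric particle measurements.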