100+ datasets found
  1. Bioinformatic databases survey

    • zenodo.org
    csv
    Updated Aug 17, 2024
    Cite
    Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448
    Available download formats
    csv
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bioinformatic databases survey

    The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and the availability of each resource.

    Data content

    The dataset is composed of two tables :

    A. Databases table : Contains the information of each database published in the NAR database issue.

    • db_id : Database ID in the dataset
    • resource_name : Name(s) of the database
    • current_access : Latest known web address of the database
    • is_a_pun : The database name is a play on words
    • available_2022 : The database was accessible online during the 2022 survey
    • last_accessible_year : If not accessible, latest point in time at which the database was found online (using the Internet web archive snapshots)
    • unavailable_message : If not accessible, the message/error returned when trying to access the resource
    • year_first_publication : Year of first publication of the database
    • year_last_publication : Year of latest publication of the database (including database update publications)
    • total_citations_2022 : Cumulative number of citations for all articles of the database
    • nb_authors_max : Maximum number of authors associated with any article published for that database
    • nb_articles_2022 : Number of articles published for that database in 2022

    B. Articles table : Contains the information collected for the NAR articles

    • collector : Person who added this database to the dataset
    • article_global_id : DOI of the article surveyed
    • db_id : Database ID of the resource described in the article
    • article_id : Article unique ID
    • article_year : Article publication year
    • Authors : List of authors of the article. Separated by ";"
    • Author.ID : List of ORCIDs of the authors of the article. Separated by ";"
    • Title : Title of the article
    • Source.title : Journal name
    • Volume : Volume number
    • Issue : Issue number
    • Funding.Details : Funding information of the article
    • Funding.Text : Funding text provided by the authors
    • PubMed.ID : Pubmed ID of the article
    • citations_2016 : Number of citations of the article in 2016 (if published)
    • citations_2022 : Number of citations of the article in 2022
    • nb_authors : Number of authors in the article
    • Index.Keywords : Keywords associated with the publication
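
    The two tables share the db_id column, so per-article values can be rolled up and compared with the per-database totals. The following is a minimal sketch under stated assumptions: the CSV contents are made-up placeholder rows (not real survey data), the function names are hypothetical, and it assumes total_citations_2022 is the plain sum of citations_2022 over a database's articles, which the field description suggests but does not guarantee.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows standing in for the two CSV tables.
# Only the column names come from the field lists above.
DATABASES_CSV = """db_id,resource_name,available_2022,total_citations_2022
1,ExampleDB,TRUE,30
2,OtherDB,FALSE,5
"""
ARTICLES_CSV = """article_id,db_id,article_year,citations_2022
a1,1,2001,10
a2,1,2010,20
a3,2,1999,5
"""


def citations_per_database(articles_file):
    """Sum citations_2022 over all articles that share a db_id."""
    totals = defaultdict(int)
    for row in csv.DictReader(articles_file):
        totals[row["db_id"]] += int(row["citations_2022"])
    return dict(totals)


def find_mismatches(databases_file, per_db):
    """Return db_ids whose total_citations_2022 differs from the article sum."""
    mismatches = []
    for row in csv.DictReader(databases_file):
        if int(row["total_citations_2022"]) != per_db.get(row["db_id"], 0):
            mismatches.append(row["db_id"])
    return mismatches


per_db = citations_per_database(io.StringIO(ARTICLES_CSV))
mismatched = find_mismatches(io.StringIO(DATABASES_CSV), per_db)
print(per_db)
print(mismatched)
```

    With the real files, the io.StringIO objects would be replaced by opened CSV files; the actual file names are not given here and would need to be taken from the Zenodo record.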

    Data sources

    Note that the presented dataset leverages and expands on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases? Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at: https://doi.org/10.13012/B2IDB-4311325_V1

    The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class participated in the collection, update and curation of the dataset, which is available as a database and a web portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could choose whether or not to be added as authors of this Zenodo repository.

    The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.

  2. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  3. Bioinformatics Links Directory

    • neuinfo.org
    • scicrunch.org
    • +2more
    Updated Jan 29, 2022
    Cite
    (2022). Bioinformatics Links Directory [Dataset]. http://identifiers.org/RRID:SCR_008018
    Dataset updated
    Jan 29, 2022
    Description

    Database of curated links to molecular resources, tools and databases selected on the basis of recommendations from bioinformatics experts in the field. This resource relies on input from its community of bioinformatics users for suggestions. Starting in 2003, it has also started listing all links contained in the NAR Webserver issue. The different types of information available in this portal:

    • Computer Related: This category contains links to resources relating to programming languages often used in bioinformatics. Other tools of the trade, such as web development and database resources, are also included here.
    • Sequence Comparison: Tools and resources for the comparison of sequences including sequence similarity searching, alignment tools, and general comparative genomics resources.
    • DNA: This category contains links to useful resources for DNA sequence analyses such as tools for comparative sequence analysis and sequence assembly. Links to programs for sequence manipulation, primer design, and sequence retrieval and submission are also listed here.
    • Education: Links to information about the techniques, materials, people, places, and events of the greater bioinformatics community. Included are current news headlines, literature sources, educational material and links to bioinformatics courses and workshops.
    • Expression: Links to tools for predicting the expression, alternative splicing, and regulation of a gene sequence are found here. This section also contains links to databases, methods, and analysis tools for protein expression, SAGE, EST, and microarray data.
    • Human Genome: This section contains links to draft annotations of the human genome in addition to resources for sequence polymorphisms and genomics. Also included are links related to ethical discussions surrounding the study of the human genome.
    • Literature: Links to resources related to published literature, including tools to search for articles and through literature abstracts. Additional text mining resources, open access resources, and literature goldmines are also listed.
    • Model Organisms: Included in this category are links to resources for various model organisms ranging from mammals to microbes. These include databases and tools for genome scale analyses.
    • Other Molecules: Bioinformatics tools related to molecules other than DNA, RNA, and protein. This category will include resources for the bioinformatics of small molecules as well as for other biopolymers including carbohydrates and metabolites.
    • Protein: This category contains links to useful resources for protein sequence and structure analyses. Resources for phylogenetic analyses, prediction of protein features, and analyses of interactions are also found here.
    • RNA: Resources include links to sequence retrieval programs, structure prediction and visualization tools, motif search programs, and information on various functional RNAs.

  4. University of Pittsburgh Bioinformatics Resources Collection

    • rrid.site
    Updated Aug 9, 2025
    Cite
    (2025). University of Pittsburgh Bioinformatics Resources Collection [Dataset]. http://identifiers.org/RRID:SCR_005845
    Dataset updated
    Aug 9, 2025
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. To bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources, we have created the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System at the University of Pittsburgh. The OBRC, containing 1542 major online bioinformatics databases and software tools, was constructed using the HSLS content management system built on the Zope Web application server. To enhance the output of search results, we further implemented the Vivísimo Clustering Engine, which automatically organizes the search results into categories created dynamically based on the textual information of the retrieved records. As the largest online collection of its kind and the only one with advanced search results clustering, OBRC is aimed at becoming a one-stop guided information gateway to the major bioinformatics databases and software tools on the Web. OBRC is available at the University of Pittsburgh's Health Sciences Library System.

  5. Table_4_Comprehensive Review of Web Servers and Bioinformatics Tools for...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Feb 5, 2020
    + more versions
    Cite
    Zheng, Hong; Xie, Longxiang; Zhu, Wan; Dong, Huan; Guo, Xiangqian; Zhang, Lu; Li, Yongqiang; Yan, Zhongyi; Li, Huimin; Zhang, Guosen; Han, Yali; An, Yang; Wang, Qiang (2020). Table_4_Comprehensive Review of Web Servers and Bioinformatics Tools for Cancer Prognosis Analysis.XLSX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000449982
    Dataset updated
    Feb 5, 2020
    Authors
    Zheng, Hong; Xie, Longxiang; Zhu, Wan; Dong, Huan; Guo, Xiangqian; Zhang, Lu; Li, Yongqiang; Yan, Zhongyi; Li, Huimin; Zhang, Guosen; Han, Yali; An, Yang; Wang, Qiang
    Description

    Prognostic biomarkers are of great significance to predict the outcome of patients with cancer, to guide the clinical treatments, to elucidate tumorigenesis mechanisms, and offer the opportunity of identifying therapeutic targets. To screen and develop prognostic biomarkers, high throughput profiling methods including gene microarray and next-generation sequencing have been widely applied and shown great success. However, due to the lack of independent validation, only very few prognostic biomarkers have been applied in clinical practice. In order to cross-validate the reliability of potential prognostic biomarkers, some groups have collected omics datasets (i.e., epigenetics/transcriptome/proteome) with relative follow-up data (such as OS/DSS/PFS) of clinical samples from different cohorts, and developed easy-to-use online bioinformatics tools and web servers to assist biomarker screening and validation. These tools and web servers provide great convenience for the development of prognostic biomarkers, for the study of molecular mechanisms of tumorigenesis and progression, and even for the discovery of important therapeutic targets. To help researchers quickly learn and understand the function of these tools, the current review introduces the usage, characteristics and algorithms of tools and web servers such as LOGpc, KM plotter, GEPIA, TCPA, OncoLnc, PrognoScan, MethSurv, SurvExpress, UALCAN, etc., and further helps researchers select more suitable tools for their own research. In addition, all the tools introduced in this review can be reached at http://bioinfo.henu.edu.cn/WebServiceList.html.

  6. HAMAP

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). HAMAP [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    HAMAP stands for High-quality Automated and Manual Annotation of Proteins. HAMAP profiles are manually created by expert curators. They identify proteins that are part of well-conserved protein families or subfamilies. HAMAP is based at the SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.

  7. Bioinformatics Cloud Platform Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jan 6, 2026
    Cite
    Archive Market Research (2026). Bioinformatics Cloud Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/bioinformatics-cloud-platform-58815
    Available download formats
    ppt, doc, pdf
    Dataset updated
    Jan 6, 2026
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2026 - 2034
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Bioinformatics Cloud Platform market is booming, projected to reach $10 billion by 2033 with a 20% CAGR. Discover key trends, drivers, restraints, and leading companies shaping this rapidly evolving sector in genomics, drug discovery, and academic research. Learn more about SaaS, PaaS, and IaaS solutions.

  8. FAIRsharing record for: Poxvirus Bioinformatics Resource Center

    • fairsharing.org
    • search.datacite.org
    Updated Jan 4, 2017
    Cite
    (2017). FAIRsharing record for: Poxvirus Bioinformatics Resource Center [Dataset]. http://doi.org/10.25504/FAIRsharing.bn6jba
    Dataset updated
    Jan 4, 2017
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This FAIRsharing record describes: Poxvirus Bioinformatics Resource Center has been established to provide specialized web-based resources to the scientific community studying poxviruses. This resource is no longer being maintained.

    • For tools and data supporting virus genomics, especially related to poxviruses and other large DNA viruses, please visit the Viral Bioinformatics site maintained by our collaborator, Chris Upton: http://virology.ca
    • For information on virus taxonomy, please visit the ICTV web site at http://www.ictvonline.org/
    • For updated sequence data and analytical tools, please visit http://www.viprbrc.org

  9. Extracted Schemas from the Life Sciences Linked Open Data Cloud

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Maulik Kamdar (2023). Extracted Schemas from the Life Sciences Linked Open Data Cloud [Dataset]. http://doi.org/10.6084/m9.figshare.12402425.v2
    Available download formats
    txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Maulik Kamdar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is related to the manuscript "An empirical meta-analysis of the life sciences linked open data on the web" published in Nature Scientific Data. If you use the dataset, please cite the manuscript as follows:

    Kamdar, M.R., Musen, M.A. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 8, 24 (2021). https://doi.org/10.1038/s41597-021-00797-y

    We have extracted schemas from more than 80 publicly available biomedical linked data graphs in the Life Sciences Linked Open Data (LSLOD) cloud into an LSLOD schema graph and conducted an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. The dataset published here contains the following files:

    • The set of Linked Data Graphs from the LSLOD cloud from which schemas are extracted.
    • Refined sets of extracted classes, object properties, data properties, and datatypes, shared across the Linked Data Graphs on LSLOD cloud. Where the schema element is reused from a Linked Open Vocabulary or an ontology, it is explicitly indicated.
    • The LSLOD Schema Graph, which contains all the above extracted schema elements interlinked with each other based on the underlying content. Sample instances and sample assertions are also provided along with broad level characteristics of the modeled content.

    The LSLOD Schema Graph is saved as a JSON Pickle File. To read the JSON object in this Pickle file, use the Python command as follows:

        import pickle
        with open('LSLOD-Schema-Graph.json.pickle', 'rb') as infile:
            x = pickle.load(infile, encoding='iso-8859-1')

    Check the Referenced Link for more details on this research, raw data files, and code references.
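
    The loading command from the description can be wrapped in a small helper and tried on a stand-in file before it is pointed at the real download. This is only a sketch: the helper name load_schema_graph and the stand-in dictionary are hypothetical, and nothing here assumes the internal layout of the actual LSLOD Schema Graph. Since unpickling can execute arbitrary code, the call should only be used on files from a trusted source.

```python
import os
import pickle
import tempfile


def load_schema_graph(path):
    """Unpickle a file using the same call as given in the dataset description."""
    with open(path, "rb") as infile:
        return pickle.load(infile, encoding="iso-8859-1")


# Hypothetical stand-in object; the real file's structure is not assumed here.
stand_in = {"classes": ["ExampleClass"], "properties": []}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.json.pickle")
    with open(demo_path, "wb") as outfile:
        pickle.dump(stand_in, outfile)
    loaded = load_schema_graph(demo_path)

# Inspect the top-level type and keys before relying on any structure.
print(type(loaded).__name__, sorted(loaded))
```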

  10. Predefined workflows in the ZBIT Bioinformatics Toolbox.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Michael Römer; Johannes Eichner; Andreas Dräger; Clemens Wrzodek; Finja Wrzodek; Andreas Zell (2023). Predefined workflows in the ZBIT Bioinformatics Toolbox. [Dataset]. http://doi.org/10.1371/journal.pone.0149263.t001
    Available download formats
    xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael Römer; Johannes Eichner; Andreas Dräger; Clemens Wrzodek; Finja Wrzodek; Andreas Zell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Predefined workflows in the ZBIT Bioinformatics Toolbox.

  11. PATRIC: Bacterial Bioinformatics Resource Center

    • datacatalog.mskcc.org
    Updated Nov 13, 2019
    Cite
    (2019). PATRIC: Bacterial Bioinformatics Resource Center [Dataset]. https://datacatalog.mskcc.org/dataset/10392
    Dataset updated
    Nov 13, 2019
    Description

    PATRIC (Pathosystems Resource Integration Center) is the Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community’s work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools. PATRIC sharpens and hones the scope of available bacterial phylogenomic data from numerous sources specifically for the bacterial research community, in order to save biologists time and effort when conducting comparative analyses. The freely available PATRIC platform provides an interface for biologists to discover data and information and conduct comprehensive comparative genomics and other analyses in a one-stop shop.

  12. Funding and Operating Organizations for Long-Lived Molecular Biology...

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Cite
    Heidi Imker, Funding and Operating Organizations for Long-Lived Molecular Biology Databases [Dataset]. http://doi.org/10.13012/B2IDB-3993338_V1
    Authors
    Heidi Imker
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and organizations operating the databases were determined through review of database websites.

  13. Program and web site of bioinformatics used in this study.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jul 26, 2021
    Cite
    Sakae, Kotaro; Furuhashi, Miyuna; Nagano, Keiji; Hasegawa, Yoshiaki (2021). Program and web site of bioinformatics used in this study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000736824
    Dataset updated
    Jul 26, 2021
    Authors
    Sakae, Kotaro; Furuhashi, Miyuna; Nagano, Keiji; Hasegawa, Yoshiaki
    Description

    Program and web site of bioinformatics used in this study.

  14. PBMC training data

    • figshare.com
    hdf
    Updated May 31, 2023
    Cite
    Kevin Menden (2023). PBMC training data [Dataset]. http://doi.org/10.6084/m9.figshare.8052221.v1
    Available download formats
    hdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Menden
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the PBMC training dataset used for training Scaden models to perform deconvolution on PBMC RNA-seq datasets. It is compiled from four different PBMC scRNA-seq datasets downloaded from the 10X Genomics website (donorA, donorC, data6k, data8k). The datasets downloaded from 10X Genomics were processed and used to generate artificial bulk RNA-seq samples, which make up this dataset. A link to the 10X Genomics datasets site is provided.

  15. バイオサイエンスにおけるID

    • figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Toshiaki Katayama (2023). バイオサイエンスにおけるID [Dataset]. http://doi.org/10.6084/m9.figshare.6597509.v1
    Available download formats
    pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Toshiaki Katayama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation slides on the identifiers in biosciences at the Japan Open Science Summit (JOSS) 2018.

  16. Ensembl TSS dataset for GRCh38

    • zenodo.org
    • investigacion.ubu.es
    bin
    Updated Aug 26, 2024
    Cite
    José A. Barbero-Aparicio; Alicia Olivares-Gil; José F. Díez-Pastor; César García-Osorio (2024). Ensembl TSS dataset for GRCh38 [Dataset]. http://doi.org/10.5281/zenodo.7147597
    Available download formats
    bin
    Dataset updated
    Aug 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José A. Barbero-Aparicio; Alicia Olivares-Gil; José F. Díez-Pastor; César García-Osorio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used the human genome reference sequence in its GRCh38.p13 version in order to have a reliable source of data in which to carry out our experiments. We chose this version because it is the most recent one available in Ensembl at the moment. However, the DNA sequence by itself is not enough; the specific TSS position of each transcript is needed. In this section, we explain the steps followed to generate the final dataset. These steps are: raw data gathering, positive instances processing, negative instances generation and data splitting by chromosomes.

    First, we need an interface in order to download the raw data, which is composed of every transcript sequence in the human genome. We used Ensembl release 104 (Howe et al., 2020) and its utility BioMart (Smedley et al., 2009), which allows us to get large amounts of data easily. It also enables us to select a wide variety of interesting fields, including the transcription start and end sites. After filtering instances that present null values in any relevant field, this combination of the sequence and its flanks will form our raw dataset. Once the sequences are available, we find the TSS position (given by Ensembl) and the 2 following bases to treat it as a codon. After that, 700 bases before this codon and 300 bases after it are concatenated, getting the final sequence of 1003 nucleotides that is going to be used in our models. These specific window values have been used in (Bhandari et al., 2021) and we have kept them as we find them interesting for comparison purposes. One of the most sensitive parts of this dataset is the generation of negative instances. We cannot get this kind of data in a straightforward manner, so we need to generate it synthetically. In order to get examples of negative instances, i.e. sequences that do not represent a transcript start site, we select random DNA positions inside the transcripts that do not correspond to a TSS. Once we have selected the specific position, we take the 700 bases before it and the 300 bases after it, as we did with the positive instances.

    Regarding the positive to negative ratio, in a similar problem, but studying TIS instead of TSS (Zhang et al., 2017), a ratio of 10 negative instances to each positive one was found optimal. Following this idea, we select 10 random positions from the transcript sequence of each positive codon and label them as negative instances. After this process, we end up with 1,122,113 instances: 102,488 positive and 1,019,625 negative sequences. In order to validate and test our models, we need to split this dataset into three parts: train, validation and test. We have decided to make this differentiation by chromosomes, as it is done in (Perez-Rodriguez et al., 2020). Thus, we use chromosome 16 as validation because it is a good example of a chromosome with average characteristics. Then we selected samples from chromosomes 1, 3, 13, 19 and 21 to be part of the test set and used the rest of them to train our models. Every step of this process can be replicated using the scripts available in https://github.com/JoseBarbero/EnsemblTSSPrediction.
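
    The windowing, negative sampling and chromosome split described above can be illustrated with a short sketch. It is not the authors' code (their scripts are in the linked repository): the function names, the toy sequence and the fixed random seed are assumptions made for illustration, strand orientation is ignored, and positions are treated as 0-based indices into a single string.

```python
import random

UPSTREAM = 700    # bases taken before the 3-base "codon"
CODON = 3         # the TSS position plus the 2 following bases
DOWNSTREAM = 300  # bases taken after the codon
WINDOW = UPSTREAM + CODON + DOWNSTREAM  # 1003 nucleotides

TEST_CHROMOSOMES = {"1", "3", "13", "19", "21"}
VALIDATION_CHROMOSOME = "16"


def extract_window(sequence, pos):
    """Return the 1003-base window around pos, or None if it does not fit."""
    start = pos - UPSTREAM
    end = pos + CODON + DOWNSTREAM
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def sample_negative_positions(sequence_length, tss_pos, n=10, seed=0):
    """Pick n random positions, other than the TSS, whose window fits."""
    rng = random.Random(seed)
    last_valid = sequence_length - CODON - DOWNSTREAM
    candidates = [p for p in range(UPSTREAM, last_valid + 1) if p != tss_pos]
    return rng.sample(candidates, min(n, len(candidates)))


def split_for(chromosome):
    """Assign an instance to train, validation or test by chromosome name."""
    if chromosome == VALIDATION_CHROMOSOME:
        return "validation"
    if chromosome in TEST_CHROMOSOMES:
        return "test"
    return "train"


# Toy sequence and TSS position, made up purely for illustration.
toy_sequence = "ACGT" * 600  # 2400 bases
toy_tss = 1000

positive = extract_window(toy_sequence, toy_tss)
negative_positions = sample_negative_positions(len(toy_sequence), toy_tss)
negatives = [extract_window(toy_sequence, p) for p in negative_positions]
print(len(positive), len(negatives), split_for("16"), split_for("2"))
```

    The sketch keeps the 10:1 negative-to-positive ratio as a default argument so that other ratios can be tried without changing the sampling logic.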

  17. NCBIFAM

    • ebi.ac.uk
    Updated Aug 6, 2025
    Cite
    (2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Aug 6, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).

  18. Data from: Getting the best of Linked Data and Property Graphs: rdf2neo and...

    • swat4hcls.figshare.com
    png
    Updated Dec 5, 2018
    Cite
    Marco Brandizi; Ajit Singh; Christopher Rawlings; Keywan Hassani-Pak (2018). Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case [Dataset]. http://doi.org/10.6084/m9.figshare.7314323.v1
    Available download formats
    png
    Dataset updated
    Dec 5, 2018
    Dataset provided by
    Semantic Web Applications and Tools for Healthcare and Life Sciences
    Authors
    Marco Brandizi; Ajit Singh; Christopher Rawlings; Keywan Hassani-Pak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paper submitted to SWAT4LS 2018. We introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing agrigenomics-related real use cases, we show how such mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.

  19. bioinformatics.com.cn Website Traffic, Ranking, Analytics [December 2025]

    • sr01.toolswala.net
    Updated Jan 13, 2026
    Cite
    Semrush (2026). bioinformatics.com.cn Website Traffic, Ranking, Analytics [December 2025] [Dataset]. https://sr01.toolswala.net/_www/website/bioinformatics.com.cn/overview/
    Dataset updated
    Jan 13, 2026
    Dataset authored and provided by
    Semrush (https://fr.semrush.com/)
    License

    https://sr01.toolswala.net/_www/company/legal/terms-of-service/

    Time period covered
    Jan 13, 2026
    Area covered
    Worldwide
    Variables measured
    visits, backlinks, bounceRate, pagesPerVisit, authorityScore, organicKeywords, avgVisitDuration, referringDomains, trafficByCountry, paidSearchTraffic, and 3 more
    Measurement technique
    Semrush Traffic Analytics; Click-stream data
    Description

    bioinformatics.com.cn is ranked #4532 in CN with 113.62K Traffic. Categories: . Learn more about website traffic, market share, and more!

  20. Additional file 1 of INSaFLU-TELEVIR: an open web-based bioinformatics suite...

    • datasetcatalog.nlm.nih.gov
    Updated Aug 15, 2024
    Cite
    Horton, Daniel L.; Santos, João Dourado; Pinheiro, Miguel; Santos, André; Pinto, Miguel; Bogaardt, Carlijn; Mamede, Rafael; Gomes, João Paulo; Borges, Vítor; Isidro, Joana; Sobral, Daniel; Eusébio, Rodrigo (2024). Additional file 1 of INSaFLU-TELEVIR: an open web-based bioinformatics suite for viral metagenomic detection and routine genomic surveillance [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001405514
    Dataset updated
    Aug 15, 2024
    Authors
    Horton, Daniel L.; Santos, João Dourado; Pinheiro, Miguel; Santos, André; Pinto, Miguel; Bogaardt, Carlijn; Mamede, Rafael; Gomes, João Paulo; Borges, Vítor; Isidro, Joana; Sobral, Daniel; Eusébio, Rodrigo
    Description

    Additional file 1. Benchmark of the INSaFLU-TELEVIR pipeline for virus detection (TELEVIR): Resources, Workflow details, Benchmark and Implementation. Additional file 2. Benchmarking of INSaFLU against commonly used command line bioinformatics workflows for SARS-CoV-2 reference-based consensus generation (amplicon-based Illumina and ONT data), and validation of the INSaFLU snakemake pipeline. Additional file 3: Supplementary figures 1-8. Additional file 4: Supplementary tables 1-8.

Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott (2024). Bioinformatic databases survey [Dataset]. http://doi.org/10.5281/zenodo.12790448

Bioinformatic databases survey

Available download formats: csv
Dataset updated
Aug 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alise Ponsero; Bonnie Hurwitz; Kiran Smelser; Karen Valencia; Lucas Jimenez Miranda; Abby McDermott
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Bioinformatic databases survey

The dataset surveys bioinformatic databases published in the NAR database issue from 1995 to 2022. It evaluates the current number of citations and the availability of each resource.

Data content

The dataset is composed of two tables :

A. Databases table : Contains the information of each database published in the NAR database issue.

  • db_id : Database ID in the dataset
  • resource_name : Name(s) of the database
  • current_access : Latest known web address of the database
  • is_a_pun : The database name is a play on words
  • available_2022 : The database was accessible online during the 2022 survey
  • last_accessible_year : If not accessible, latest year in which the database was found online (using Internet web archive snapshots)
  • unavailable_message : If not accessible, the message/error returned when trying to access the resource
  • year_first_publication : Year of first publication of the database
  • year_last_publication : Year of latest publication of the database (including database update publications)
  • total_citations_2022 : Cumulative number of citations for all articles of the database
  • nb_authors_max : Maximum number of authors associated with any article published for that database
  • nb_articles_2022 : Number of articles published for that database in 2022
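As a minimal sketch of how the Databases table could be summarised, the Python snippet below computes the share of databases still accessible in 2022, grouped by decade of first publication. The sample rows are invented for illustration and are not taken from the dataset. The encoding of available_2022 is an assumption (a truthy string such as "TRUE"); check the actual CSV before relying on it.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; these are NOT real records from the dataset.
SAMPLE_DATABASES_CSV = """db_id,resource_name,available_2022,year_first_publication
1,ExampleDB,TRUE,1996
2,DemoBase,FALSE,1998
3,SampleBank,TRUE,2011
"""


def is_available(value):
    # Assumption: availability is stored as a truthy string such as "TRUE".
    return value.strip().lower() in {"true", "1", "yes", "y"}


def availability_by_decade(csv_file):
    """Return {decade: share of databases accessible in 2022}."""
    counts = defaultdict(lambda: [0, 0])  # decade -> [available, total]
    for row in csv.DictReader(csv_file):
        decade = int(row["year_first_publication"]) // 10 * 10
        counts[decade][1] += 1
        if is_available(row["available_2022"]):
            counts[decade][0] += 1
    return {
        decade: available / total
        for decade, (available, total) in sorted(counts.items())
    }


shares = availability_by_decade(io.StringIO(SAMPLE_DATABASES_CSV))
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} still accessible in 2022")
```

To run it on the real data, pass an open file handle for the downloaded Databases CSV in place of the `io.StringIO` sample.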

B. Articles table : Contains the information collected for the NAR articles

  • collector : Person who added this database to the dataset
  • article_global_id : DOI of the article surveyed
  • db_id : Database ID of the resource described in the article
  • article_id : Article unique ID
  • article_year : Article publication year
  • Authors : List of authors of the article, separated by ";"
  • Author.ID : List of ORCID iDs of the authors of the article, separated by ";"
  • Title : Title of the article
  • Source.title : Journal name
  • Volume : Volume number
  • Issue : Issue number
  • Funding.Details : Funding information of the article
  • Funding.Text : Funding text provided by the authors
  • PubMed.ID : Pubmed ID of the article
  • citations_2016 : Number of citations of the article in 2016 (if published)
  • citations_2022 : Number of citations of the article in 2022
  • nb_authors : Number of authors in the article
  • Index.Keywords : Keywords associated with the publication
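The two tables share the db_id column, so article-level values can be rolled up per database. The sketch below sums citations_2022 over the articles of each database and places the result next to total_citations_2022. It also splits the ";"-separated Authors field. The sample rows are invented for illustration, and the idea that the per-article sum should match total_citations_2022 is an assumption drawn from the column descriptions, not a documented guarantee.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; these are NOT real records from the dataset.
SAMPLE_DATABASES_CSV = """db_id,resource_name,total_citations_2022
1,ExampleDB,150
2,DemoBase,40
"""

SAMPLE_ARTICLES_CSV = """article_id,db_id,article_year,Authors,citations_2022
a1,1,1996,Doe J.;Roe R.,100
a2,1,2004,Doe J.;Poe E.;Roe R.,50
a3,2,1998,Moe M.,40
"""


def split_authors(field):
    """Split the ';'-separated Authors field into a list of names."""
    return [name.strip() for name in field.split(";") if name.strip()]


def citations_per_database(articles_file):
    """Sum citations_2022 over all articles sharing a db_id."""
    totals = defaultdict(int)
    for row in csv.DictReader(articles_file):
        # Treat an empty cell as zero citations (assumption).
        totals[row["db_id"]] += int(row["citations_2022"] or 0)
    return dict(totals)


def compare_totals(databases_file, article_totals):
    """Map db_id -> (resource_name, reported total, total summed from articles)."""
    report = {}
    for row in csv.DictReader(databases_file):
        db_id = row["db_id"]
        report[db_id] = (
            row["resource_name"],
            int(row["total_citations_2022"]),
            article_totals.get(db_id, 0),
        )
    return report


article_totals = citations_per_database(io.StringIO(SAMPLE_ARTICLES_CSV))
report = compare_totals(io.StringIO(SAMPLE_DATABASES_CSV), article_totals)
for db_id, (name, reported, summed) in report.items():
    print(db_id, name, reported, summed)
```

Keeping db_id as a string avoids surprises if the real identifiers are not purely numeric.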

Data sources

Note that the presented dataset leverages and expands on the dataset gathered and published in Imker, H.J., 2020. Who Bears the Burden of Long-Lived Molecular Biology Databases? Data Science Journal, 19(1), p.8. The original dataset collected by Dr. Imker is available at: https://doi.org/10.13012/B2IDB-4311325_V1

The dataset was collected and is maintained by undergraduate students of a CURE class (Course-based Undergraduate Research Experience) held at the University of Arizona. All students of the class participated in the collection, update and curation of the dataset, which is available as a database and a web portal at https://hurwitzlab.shinyapps.io/DS_Heroes/. Students could choose whether or not to be listed as authors of this Zenodo repository.

The CURE class BAT102 "Data Science Heroes: An undergraduate research experience in Open Data Science Practices" gives the students an opportunity to learn about open science and investigate open data practices in bioinformatics through a survey of the databases published in the NAR database issue.
