100+ datasets found

Fantastic databases and where to find them: Web applications for researchers...
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Cite
Gerda Cristal Villalba; Ursula Matte (2023). Fantastic databases and where to find them: Web applications for researchers in a rush [Dataset]. http://doi.org/10.6084/m9.figshare.20018091.v1
Explore at:
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.20018091.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Gerda Cristal Villalba; Ursula Matte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: Public databases are essential to the development of multi-omics resources. The amount of data created by biological technologies needs a systematic and organized form of storage that can be quickly accessed and managed. This is the objective of a biological database. Here, we present an overview of human databases with web applications. The databases and tools allow the search of biological sequences, genes and genomes, gene expression patterns, epigenetic variation, protein-protein interactions, variant frequency, regulatory elements, and comparative analysis between human and model organisms. Our goal is to give users with little or no programming skills an opportunity to explore large datasets and analyze the data. Public, user-friendly, web-based databases facilitate data mining and the search for information applicable to healthcare professionals. In addition, biological databases are essential to improve biomedical search sensitivity and efficiency and to merge the multiple datasets needed to share data and build global initiatives for the diagnosis, prognosis, and discovery of new treatments for genetic diseases. To show the databases at work, we present a case study using ACE2 as an example of a gene to be investigated. The analysis and the complete list of databases are available at the following website.
PROSITE profiles
ebi.ac.uk
Updated Feb 5, 2025
Cite
(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
Alternative Splicing Annotation Project II Database
dknet.org
scicrunch.org
+3 more
Updated Jan 29, 2022
Cite
(2022). Alternative Splicing Annotation Project II Database [Dataset]. http://identifiers.org/RRID:SCR_000322
Unique identifier
https://identifiers.org/RRID:SCR_000322
Dataset updated
Jan 29, 2022
Description
THIS RESOURCE IS NO LONGER IN SERVICE, documented on 8/12/13. An expanded version of the Alternative Splicing Annotation Project (ASAP) database with a new interface and integration of comparative features using UCSC BLASTZ multiple alignments. It supports 9 vertebrate species, 4 insects, and nematodes, and provides extensive analysis of alternative splicing and splice variants. For human alternative splicing data, newly added EST libraries were classified and included in the previous tissue and cancer classification, and the lists of tissue and cancer (normal) specific alternatively spliced genes were recalculated and updated. The authors created novel databases of orthologous exons and introns and their splice variants, based on multiple alignments among several species. These orthologous exon and intron databases can give more comprehensive homologous gene information than protein-similarity-based methods. Furthermore, splice junction and exon identity among species can be a valuable resource for elucidating species-specific genes. The ASAP II database can be easily integrated with pygr (unpublished, the Python Graph Database Framework for Bioinformatics) and its features, such as graph query and multi-genome alignment query. ASAP II can be searched by several criteria, such as gene symbol, gene name, and ID (UniGene, GenBank, etc.). The web interface provides 7 kinds of views: (I) user query, UniGene annotation, orthologous genes and genome browsers; (II) genome alignment; (III) exons and orthologous exons; (IV) introns and orthologous introns; (V) alternative splicing; (VI) isoform and protein sequences; (VII) tissue and cancer vs. normal specificity. ASAP II shows genome alignments of isoforms, exons, and introns in a UCSC-like genome browser. All alternative splicing relationships, with supporting evidence, types of alternative splicing patterns, and inclusion rates for skipped exons, are listed in separate tables.
Users can also search human data for tissue- and cancer-specific splice forms at the bottom of the gene summary page. The p-values for tissue specificity are reported as log-odds (LOD) scores, and results with LOD >= 3 and at least 3 EST sequences are highlighted.
Bio Resource for Array Genes Database
dknet.org
scicrunch.org
+1 more
Updated Jan 29, 2022
Cite
(2022). Bio Resource for Array Genes Database [Dataset]. http://identifiers.org/RRID:SCR_000748
Unique identifier
https://identifiers.org/RRID:SCR_000748
Dataset updated
Jan 29, 2022
Description
Bio Resource for Array Genes (BioRag) is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes. The resource includes information about the genes that are represented in UniGene clusters. It provides interactive tools to selectively view, analyze and interpret gene expression patterns against the background of gene and protein functional information. Different query options are provided to mine the biological relationships represented in the underlying database. The Search button leads to the list of available query tools. The platform is designed to assist researchers in analyzing the results of microarray experiments and developing a biological interpretation of those results. Its main purpose is to help interpret unique gene expression patterns as biological changes that can lead to new diagnostic procedures and drug targets. The interactive site allows users to selectively view a variety of information about gene functions stored in an underlying database. Although other online resources provide comprehensive annotation and summaries of genes, this resource differs by further enabling researchers to mine biological relationships among the genes captured in the database using new query tools, thus providing a unique way of interpreting microarray results based on knowledge of the cellular roles of genes and proteins. Six query tools are provided, each offering different search features, analysis options, and forms of data display and visualization. The data is collected in a relational database from public resources: UniGene, LocusLink, OMIM, NCBI dbEST, protein domains from NCBI CDD, Gene Ontology, pathways (KEGG, GenMAPP and BioCarta) and BIND (protein interactions).
Data is dynamically collected and compiled twice a week from public databases. Search options make it possible to organize and cluster genes based on their interactions in biological pathways, their association with Gene Ontology terms, tissue- or organ-specific expression, or any other user-chosen functional grouping of genes. A color-coding scheme is used to highlight differential gene expression patterns against a background of gene functional information. Concept hierarchies (Anatomy and Diseases) of MeSH (Medical Subject Headings) terms are used to organize and display the data related to tissue-specific expression and diseases. Sponsors: the BioRag database is maintained by the Bioinformatics group at the Arizona Cancer Center. The material presented here is compiled from different public databases. BioRag is hosted by the Biotechnology Computing Facility of the University of Arizona. 2002, 2003 University of Arizona.
SFLD
ebi.ac.uk
Updated Sep 7, 2018
Cite
(2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Sep 7, 2018
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
DAVID
rrid.site
neuinfo.org
+1 more
Updated Jul 27, 2025
Cite
(2025). DAVID [Dataset]. http://identifiers.org/RRID:SCR_001881
Unique identifier
https://identifiers.org/RRID:SCR_001881
Dataset updated
Jul 27, 2025
Description
Bioinformatics resource system including web server and web service for functional annotation and enrichment analyses of gene lists. Consists of comprehensive knowledgebase and set of functional analysis tools. Includes gene centered database integrating heterogeneous gene annotation resources to facilitate high throughput gene functional analysis.
CATH-Gene3D
ebi.ac.uk
Updated Oct 21, 2020
Cite
(2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Oct 21, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Mapping of predicted structure and sequence domains is undertaken using hidden Markov model libraries representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.
Funding and Operating Organizations for Long-Lived Molecular Biology...
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Cite
Heidi Imker, Funding and Operating Organizations for Long-Lived Molecular Biology Databases [Dataset]. http://doi.org/10.13012/B2IDB-3993338_V1
Unique identifier
https://doi.org/10.13012/B2IDB-3993338_V1
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The organizations that contribute to the longevity of 67 long-lived molecular biology databases published in Nucleic Acids Research (NAR) between 1991 and 2016 were identified to address two research questions: 1) which organizations fund these databases? and 2) which organizations maintain these databases? Funders were determined by examining funding acknowledgements in each database's most recent NAR Database Issue update article (published prior to 2017), and organizations operating the databases were determined through review of database websites.
Data_Sheet_1_riceExplorer: Uncovering the Hidden Potential of a National...
frontiersin.figshare.com
zip
Updated Jun 6, 2023
Cite
Clive T. Darwell; Samart Wanchana; Vinitchan Ruanjaichon; Meechai Siangliw; Burin Thunnom; Wanchana Aesomnuk; Theerayut Toojinda (2023). Data_Sheet_1_riceExplorer: Uncovering the Hidden Potential of a National Genomic Resource Against a Global Database.zip [Dataset]. http://doi.org/10.3389/fpls.2022.781153.s001
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.3389/fpls.2022.781153.s001
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Clive T. Darwell; Samart Wanchana; Vinitchan Ruanjaichon; Meechai Siangliw; Burin Thunnom; Wanchana Aesomnuk; Theerayut Toojinda
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Agricultural crop breeding programs, particularly at the national level, typically consist of a core panel of elite breeding cultivars alongside a number of local landrace varieties (or other endemic cultivars) that provide additional sources of phenotypic and genomic variation or contribute as experimental materials (e.g., in GWAS studies). Three issues commonly arise. First, focusing primarily on core development accessions may mean that the potential contributions of landraces or other secondary accessions may be overlooked. Second, elite cultivars may accumulate deleterious alleles away from nontarget loci due to the strong effects of artificial selection. Finally, a tendency to focus solely on SNP-based methods may cause incomplete or erroneous identification of functional variants. In practice, integration of local breeding programs with findings from global database projects may be challenging. First, local GWAS experiments may only indicate useful functional variants according to the diversity of the experimental panel, while other potentially useful loci—identifiable at a global level—may remain undiscovered. Second, large-scale experiments such as GWAS may prove prohibitively costly or logistically challenging for some agencies. Here, we present a fully automated bioinformatics pipeline (riceExplorer) that can easily integrate local breeding program sequence data with international database resources, without relying on any phenotypic experimental procedure. It identifies associated functional haplotypes that may prove more robust in determining the genotypic determinants of desirable crop phenotypes. In brief, riceExplorer evaluates a global crop database (IRRI 3000 Rice Genomes) to identify haplotypes that are associated with extreme phenotypic variation at the global level and recorded in the database. 
It then examines which potentially useful variants are present in the local crop panel, before distinguishing between those that are already incorporated into the elite breeding accessions and those only found among secondary varieties (e.g., landraces). Results highlight the effectiveness of our pipeline, identifying potentially useful functional haplotypes across the genome that are absent from elite cultivars and found among landraces and other secondary varieties in our breeding program. riceExplorer can automatically conduct a full genome analysis and produces annotated graphical output of chromosomal maps, potential global diversity sources, and summary tables.
uniprot-database_(type_ko).27.09.2019.tab.rar
figshare.com
application/x-rar
Updated Jun 24, 2020
+ more versions
Cite
Daniel Kumazawa Morais (2020). uniprot-database_(type_ko).27.09.2019.tab.rar [Dataset]. http://doi.org/10.6084/m9.figshare.12555422.v1
Explore at:
Available download formats: application/x-rar
Unique identifier
https://doi.org/10.6084/m9.figshare.12555422.v1
Dataset updated
Jun 24, 2020
Dataset provided by
figshare
Authors
Daniel Kumazawa Morais
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The current database was downloaded on 27.09.2019 and has the data fields (columns) described below:
# 1 Entry
# 2 Entry name
# 3 Status
# 4 Protein names
# 5 Gene names
# 6 Organism
# 7 Length
# 8 Cross-reference (KO)
# 9 Taxonomic lineage (PHYLUM)
# 10 Taxonomic lineage (SPECIES) (this field carries current and old* taxonomic classifications)
# 11 Taxonomic lineage (GENUS)
# 12 Taxonomic lineage (KINGDOM)
# 13 Taxonomic lineage (SUPERKINGDOM)
# 14 Cross-reference (OrthoDB)
# 15 Cross-reference (eggNOG)
*Details about the classification used in UniProt can be found at: https://www.uniprot.org/help/taxonomy
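As a rough illustration of how the 15 fields listed in this description might be read once the `.tab` file has been extracted from the archive, here is a minimal Python sketch. It assumes the file is tab-separated, starts with one header row, and keeps the columns in the listed order; none of this is confirmed by the record itself. The short field names and the `read_uniprot_tab` helper are hypothetical, and the sample row is invented placeholder data.

```python
import csv
import io

# Hypothetical short names for the 15 columns, in the order given in the description.
FIELDS = [
    "entry", "entry_name", "status", "protein_names", "gene_names",
    "organism", "length", "ko", "phylum", "species",
    "genus", "kingdom", "superkingdom", "orthodb", "eggnog",
]


def read_uniprot_tab(handle, has_header=True):
    """Yield one dict per well-formed row of a tab-separated export."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip rows that do not have exactly 15 columns
        record = dict(zip(FIELDS, row))
        record["length"] = int(record["length"])  # sequence length as an integer
        yield record


# Invented placeholder row, used only to exercise the parser.
sample_row = [
    "X00001", "EXAMPLE_ORG", "reviewed", "Example protein", "exA",
    "Example organism", "123", "K00001;", "Example phylum", "Example species",
    "Example genus", "Example kingdom", "Example superkingdom", "", "",
]
sample = "\t".join(FIELDS) + "\n" + "\t".join(sample_row) + "\n"

records = list(read_uniprot_tab(io.StringIO(sample)))
print(records[0]["entry"], records[0]["ko"], records[0]["length"])
```

For a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`; whether the export actually carries a header row should be checked first.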
Benchmarks against protein structural databases.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
Robert K. Bradley; Adam Roberts; Michael Smoot; Sudeep Juvekar; Jaeyoung Do; Colin Dewey; Ian Holmes; Lior Pachter (2023). Benchmarks against protein structural databases. [Dataset]. http://doi.org/10.1371/journal.pcbi.1000392.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1000392.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS Computational Biology
Authors
Robert K. Bradley; Adam Roberts; Michael Smoot; Sudeep Juvekar; Jaeyoung Do; Colin Dewey; Ian Holmes; Lior Pachter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparisons of the accuracies (Acc), sensitivities (Sn) and positive predictive values (PPV) of FSA and other alignment methods on the BAliBASE 3 [24] and SABmark 1.65 [25] databases. Probalign has the highest accuracy on the commonly-used BAliBASE 3 dataset and FSA in default mode has superior accuracy on the BAliBASE 3+fp and SABmark 1.65 datasets (note that only FSA and AMAP explicitly attempt to maximize the expected accuracy). FSA has higher positive predictive values than any other program on all datasets and can additionally achieve high sensitivity when run in maximum-sensitivity mode. The BAliBASE 3+fp dataset, which mirrors BAliBASE 3 but includes a single non-homologous sequence in each alignment, was designed to test the robustness of alignment programs to incomplete homology. Traditional alignment programs, designed to maximize sensitivity, suffer greatly-increased mis-alignment when even a single non-homologous sequence is introduced; in contrast, FSA is robust to the non-homologous sequence and has an unchanged positive predictive value. Remarkably, FSA was the only tested program with a mis-alignment rate of
ASURAT knowledge-based databases
figshare.com
application/gzip
Updated May 9, 2022
Cite
Keita Iida (2022). ASURAT knowledge-based databases [Dataset]. http://doi.org/10.6084/m9.figshare.19102598.v5
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.19102598.v5
Dataset updated
May 9, 2022
Dataset provided by
figshare
Authors
Keita Iida
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This record stores the knowledge-based databases and the code used to collect them.
Data from: Expasy
expasy.org
Updated Sep 15, 2023
Cite
SIB Swiss Institute of Bioinformatics (2023). Expasy [Dataset]. http://doi.org/10.25504/FAIRsharing.ceeffa
Unique identifier
https://doi.org/10.25504/FAIRsharing.ceeffa
Dataset updated
Sep 15, 2023
Dataset provided by
SIB Swiss Institute of Bioinformatics
Description
Expasy is the bioinformatics resource portal of the SIB Swiss Institute of Bioinformatics.
SMART
ebi.ac.uk
Updated Feb 14, 2020
Cite
(2020). SMART [Dataset]. https://www.ebi.ac.uk/interpro/
Dataset updated
Feb 14, 2020
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. SMART is based at EMBL, Heidelberg, Germany.
IDPredictor: predict database links in biomedical database. Supplementary...
doi.ipk-gatersleben.de
Updated Jan 1, 2012
Cite
Matthias Lange; Hendrik Mehlhorn; Uwe Scholz; Falk Schreiber (2012). IDPredictor: predict database links in biomedical database. Supplementary material A.3 for the paper [Dataset]. https://doi.ipk-gatersleben.de/DOI/ce9f7e62-56e5-4554-bb11-d7ab29e6fa1d/dd34a994-daf0-4b7f-9809-d875c1e771d2/2
Dataset updated
Jan 1, 2012
Dataset provided by
e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, D-06466, Germany
Authors
Matthias Lange; Hendrik Mehlhorn; Uwe Scholz; Falk Schreiber
Description
Supplementary material A.3 for the paper 'IDPredictor: predict database links in biomedical database'. Abstract: Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content but differ substantially with respect to content detail, interface, formats and data structure. To support a functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential to assist in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge excerpt from the interlinked databases. A prerequisite for supporting the concept of an integrated data view is to acquire insights into cross-references among database entities. But only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and their possible referenced data targets. We study the retrieval quality of our method and the relationship between manually crafted relevance ranking and relevance ranking based on cross-references, and report on first, promising results.
Molecular Biology Databases Published in Nucleic Acids Research between...
databank.illinois.edu
Updated Feb 1, 2024
Cite
Heidi Imker (2024). Molecular Biology Databases Published in Nucleic Acids Research between 1991-2016 [Dataset]. http://doi.org/10.13012/B2IDB-4311325_V1
Unique identifier
https://doi.org/10.13012/B2IDB-4311325_V1
Dataset updated
Feb 1, 2024
Authors
Heidi Imker
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was developed to create a census of sufficiently documented molecular biology databases to answer several preliminary research questions. Articles published in the annual Nucleic Acids Research (NAR) “Database Issues” were used to identify a population of databases for study. Namely, the questions addressed herein include: 1) what is the historical rate of database proliferation versus rate of database attrition?, 2) to what extent do citations indicate persistence?, and 3) are databases under active maintenance and does evidence of maintenance likewise correlate to citation? An overarching goal of this study is to provide the ability to identify subsets of databases for further analysis, both as presented within this study and through subsequent use of this openly released dataset.
Data from: MINT, the Molecular INTeraction database
mint.bio.uniroma2.it
tsv
Updated Feb 16, 2018
Cite
University of Rome Tor Vergata, Bioinformatics and Computational Biology Unit (2018). MINT, the Molecular INTeraction database [Dataset]. https://mint.bio.uniroma2.it/
Explore at:
Available download formats: tsv
Dataset updated
Feb 16, 2018
Dataset provided by
University of Rome Tor Vergata, Bioinformatics and Computational Biology Unit
IntAct Team
Authors
University of Rome Tor Vergata, Bioinformatics and Computational Biology Unit
Description
MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.
Supporting data for "Sequence Compression Benchmark (SCB) database"
explore.openaire.eu
Updated Jan 1, 2020
Cite
Kirill Kryukov; Mahoko Takahashi Ueda; So Nakagawa; Tadashi Imanishi (2020). Supporting data for "Sequence Compression Benchmark (SCB) database" [Dataset]. http://doi.org/10.5524/100762
Unique identifier
https://doi.org/10.5524/100762
Dataset updated
Jan 1, 2020
Authors
Kirill Kryukov; Mahoko Takahashi Ueda; So Nakagawa; Tadashi Imanishi
Description
Nearly all molecular sequence databases currently use gzip for data compression. Ongoing rapid accumulation of stored data calls for more efficient compression tools. Although numerous compressors exist, both specialized and general-purpose, choosing one of them was difficult because no comprehensive analysis of their comparative advantages for sequence compression was available.
We systematically benchmarked 430 settings of 48 compressors (including 29 specialized sequence compressors and 19 general-purpose compressors) on representative FASTA-formatted datasets of DNA, RNA and protein sequences. Each compressor was evaluated on 17 performance measures, including compression strength, as well as time and memory required for compression and decompression. We used 27 test datasets including individual genomes of various sizes, DNA and RNA datasets, and standard protein datasets. We summarized the results as the Sequence Compression Benchmark database (SCB database) that allows building custom visualizations for selected subsets of benchmark results.
We found that modern compressors offer a large improvement in compactness and speed compared to gzip. Our benchmark allows comparing compressors and their settings using a variety of performance measures, offering the opportunity to select the optimal compressor based on the data type and usage scenario specific to a particular application.
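To make the kind of measurement described here concrete, the following is a minimal Python sketch that records compression ratio and compression/decompression time for three general-purpose compressors from the standard library. It is not the SCB benchmark's method or tool set: the choice of `gzip`, `bz2` and `lzma`, the `make_fasta` and `compare_compressors` helpers, and the randomly generated FASTA record are all assumptions made for illustration.

```python
import bz2
import gzip
import lzma
import random
import time


def make_fasta(n_bases=200_000, seed=0, width=60):
    """Build a synthetic single-record DNA FASTA file as bytes (illustrative data only)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    lines = [seq[i:i + width] for i in range(0, n_bases, width)]
    return (">synthetic_sequence\n" + "\n".join(lines) + "\n").encode("ascii")


# Standard-library compressors only; each entry is (compress, decompress).
COMPRESSORS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compare_compressors(data):
    """Return compression ratio and timings per compressor, after a round-trip check."""
    results = {}
    for name, (compress, decompress) in COMPRESSORS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} did not round-trip the input")
        results[name] = {
            "ratio": len(data) / len(packed),  # original size / compressed size
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


if __name__ == "__main__":
    for name, r in compare_compressors(make_fasta()).items():
        print(f"{name}: ratio={r['ratio']:.2f} "
              f"compress={r['compress_s']:.3f}s decompress={r['decompress_s']:.3f}s")
```

Uniformly random bases lack the repeats found in real genomes, so the numbers this prints say little about real data; substituting the bytes of an actual FASTA file for `make_fasta()` would give a more meaningful comparison.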
GCBN de.NBI User Training - PLANT 2030 Summer School - Basis Bioinformatics...
doi.ipk-gatersleben.de
Updated Oct 5, 2017
Cite
Uwe Scholz; Andrea Bräutigam; Martin Mascher; Matthias Lange; Yusheng Zhao (2017). GCBN de.NBI User Training - PLANT 2030 Summer School - Basis Bioinformatics Training for Biologists [Dataset]. https://doi.ipk-gatersleben.de/DOI/966a00f1-1a75-470a-a2b8-195f34bcde3e/cdeec50e-3923-48da-8a03-365443002f79/6/
Dataset updated
Oct 5, 2017
Dataset provided by
e!DAL - Plant Genomics and Phenomics Research Data Repository (PGP), IPK Gatersleben, Seeland OT Gatersleben, Corrensstraße 3, 06466, Germany
Authors
Uwe Scholz; Andrea Bräutigam; Martin Mascher; Matthias Lange; Yusheng Zhao
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The 4th German Crop BioGreenformatics Network (GCBN, https://www.denbi.de/gcbn) user training provided a hands-on introduction to useful bioinformatics tools for biologists with little or no previous knowledge. The training enabled biologists to process their own small and large datasets using R and Linux-based methods, and was entirely computer-based with interspersed lectures. The first part started with an introduction to the use and basic administration (software installation) of the Linux distribution Ubuntu and demonstrated the first steps in using the R software (trainer Andrea Bräutigam, folder AB). Part two demonstrated the command-line version of Blast+, simple Linux commands such as 'cut', Perl scripts, and the graphical user interface of the phylogeny tool 'seaview' (trainer Uwe Scholz, folder US). The third session introduced basic concepts and practical tools for processing biological datasets in Linux. In particular, 'awk' and 'sed' were used, and 'SAMtools' and 'BEDTools' were applied (trainer Martin Mascher, folder MM). The fourth part, 'Introduction to Databases', presented a quick-start guide to using relational databases. Using easy examples, this lesson laid the foundations for using relational database systems as a daily bioinformatics tool to store, retrieve and even analyze -omics data in the big data age (trainer Matthias Lange, folder ML). The last part introduced basic statistics to biologists, taught some commonly used statistical methods, and demonstrated the creation of graphical visualizations with R (trainer Yusheng Zhao, folder YZ).
Pharokka Databases
zenodo.org
application/gzip
Updated Jan 24, 2023
Cite
George Bouras (2023). Pharokka Databases [Dataset]. http://doi.org/10.5281/zenodo.7081772
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.7081772
Dataset updated
Jan 24, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
George Bouras
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is intended to hold the databases for Pharokka (https://github.com/gbouras13/pharokka)

It includes the PHROGs database, and mmseqs2 compatible versions of the CARD and VFDB databases.
