100+ datasets found

CDD
ebi.ac.uk
Updated Apr 18, 2024
Cite
(2024). CDD [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Apr 18, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CDD is a protein annotation resource that consists of a collection of annotated multiple sequence alignment models for ancient domains and full-length proteins. These are available as position-specific score matrices (PSSMs) for fast identification of conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domain models, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases.
Bioinformatics Protein Dataset - Simulated
kaggle.com
zip
Updated Dec 27, 2024
+ more versions
Cite
Rafael Gallo (2024). Bioinformatics Protein Dataset - Simulated [Dataset]. https://www.kaggle.com/datasets/gallo33henrique/bioinformatics-protein-dataset-simulated
Explore at:
Available download formats: zip (12928905 bytes)
Dataset updated
Dec 27, 2024
Authors
Rafael Gallo
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Subtitle

"Synthetic protein dataset with sequences, physical properties, and functional classification for machine learning tasks."

Description

Introduction

This synthetic dataset was created to explore and develop machine learning models in bioinformatics. It contains 20,000 synthetic proteins, each with an amino acid sequence, calculated physicochemical properties, and a functional classification.

Columns Included

ID_Protein: Unique identifier for each protein.

Sequence: String of amino acids.

Molecular_Weight: Molecular weight calculated from the sequence.

Isoelectric_Point: Estimated isoelectric point based on the sequence composition.

Hydrophobicity: Average hydrophobicity calculated from the sequence.

Total_Charge: Sum of the charges of the amino acids in the sequence.

Polar_Proportion: Percentage of polar amino acids in the sequence.

Nonpolar_Proportion: Percentage of nonpolar amino acids in the sequence.

Sequence_Length: Total number of amino acids in the sequence.

Class: The functional class of the protein, one of five categories: Enzyme, Transport, Structural, Receptor, Other.

Inspiration and Sources

While this is a simulated dataset, it was inspired by patterns observed in real protein datasets, such as:
- UniProt: A comprehensive database of protein sequences and annotations.
- Kyte-Doolittle Scale: Calculations of hydrophobicity.
- Biopython: A tool for analyzing biological sequences.

Proposed Uses

This dataset is ideal for:
- Training classification models for proteins.
- Exploratory analysis of physicochemical properties of proteins.
- Building machine learning pipelines in bioinformatics.

How This Dataset Was Created

Sequence Generation: Amino acid chains were randomly generated with lengths between 50 and 300 residues.

Property Calculation: Physicochemical properties were calculated using the Biopython library.

Class Assignment: Classes were randomly assigned for classification purposes.
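The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the author's generation script: the card says Biopython was used, whereas this sketch computes two of the columns by hand from the Kyte-Doolittle scale. The function names, the uniform residue sampling, and the set of residues counted as polar are all assumptions made for the example.

```python
import random

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "".join(KYTE_DOOLITTLE)

# Assumption: one common grouping of polar side chains; the dataset's own
# definition of "polar" is not stated on the card.
POLAR = set("RNDQEHKSTY")

# Class names as listed under "Columns Included".
CLASSES = ["Enzyme", "Transport", "Structural", "Receptor", "Other"]


def random_sequence(rng, min_len=50, max_len=300):
    """Draw a random chain with a length between min_len and max_len residues."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mean_hydrophobicity(seq):
    """Average Kyte-Doolittle hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def polar_percentage(seq):
    """Percentage of residues that fall in the assumed polar set."""
    return 100.0 * sum(aa in POLAR for aa in seq) / len(seq)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
seq = random_sequence(rng)
row = {
    "Sequence": seq,
    "Sequence_Length": len(seq),
    "Hydrophobicity": mean_hydrophobicity(seq),
    "Polar_Proportion": polar_percentage(seq),
    "Nonpolar_Proportion": 100.0 - polar_percentage(seq),
    "Class": rng.choice(CLASSES),  # random label, as the card describes
}
print(row["Sequence_Length"], round(row["Hydrophobicity"], 3), row["Class"])
```

Because the class label is drawn independently of the sequence, a classifier trained on rows like this should not be expected to beat chance, which is consistent with the limitations listed on the card.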

Limitations

The sequences and properties do not represent real proteins but follow patterns observed in natural proteins.

The functional classes are simulated and do not correspond to actual biological characteristics.

Data Split

The dataset is divided into two subsets:
- Training: 16,000 samples (proteinas_train.csv).
- Testing: 4,000 samples (proteinas_test.csv).

Acknowledgment

This dataset was inspired by real bioinformatics challenges and designed to help researchers and developers explore machine learning applications in protein analysis.
NCBIFAM
ebi.ac.uk
Updated Aug 6, 2025
Cite
(2025). NCBIFAM [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Aug 6, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
NCBIfam is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology. NCBIfam is maintained at the National Center for Biotechnology Information (Bethesda, MD). NCBIfam includes models from TIGRFAMs, another database of protein families developed at The Institute for Genomic Research, then at the J. Craig Venter Institute (Rockville, MD, US).
Protein Sequence Analysis Tool Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 5, 2026
Cite
Data Insights Market (2026). Protein Sequence Analysis Tool Report [Dataset]. https://www.datainsightsmarket.com/reports/protein-sequence-analysis-tool-1425541
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Jan 5, 2026
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2026 - 2034
Area covered
Global
Variables measured
Market Size
Description
The size of the Protein Sequence Analysis Tool market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.
Structural Protein Sequences
kaggle.com
zip
Updated Feb 3, 2018
Cite
SHAHIR (2018). Structural Protein Sequences [Dataset]. https://www.kaggle.com/datasets/shahir/protein-data-set
Explore at:
Available download formats: zip (28782775 bytes)
Dataset updated
Feb 3, 2018
Authors
SHAHIR
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

This is a protein data set retrieved from Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).

The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to each other in the molecule. They then deposit this information, which is then annotated and publicly released into the archive by the wwPDB.

The constantly-growing PDB is a reflection of the research that is happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the information that you need, since the PDB archives so many different structures. You will often find multiple structures for a given molecule, or partial structures, or structures that have been modified or inactivated from their native form.

Content

There are two data files. Both are arranged on "structureId" of the protein:

pdb_data_no_dups.csv contains protein metadata, including details on protein classification, extraction methods, etc.

data_seq.csv contains >400,000 protein structure sequences.
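Since both files share the "structureId" key, the metadata and sequence rows can be combined with a simple join. The sketch below uses only the standard library and tiny in-memory stand-ins for the two files; every value and every column name other than structureId is invented for illustration and may not match the real headers.

```python
import csv
import io

# Stand-in for pdb_data_no_dups.csv (one metadata row per structure).
# Hypothetical columns and values, apart from structureId.
META_CSV = """structureId,classification
1ABC,HYDROLASE
2XYZ,TRANSFERASE
"""

# Stand-in for the sequence file (possibly several chains per structure).
SEQ_CSV = """structureId,chainId,sequence
1ABC,A,MKTAYIAK
1ABC,B,MKTAYIAK
2XYZ,A,GSHMLEDP
9ZZZ,A,AAAA
"""


def join_on_structure_id(meta_file, seq_file):
    """Inner-join sequence rows with their metadata row via structureId."""
    meta = {row["structureId"]: row for row in csv.DictReader(meta_file)}
    joined = []
    for row in csv.DictReader(seq_file):
        info = meta.get(row["structureId"])
        if info is None:
            continue  # no metadata for this structure: drop the row
        joined.append({**info, **row})
    return joined


rows = join_on_structure_id(io.StringIO(META_CSV), io.StringIO(SEQ_CSV))
for r in rows:
    print(r["structureId"], r["chainId"], r["classification"], r["sequence"])
```

With the real files, the two `io.StringIO` objects would be replaced by opened file handles; the join logic stays the same.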

Acknowledgements

Original data set downloaded from http://www.rcsb.org/pdb/

Inspiration

Protein databases have helped the life science community study different diseases and come up with new drugs and solutions that support human survival.
PROSITE profiles
ebi.ac.uk
Updated Feb 5, 2025
Cite
(2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Feb 5, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.
Bacterial_conserved_protein_for_selection_test-protein.fa
figshare.com
datasetcatalog.nlm.nih.gov
txt
Updated Aug 15, 2018
+ more versions
Cite
Xuepeng Sun (2018). Bacterial_conserved_protein_for_selection_test-protein.fa [Dataset]. http://doi.org/10.6084/m9.figshare.6972455.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6972455.v1
Dataset updated
Aug 15, 2018
Dataset provided by
Figshare http://figshare.com/
Authors
Xuepeng Sun
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Protein sequences for the 36 conserved bacterial proteins subject to evolutionary analysis.
Protein Sequence Analysis Tool Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jan 25, 2026
Cite
Archive Market Research (2026). Protein Sequence Analysis Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/protein-sequence-analysis-tool-36100
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Jan 25, 2026
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2026 - 2034
Area covered
Global
Variables measured
Market Size
Description
The size of the Protein Sequence Analysis Tool market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX % during the forecast period.
Protein Sequence Analysis Tool Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jan 5, 2026
Cite
Data Insights Market (2026). Protein Sequence Analysis Tool Report [Dataset]. https://www.datainsightsmarket.com/reports/protein-sequence-analysis-tool-1941839
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Jan 5, 2026
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2026 - 2034
Area covered
Global
Variables measured
Market Size
Description
The Protein Sequence Analysis Tool market is booming, projected to reach $7.8B by 2033 (CAGR 12%). This in-depth analysis explores market drivers, trends, restraints, and key players, including Waters Corp and Thermo Fisher. Discover insights into software, services, and regional market shares for biopharma, clinical diagnostics, and research.
PIRSF
ebi.ac.uk
Updated Apr 7, 2020
Cite
(2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Apr 7, 2020
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington DC, US.
Data from: PROSITE
prosite.expasy.org
toothandnail-mailorder.com
+7 more
Updated Oct 15, 2025
Cite
(2025). PROSITE [Dataset]. https://prosite.expasy.org/
Explore at:
Dataset updated
Oct 15, 2025
Description
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.
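To make the idea of a pattern concrete, here is a minimal sketch that translates a pattern written in PROSITE-style syntax into a Python regular expression. It assumes only the basic syntax elements (residues separated by "-", "x" for any residue, "[...]" for alternatives, "{...}" for exclusions, "(n)" or "(n,m)" for repetition, and "<" / ">" as terminal anchors); the function name and the example pattern and sequence are illustrative, and this is not PROSITE's own scanning tool.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        count = ""
        if element.endswith(")"):        # repetition such as x(3) or x(2,4)
            element, _, rep = element[:-1].partition("(")
            count = "{" + rep + "}"
        if element == "x":               # any residue
            core = "."
        elif element.startswith("{"):    # any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                            # a single residue or a [..] set
            core = element
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


# Illustrative pattern and made-up sequence.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
sequence = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
match = re.search(regex, sequence)
print(regex)
print(match.span() if match else None)
```

Profiles, the other signature type mentioned above, are weight matrices rather than regular expressions and are not covered by this sketch.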
CDTree
datadiscovery.nlm.nih.gov
datahub.hhs.gov
+4 more
csv, xlsx, xml
Updated Jun 30, 2021
+ more versions
Cite
(2021). CDTree [Dataset]. https://datadiscovery.nlm.nih.gov/NLM-Products-and-Services/CDTree/vkhf-hsp7
Explore at:
Available download formats: xml, xlsx, csv
Dataset updated
Jun 30, 2021
Description
CDTree is a stand-alone application for classifying protein sequences and investigating their evolutionary relationships. CDTree can import, analyze and update existing Conserved Domain (CDD) records and hierarchies, and also allows users to create their own.
CD Search (Conserved Domain Search Service)
datadiscovery.nlm.nih.gov
odgavaprod.ogopendata.com
+4 more
csv, xlsx, xml
Updated Jun 30, 2021
Cite
(2021). CD Search (Conserved Domain Search Service) [Dataset]. https://datadiscovery.nlm.nih.gov/Biology/CD-Search-Conserved-Domain-Search-Service-/j6ef-yjai
Explore at:
Available download formats: csv, xml, xlsx
Dataset updated
Jun 30, 2021
Description
Identifies the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (Reverse Position-Specific BLAST) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD).
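The core idea of scoring a query against a position-specific score matrix can be illustrated with a toy example. The sketch below slides a three-column matrix along a sequence and reports the best-scoring window; it is not RPS-BLAST (which uses heuristics, gapped alignment, and statistical significance estimates), and the matrix values, default penalty, and function names are all invented for illustration.

```python
# Toy position-specific score matrix: one dict of residue scores per column.
# The numbers are made up and are not taken from any CDD model.
PSSM = [
    {"G": 5, "A": 1},
    {"K": 4, "R": 3},
    {"S": 4, "T": 4},
]
DEFAULT_SCORE = -2  # assumed penalty for residues not listed in a column


def score_window(window, pssm=PSSM):
    """Sum the column-specific scores for one ungapped window."""
    return sum(col.get(aa, DEFAULT_SCORE) for col, aa in zip(pssm, window))


def best_hit(sequence, pssm=PSSM):
    """Return (start, score) of the best ungapped window, or None if too short."""
    width = len(pssm)
    if len(sequence) < width:
        return None
    hits = [(i, score_window(sequence[i:i + width], pssm))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1])  # first window wins on ties


print(best_hit("MMGKSMM"))
```

The point of the position-specific scores is that the same residue can be rewarded in one column and penalized in another, which a single substitution matrix cannot express.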
Mice Protein
kaggle.com
zip
Updated Dec 16, 2020
Cite
Muhammet Varlı (2020). Mice Protein [Dataset]. https://www.kaggle.com/datasets/muhammetvarl/mice-protein/data
Explore at:
Available download formats: zip (434808 bytes)
Dataset updated
Dec 16, 2020
Authors
Muhammet Varlı
Description
Source: UCI - 2015 Please cite: Higuera C, Gardiner KJ, Cios KJ (2015) Self-Organizing Feature Maps Identify Proteins Critical to Learning in a Mouse Model of Down Syndrome. PLoS ONE 10(6): e0129126.

Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning.

The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of cortex. There are 38 control mice and 34 trisomic mice (Down syndrome), for a total of 72 mice. In the experiments, 15 measurements were registered of each protein per sample/mouse. Therefore, for control mice, there are 38x15, or 570 measurements, and for trisomic mice, there are 34x15, or 510 measurements. The dataset contains a total of 1080 measurements per protein. Each measurement can be considered as an independent sample/mouse.

The eight classes of mice are described based on features such as genotype, behavior and treatment. According to genotype, mice can be control or trisomic. According to behavior, some mice have been stimulated to learn (context-shock) and others have not (shock-context) and in order to assess the effect of the drug memantine in recovering the ability to learn in trisomic mice, some mice have been injected with the drug and others have not.

Classes:
* c-CS-s: control mice, stimulated to learn, injected with saline (9 mice)
* c-CS-m: control mice, stimulated to learn, injected with memantine (10 mice)
* c-SC-s: control mice, not stimulated to learn, injected with saline (9 mice)
* c-SC-m: control mice, not stimulated to learn, injected with memantine (10 mice)
* t-CS-s: trisomy mice, stimulated to learn, injected with saline (7 mice)
* t-CS-m: trisomy mice, stimulated to learn, injected with memantine (9 mice)
* t-SC-s: trisomy mice, not stimulated to learn, injected with saline (9 mice)
* t-SC-m: trisomy mice, not stimulated to learn, injected with memantine (9 mice)
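The per-class mouse counts can be checked against the totals quoted in the description (38 control mice, 34 trisomic mice, 15 measurements per mouse, 1080 measurements per protein). The short sketch below does that arithmetic; the variable names are chosen for the example.

```python
# Mouse counts per class, copied from the class list.
CLASS_SIZES = {
    "c-CS-s": 9, "c-CS-m": 10, "c-SC-s": 9, "c-SC-m": 10,
    "t-CS-s": 7, "t-CS-m": 9, "t-SC-s": 9, "t-SC-m": 9,
}
MEASUREMENTS_PER_MOUSE = 15

# Labels starting with "c-" are control mice, "t-" are trisomic mice.
control = sum(n for label, n in CLASS_SIZES.items() if label.startswith("c-"))
trisomic = sum(n for label, n in CLASS_SIZES.items() if label.startswith("t-"))

control_measurements = control * MEASUREMENTS_PER_MOUSE
trisomic_measurements = trisomic * MEASUREMENTS_PER_MOUSE
total_measurements = control_measurements + trisomic_measurements

print(control, trisomic, control + trisomic)
print(control_measurements, trisomic_measurements, total_measurements)
```

The class sizes add up to the stated 38 + 34 = 72 mice, and 570 + 510 gives the 1080 measurements per protein.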

The aim is to identify subsets of proteins that are discriminant between the classes.

Attribute Information:

1 Mouse ID
2..78 Values of expression levels of 77 proteins; the names of proteins are followed by "_n", indicating that they were measured in the nuclear fraction. For example: DYRK1A_n
79 Genotype: control (c) or trisomy (t)
80 Treatment type: memantine (m) or saline (s)
81 Behavior: context-shock (CS) or shock-context (SC)
82 Class: c-CS-s, c-CS-m, c-SC-s, c-SC-m, t-CS-s, t-CS-m, t-SC-s, t-SC-m
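The class label in column 82 encodes the same information as columns 79 to 81, in the order genotype, behavior, treatment. A minimal sketch of decoding a label and of selecting the 77 protein columns from a row follows; the helper names are hypothetical, and the 1-based positions from the list are converted to 0-based Python indices.

```python
from typing import NamedTuple

GENOTYPES = {"c": "control", "t": "trisomy"}
BEHAVIORS = {"CS": "context-shock", "SC": "shock-context"}
TREATMENTS = {"m": "memantine", "s": "saline"}


class MouseClass(NamedTuple):
    genotype: str
    behavior: str
    treatment: str


def parse_class(label):
    """Decode a label such as 't-CS-m' into its three components."""
    genotype, behavior, treatment = label.split("-")
    return MouseClass(GENOTYPES[genotype], BEHAVIORS[behavior],
                      TREATMENTS[treatment])


# Attributes 2..78 (1-based) hold the protein values: indices 1..77 (0-based).
PROTEIN_COLUMNS = slice(1, 78)

print(parse_class("t-CS-m"))
print(len(range(82)[PROTEIN_COLUMNS]))
```

An unknown code in any position raises a `KeyError`, which is a cheap way to catch malformed labels when loading the file.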
Encoding of amino acids, deletions and missing protein sequence data after...
plos.figshare.com
xls
Updated Oct 5, 2023
Cite
Luryane F. Souza; Hernane B. de B. Pereira; Tarcisio M. da Rocha Filho; Bruna A. S. Machado; Marcelo A. Moret (2023). Encoding of amino acids, deletions and missing protein sequence data after alignment. [Dataset]. http://doi.org/10.1371/journal.pone.0287880.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0287880.t001
Dataset updated
Oct 5, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Luryane F. Souza; Hernane B. de B. Pereira; Tarcisio M. da Rocha Filho; Bruna A. S. Machado; Marcelo A. Moret
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code based on molecular structure of amino acid side chains by Chaudhuri et al. [18].
MScDB: A Mass Spectrometry-centric Protein Sequence Database for Proteomics
figshare.com
datasetcatalog.nlm.nih.gov
+1 more
xlsx
Updated May 30, 2023
Cite
Harald Marx; Simone Lemeer; Susan Klaeger; Thomas Rattei; Bernhard Kuster (2023). MScDB: A Mass Spectrometry-centric Protein Sequence Database for Proteomics [Dataset]. http://doi.org/10.1021/pr400215r.s003
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/pr400215r.s003
Dataset updated
May 30, 2023
Dataset provided by
ACS Publications
Authors
Harald Marx; Simone Lemeer; Susan Klaeger; Thomas Rattei; Bernhard Kuster
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Protein sequence databases are indispensable tools for life science research including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ∼99% of the identifications in comparison to common databases despite a 3–48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
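The two core modules named in the abstract, an in-silico digest and peptide-level grouping, can be illustrated with a deliberately simplified sketch. It assumes a trypsin-like rule (cleave after K or R unless the next residue is P, no missed cleavages) and treats two proteins as indistinguishable only when they yield exactly the same set of peptides above a minimum length. The function names, the length cutoff, and the toy sequences are assumptions for the example; this is not the MScDB algorithm itself.

```python
from collections import defaultdict


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def group_indistinguishable(proteins, min_length=2):
    """Group protein names whose digests give identical peptide sets."""
    groups = defaultdict(list)
    for name, seq in proteins.items():
        key = frozenset(p for p in tryptic_digest(seq) if len(p) >= min_length)
        groups[key].append(name)
    return sorted(sorted(names) for names in groups.values())


# Toy sequences, invented for illustration.
proteins = {"P1": "MKAAR", "P2": "AARMK", "P3": "MKGGR"}
print(tryptic_digest("MKAAR"))
print(group_indistinguishable(proteins))
```

In this toy case P1 and P2 differ as sequences but produce the same peptides, so a similarity-based clustering and a peptide-based grouping would treat them differently, which is the distinction the abstract draws.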
SARS-CoV-2 vs. Homo sapiens BLASTP protein sequence analysis results
data.niaid.nih.gov
zenodo.org
Updated Apr 21, 2020
Cite
Arumugham, Vinu (2020). SARS-CoV-2 vs. Homo sapiens BLASTP protein sequence analysis results [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3758729
Explore at:
Dataset updated
Apr 21, 2020
Authors
Arumugham, Vinu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SARS-CoV-2 vs. Homo sapiens BLASTP protein sequence analysis results
Data_Sheet_1_Identification and Analysis of Long Repeats of Proteins at the...
figshare.com
xlsx
Updated May 31, 2023
Cite
David Mary Rajathei; Subbiah Parthasarathy; Samuel Selvaraj (2023). Data_Sheet_1_Identification and Analysis of Long Repeats of Proteins at the Domain Level.xlsx [Dataset]. http://doi.org/10.3389/fbioe.2019.00250.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fbioe.2019.00250.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
David Mary Rajathei; Subbiah Parthasarathy; Samuel Selvaraj
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Amino acid repeats play an important role in the structure and function of proteins. Analysis of long repeats in protein sequences enables one to understand their abundance, structure and function in the protein universe. In the present study, amino acid repeats of length >50 (long repeats) were identified in a non-redundant set of UniProt sequences using the RADAR program. The underlying structures and functions of these long repeats were analyzed using Gene3D for structural domains, Pfam for functional domains, and an enzyme and non-enzyme functional classification for the catalytic and binding functions of the proteins. From a structural perspective, these long repeats seem to occur predominantly in certain architectures such as sandwich, bundle, barrel, and roll and, within these architectures, to be abundant in the superfolds. The lengths of the repeats within each fold are not uniform, exhibiting different structures for different functions. We also observed that long repeats are in the domain regions of the family and are involved in the function of the proteins. After grouping based on enzyme and non-enzyme classes, we observed the abundant occurrence of long repeats in specific catalytic and binding functions of the proteins. In this study, we have analyzed the occurrence of long repeats in the protein sequence universe, as distinct from well-characterized short tandem repeats, along with their structures and functions at the domain level. The present study suggests that long repeats may play an important role in the structure and function of domains of the proteins.
SFLD
ebi.ac.uk
Updated Sep 7, 2018
+ more versions
Cite
(2018). SFLD [Dataset]. https://www.ebi.ac.uk/interpro/
Explore at:
Dataset updated
Sep 7, 2018
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SFLD (Structure-Function Linkage Database) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities.
Neurotransmitter Receptors & Protein Sequences
kaggle.com
zip
Updated Jul 7, 2025
Cite
Yashasvi Goswami (2025). Neurotransmitter Receptors & Protein Sequences [Dataset]. https://www.kaggle.com/datasets/yashasvigoswami/neurotransmitter-receptors
Explore at:
Available download formats: zip (35950 bytes)
Dataset updated
Jul 7, 2025
Authors
Yashasvi Goswami
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset provides a curated collection of proteins with their sequences, functional annotations, and structural information. It serves as a resource for researchers working in bioinformatics, structural biology, and systems biology, offering insights into the molecular machinery of life.

By linking protein sequence data with functional and structural attributes, it supports diverse applications such as:
- Protein classification and annotation tasks
- Sequence-to-function machine learning models
- Structural modeling and docking studies
- Comparative proteomics and evolutionary studies

The dataset is valuable for both computational researchers and students of life sciences, offering real-world biological data that can be directly integrated into pipelines for protein analysis, modeling, and prediction.
