38 datasets found
  1. uniref90

    • huggingface.co
    Updated Mar 14, 2023
    Cite: Zach Nussbaum (2023). uniref90 [Dataset]. https://huggingface.co/datasets/zpn/uniref90
    Authors: Zach Nussbaum
    Description: zpn/uniref90 dataset hosted on Hugging Face and contributed by the HF Datasets community.

  2. uniref90

    • huggingface.co
    Updated Apr 29, 2022
    + more versions
    Cite: Ahmed Elnaggar (2022). uniref90 [Dataset]. https://huggingface.co/datasets/agemagician/uniref90
    Authors: Ahmed Elnaggar
    Description: agemagician/uniref90 dataset hosted on Hugging Face and contributed by the HF Datasets community.

  3. UniRef

    • neuinfo.org
    • rrid.site
    • +2 more
    Cite: UniRef [Dataset]. http://identifiers.org/RRID:SCR_010646
    Description: Databases which provide clustered sets of sequences from the UniProt Knowledgebase and selected UniParc records, in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries, and links to the corresponding UniProtKB and UniParc records are all displayed in the entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% (UniRef90) or 50% (UniRef50) sequence identity to the longest sequence (the UniRef seed sequence). All the sequences in each cluster are ranked to facilitate the selection of a representative sequence for the cluster.
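    A quick way to see this clustering in practice is to query the UniProt REST API for UniRef entries at a given identity level. A minimal Python sketch follows; the query field names (uniprot_id, identity) and the example accession are assumptions to verify against the UniProt API documentation, not part of this dataset record.

        import requests

        # Ask for UniRef clusters that contain a given UniProtKB accession,
        # restricted to the 90% identity level (UniRef90).
        url = "https://rest.uniprot.org/uniref/search"
        params = {
            "query": "uniprot_id:P05067 AND identity:0.9",  # accession is illustrative
            "format": "json",
            "size": 1,
        }
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        for entry in resp.json().get("results", []):
            print(entry.get("id"), entry.get("memberCount"))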

  4. UniRef at the EBI

    • neuinfo.org
    • scicrunch.org
    • +1 more
    Updated Jan 29, 2022
    Cite: (2022). UniRef at the EBI [Dataset]. http://identifiers.org/RRID:SCR_004972
    Description: Various non-redundant databases with different sequence identity cut-offs, created by clustering closely similar sequences to yield a representative subset of sequences. In the UniRef90 and UniRef50 databases no pair of sequences in the representative set has >90% or >50% mutual sequence identity. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases. The two major objectives of UniRef are: (i) to facilitate sequence merging in UniProt, and (ii) to allow faster and more informative sequence similarity searches. Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy, because it is not possible to use fully automatic merging without risking the merging of similar sequences from different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases to obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view. A high level of redundancy causes several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output; a more even sampling of sequence space is therefore advantageous. NREF may be accessed via the FTP server.

  5. CAT/BAT uniref90+algae proteins from NCBI

    • figshare.unimelb.edu.au
    Available download formats: bin
    Updated Sep 25, 2025
    Cite: Yuhao Tong (2025). CAT/BAT uniref90+algae proteins from NCBI [Dataset]. http://doi.org/10.26188/27990278.v2
    Dataset provided by: The University of Melbourne
    Authors: Yuhao Tong
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: This is the packaged CAT/BAT database, storing all amino acid sequences from UniRef90 as well as ~440,000 algal chloroplast sequences from the NCBI nucleotide database. Before running ChloroScan, download this package, unzip it, and pass the tax/ and db/ directories inside it as parameters.
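    A minimal Python sketch of that preparation step follows; the archive file name is hypothetical and the exact parameter names ChloroScan expects are not stated here, so check the ChloroScan documentation for the real option spellings.

        import tarfile
        from pathlib import Path

        archive = Path("CAT_BAT_uniref90_algae.tar.gz")  # hypothetical download name
        target = Path("cat_bat_db")
        target.mkdir(exist_ok=True)

        # Unpack the packaged CAT/BAT database.
        with tarfile.open(archive) as tar:
            tar.extractall(target)

        # Locate the taxonomy and database folders that ChloroScan needs as parameters.
        tax_dir = next(target.rglob("tax"))
        db_dir = next(target.rglob("db"))
        print("taxonomy folder:", tax_dir)
        print("database folder:", db_dir)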

  6. esm2_uniref_pretraining_data

    • huggingface.co
    Updated Feb 4, 2026
    Cite: NVIDIA (2026). esm2_uniref_pretraining_data [Dataset]. https://huggingface.co/datasets/nvidia/esm2_uniref_pretraining_data
    Dataset provided by: NVIDIA (http://nvidia.com/)
    Authors: NVIDIA
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: ESM-2 UniRef Pretraining Data. UniRef, or UniProt Reference Clusters, are databases of clustered protein sequences from the UniProt Knowledgebase (UniProtKB) that group similar sequences to reduce redundancy and make data easier to work with for biological research. It offers different levels of clustering (UniRef100, UniRef90, and UniRef50) based on sequence identity, with each cluster containing a representative sequence, a count of member proteins… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/esm2_uniref_pretraining_data.
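    A minimal sketch of loading this dataset with the Hugging Face datasets library follows; the split name and record fields are assumptions, so consult the dataset card for the actual schema.

        from datasets import load_dataset

        # Stream the dataset so nothing large is downloaded up front.
        ds = load_dataset("nvidia/esm2_uniref_pretraining_data", split="train", streaming=True)
        for record in ds.take(3):
            print(record)  # inspect the available fields (e.g., the sequence text)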

  7. uniref90

    • kaggle.com
    Available download formats: zip (42,899,039,362 bytes)
    Updated Nov 5, 2023
    Cite: Stephen Fan (2023). uniref90 [Dataset]. https://www.kaggle.com/datasets/zhfanrui/uniref90
    Authors: Stephen Fan
    License: CC0 1.0 (Public Domain), https://creativecommons.org/publicdomain/zero/1.0/
    Description: This dataset was created by Stephen Fan. Released under CC0: Public Domain.
    Contents: 20231104

  8. Multi-Kingdom UniRef Protein Sequences

    • kaggle.com
    Available download formats: zip (1,022,430,287 bytes)
    Updated Dec 31, 2025
    Cite: Abdullateef TIJANI (2025). Multi-Kingdom UniRef Protein Sequences [Dataset]. https://www.kaggle.com/datasets/tijaniabdullateef/uniref-protein-sequences-for-dl-pretraining
    Authors: Abdullateef TIJANI
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/ (license information was derived automatically)
    Description: This dataset contains 5,661,294 protein sequences combined from UniRef (UniRef50 and UniRef90) and curated for self-supervised pretraining of deep learning models on protein sequences. The data spans six biological groups: Bacteria, Archaea, Fungi, Plants, Arthropods, and Mammals. Sequences were filtered by sequence identity, taxonomy, and length to reduce redundancy while maintaining biological diversity.

    The dataset includes:
    • combined_pretrain.fasta: all protein sequences (≈1.05 GB)
    • pretrain_metadata.csv: metadata for each sequence (≈810 MB), including organism group and sequence statistics

    This dataset is intended for representation learning / pretraining, before fine-tuning on labeled protein function data (e.g., Gene Ontology annotations).
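    A minimal sketch of reading the two files follows, using Biopython and pandas; the metadata column names are not documented here, so inspect the CSV header before relying on them.

        import pandas as pd
        from Bio import SeqIO

        # Stream the FASTA so the ~1 GB file never has to fit in memory.
        lengths = []
        for i, record in enumerate(SeqIO.parse("combined_pretrain.fasta", "fasta")):
            lengths.append(len(record.seq))
            if i >= 9_999:  # sample the first 10,000 sequences
                break

        meta = pd.read_csv("pretrain_metadata.csv")
        print(f"sampled mean sequence length: {sum(lengths) / len(lengths):.1f}")
        print(meta.head())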

  9. UniRef90_len_0_50

    • huggingface.co
    + more versions
    Cite: Zhenjiao Du. UniRef90_len_0_50 [Dataset]. https://huggingface.co/datasets/dzjxzyd/UniRef90_len_0_50
    Authors: Zhenjiao Du
    Description: This is a dataset downloaded from the UniRef90 database, with sequence lengths ranging from 0 to 50.

    Code used for the data mining (data downloaded on September 30, 2024):

        import requests
        query_url = 'https://rest.uniprot.org/uniref/stream?compressed=true&fields=id%2Clength%2Cidentity%2Csequence&format=tsv&query=%28%28length%3A%5B*+TO+50%5D%29%29+AND+%28identity%3A0.9%29'
        uniprot_request = requests.get(query_url)
        from io import BytesIO
        import pandas
        bio = BytesIO(uniprot_request.content)
        df = …

    See the full description on the dataset page: https://huggingface.co/datasets/dzjxzyd/UniRef90_len_0_50.
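    The snippet above is truncated on the dataset page. A hedged reconstruction of how the download might be finished is sketched below, assuming the response is a gzip-compressed TSV as requested (compressed=true, format=tsv); this is not the author's exact code, and the resulting column names are whatever the UniProt TSV header provides.

        import gzip
        from io import BytesIO

        import pandas as pd
        import requests

        query_url = (
            "https://rest.uniprot.org/uniref/stream?compressed=true"
            "&fields=id%2Clength%2Cidentity%2Csequence&format=tsv"
            "&query=%28%28length%3A%5B*+TO+50%5D%29%29+AND+%28identity%3A0.9%29"
        )
        resp = requests.get(query_url)
        resp.raise_for_status()

        # The stream is gzip-compressed; decompress and parse the tab-separated table.
        with gzip.open(BytesIO(resp.content), mode="rt") as fh:
            df = pd.read_csv(fh, sep="\t")

        print(df.shape)
        print(df.head())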

  10. UniRef Six-Kingdom Pretraining Dataset

    • kaggle.com
    Available download formats: zip (699,739,527 bytes)
    Updated Dec 31, 2025
    Cite: Abdullateef TIJANI (2025). UniRef Six-Kingdom Pretraining Dataset [Dataset]. https://www.kaggle.com/datasets/tijaniabdullateef/uniref-pretraining
    Authors: Abdullateef TIJANI
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/ (license information was derived automatically)
    Description: This dataset contains protein sequences from six biological kingdoms, extracted from UniRef (UniRef50 / UniRef90) and curated for self-supervised pretraining of deep learning models. All sequences are unreviewed (UniRef clusters) and intended for self-supervised pretraining, not direct functional supervision. Files can be used individually or merged depending on compute and model size. Suitable for transformer-based protein language models (e.g., ESM-style architectures).

  11. Number of protein sequences in UniRef100 database and its variants

    • plos.figshare.com
    Available download formats: xls
    Updated Jun 1, 2023
    Cite: Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar (2023). Number of protein sequences in UniRef100 database and its variants [Dataset]. http://doi.org/10.1371/journal.pone.0158445.t002
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: Number of protein sequences in the UniRef100 database and its variants.

  12. TIGR Plant Transcript Assembly database

    • neuinfo.org
    • dknet.org
    • +2 more
    Updated Jun 20, 2024
    Cite: (2024). TIGR Plant Transcript Assembly database [Dataset]. http://identifiers.org/RRID:SCR_005470
    Description: The TIGR database is a collection of plant transcript sequences. Transcript assemblies are searchable using BLAST and accession number. The construction of plant transcript assemblies (TAs) is similar to the TIGR gene indices. The sequences used to build the plant TAs are expressed transcripts collected from dbEST (ESTs) and the NCBI GenBank nucleotide database (full-length and partial cDNAs); "virtual" transcript sequences derived from whole-genome annotation projects are not included. All plant species for which more than 1,000 ESTs or cDNA sequences are available are included in this project.

    TAs are clustered and assembled using the TGICL tool (Pertea et al., 2003), Megablast (Zhang et al., 2000) and the CAP3 assembler (Huang and Madan, 1999). TGICL is a wrapper script which invokes Megablast and CAP3. Sequences are initially clustered based on all-against-all comparisons using Megablast, and the initial clusters are then assembled into consensus sequences using CAP3. Assembly criteria include a 50 bp minimum match, 95% minimum identity in the overlap region, and 20 bp maximum unmatched overhangs. Any EST/cDNA sequences that are not assembled into TAs are included as singletons, which retain their GenBank accession numbers as identifiers. Plant TA identifiers are of the form TAnumber_taxonID, where number is a unique numerical identifier of the transcript assembly and taxonID is the NCBI taxon id.

    To provide annotation for the TAs, each TA/singleton was aligned to the UniProt UniRef database: a masked version of the Uniref90 database for release 1, and a masked version of the UniRef100 database from release 2 onwards. Alignments were required to have at least 20% identity and 20% coverage. The annotation of the protein with the best alignment to each TA or singleton was used as the annotation for that sequence, and the relative orientation of each TA/singleton to the best-matching protein sequence was used to determine its orientation. Sequences without alignments meeting these quality criteria have neither annotation nor orientation assignments.

    The release number for the plant TAs refers to the release version for a particular species. For the initial build, all TA sets are version 1. Subsequent TA updates are carried out when the percentage increase of the EST and cDNA counts exceeds 10% of the previous release and the increase contains more than 1,000 new sequences. New releases also include additional plant species for which more than 1,000 EST or cDNA sequences have become publicly available.

  13. CAMI2_FunctionalAnnotation_BenchmarkSet

    • zenodo.org
    Available download formats: bin
    Updated Apr 11, 2025
    Cite: Jonathan Turck (2025). CAMI2_FunctionalAnnotation_BenchmarkSet [Dataset]. http://doi.org/10.5281/zenodo.15192200
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Jonathan Turck
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: Paired protein sequences in FASTA format and enzyme commission (EC) labels, generated from the CAMI 2 Toy Human Microbiome Project gold-standard assemblies using Prodigal and DIAMOND against the UniRef90 database.
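    A minimal sketch of that kind of pipeline (gene calling with Prodigal, then protein search with DIAMOND) follows, driven from Python via subprocess; the exact options used to build this benchmark are not given on the dataset page, so treat these as standard invocations to adapt, not the authors' commands.

        import subprocess

        # Predict protein sequences from the assembly (metagenome mode).
        subprocess.run(
            ["prodigal", "-i", "assembly.fasta", "-a", "proteins.faa", "-p", "meta"],
            check=True,
        )

        # Search the predicted proteins against a preformatted UniRef90 DIAMOND database.
        subprocess.run(
            ["diamond", "blastp", "-q", "proteins.faa", "-d", "uniref90.dmnd",
             "-o", "uniref90_hits.tsv", "--outfmt", "6"],
            check=True,
        )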

  14. ProteinBERT Trained model

    • dataone.org
    Updated Dec 16, 2023
    Cite: Ofer, Dan; Brandes, Nadav (2023). ProteinBERT Trained model [Dataset]. http://doi.org/10.7910/DVN/HI55J5
    Dataset provided by: Harvard Dataverse
    Authors: Ofer, Dan; Brandes, Nadav
    Description: Trained ProteinBERT model weights for the paper "ProteinBERT: A universal deep-learning model of protein sequence and function" (https://github.com/nadavbra/protein_bert). Also available via FTP: ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl

    ProteinBERT is a protein language model pretrained on ~106M proteins from UniRef90. The pretrained model can be fine-tuned on any protein-related task in a matter of minutes, and achieves state-of-the-art performance on a wide range of benchmarks. ProteinBERT is built on Keras/TensorFlow. Its deep-learning architecture is inspired by BERT, but contains several innovations, such as global-attention layers that have linear complexity in sequence length (compared with self-attention's quadratic growth). As a result, the model can process protein sequences of almost any length, including extremely long sequences of over tens of thousands of amino acids. The model takes protein sequences as inputs, and can also take protein GO annotations as additional inputs (to help the model infer the function of the input protein and update its internal representations and outputs accordingly). This pretrained TensorFlow/Keras model was produced by training for 28 days over ~670M records (~6.4 epochs over the entire UniRef90 training dataset of ~106M proteins).

  15. Average number of hits used for generating PSSM profiles

    • plos.figshare.com
    Available download formats: xls
    Updated Jun 1, 2023
    Cite: Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar (2023). Average number of hits used for generating PSSM profiles [Dataset]. http://doi.org/10.1371/journal.pone.0158445.t007
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: Average number of hits used for generating PSSM profiles.

  16. Evolutionary aspects of functionally relevant homodimers exhibiting global asymmetry

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 22, 2012
    Cite: Srikeerthana, Kuchi; Srinivasan, Narayanaswamy; Swapna, Lakshmipuram Seshadri (2012). Evolutionary aspects of functionally relevant homodimers exhibiting global asymmetry [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001143814
    Authors: Srikeerthana, Kuchi; Srinivasan, Narayanaswamy; Swapna, Lakshmipuram Seshadri
    Description: Note: unless indicated by *, all homologous sequences were gathered from the UniRef50 database. If very few homologues were identified, homologues identified from the UniRef90 database (indicated by *) were used in the analysis instead. In a few PDB entries several molecules are present; the dimeric molecule under consideration is highlighted in italics.

  17. uniref

    • huggingface.co
    Updated Jan 1, 2026
    Cite: tonyni (2026). uniref [Dataset]. https://huggingface.co/datasets/tonynzh/uniref
    Authors: tonyni
    Description: tonynzh/uniref dataset hosted on Hugging Face and contributed by the HF Datasets community.

  18. Additional file 2 of: Understanding the microbial basis of body odor in pre-pubescent children and teenagers

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    Available download formats: zip
    Updated Feb 7, 2024
    Cite: Tze Lam; Davide Verzotto; Purbita Brahma; Amanda Ng; Ping Hu; Dan Schnell; Jay Tiesman; Rong Kong; Thi Ton; Jianjun Li; May Ong; Yang Lu; David Swaile; Ping Liu; Jiquan Liu; Niranjan Nagarajan (2024). Additional file 2 of: Understanding the microbial basis of body odor in pre-pubescent children and teenagers [Dataset]. http://doi.org/10.6084/m9.figshare.7404806.v1
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Tze Lam; Davide Verzotto; Purbita Brahma; Amanda Ng; Ping Hu; Dan Schnell; Jay Tiesman; Rong Kong; Thi Ton; Jianjun Li; May Ong; Yang Lu; David Swaile; Ping Liu; Jiquan Liu; Niranjan Nagarajan
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Description: Table S2. Odor assessment and metagenomic sequencing data for all samples in this study. Table S5. GC-olfactometry of pooled sweat collected from children and teenagers. Table S6. Detected MetaCyc pathways found to be associated with malodor, based on pathway abundance values from HUMAnN2; no significant associations were detected for the head region. Table S7. Key pathways associated with malodor production and their taxonomic contributors. Table S8. Information on how samples were distributed across library preparation and sequencing batches to avoid batch effects. Table S9. Reads mapped (%) to UniRef90 gene families and MetaCyc pathways. (ZIP 92 kb)

  19. uniref example

    • kaggle.com
    Available download formats: zip (2,805,668 bytes)
    Updated Feb 23, 2025
    Cite: team93 (2025). uniref example [Dataset]. https://www.kaggle.com/datasets/team93/uniref-example
    Authors: team93
    Description: This dataset was created by team93.

  20. Annotation Table - Whole Body Transcriptomes Of The Tick Ixodes Ricinus At Different Stage And Feeding Conditions

    • explore.openaire.eu
    Updated Aug 1, 2017
    Cite: N Pierre Charrier; Marjorie Couton; Maarten J Voordouw; Olivier Rais; Axelle Durand-Hermouet; Caroline Hervet; Olivier Plantard; Claude Rispe (2017). Annotation Table - Whole Body Transcriptomes Of The Tick Ixodes Ricinus At Different Stage And Feeding Conditions [Dataset]. http://doi.org/10.5281/zenodo.1137702
    Authors: N Pierre Charrier; Marjorie Couton; Maarten J Voordouw; Olivier Rais; Axelle Durand-Hermouet; Caroline Hervet; Olivier Plantard; Claude Rispe
    Description: Annotation table for a de novo assembled transcriptome of Ixodes ricinus in different stages and conditions. Description of the fields of each column (Trinotate results, plus additional statistics):

    1. Contig_name: name of the contig (Trinity assembly)
    2. sprot_Top_BLASTX_hit: first hit of the blastx search against SwissProt (https://data.broadinstitute.org/Trinity/Trinotate v2.0 RESOURCES/)
    3. TrEMBL_Top_BLASTX_hit: first hit of the blastx search against Uniref90 (https://data.broadinstitute.org/Trinity/Trinotate v2.0 RESOURCES/)
    4. RNAMMER: identification of non-coding RNAs
    5. prot_id: identifier of the predicted protein (TransDecoder)
    6. prot_coords: coordinates (start, end and strand) of the predicted protein on the contig
    7. sprot_Top_BLASTP_hit: first hit of the blastp search between the predicted protein and SwissProt (https://data.broadinstitute.org/Trinity/Trinotate v2.0 RESOURCES/)
    8. TrEMBL_Top_BLASTP_hit: first hit of the blastp search between the predicted protein and Uniref90 (https://data.broadinstitute.org/Trinity/Trinotate v2.0 RESOURCES/)
    9. Pfam: result of the search against the PfamA database
    10. SignalP: prediction of a signal peptide with SignalP
    11. TmHMM: prediction of a transmembrane domain with TMHMM
    12. eggnog: assignment against the eggNOG database of orthologous genes (v3.0)
    13. gene_ontology_blast: GO assignment based on blast results
    14. gene_ontology_pfam: GO assignment based on pfam results
    15. Contig_length: length of the contig in bp
    16. Busco_Id: name of the BUSCO (v1)
    17. Busco_status: status of the BUSCO (complete/fragmented/duplicated)
    18-32. Kallisto read counts for the 15 libraries: A, B, C: unfed nymphs (replicates 1, 2, 3); D, E, F: partially fed nymphs (replicates 1, 2, 3); G, H, I: males (unfed) (replicates 1, 2, 3); J, K, L: unfed adult females (replicates 1, 2, 3); M, N, O: partially fed adult females (replicates 1, 2, 3)
    33. log2FoldChange_UnfedVsPartiallyFed: log2 fold change of expression (comparison between "unfed" ticks, including males, and "fed" ticks)
    34. pvalue_UnfedVsPartiallyFed: p-value of the comparison between "unfed" ticks, including males, and "fed" ticks
    35. log2FoldChange_MaleVsFemale: log2 fold change of expression (comparison between "males" and "females")
    36. pvalue_MaleVsFemale: p-value of the comparison between "males" and "females"
    37. log2FoldChange_NymphsVsAdults: log2 fold change of expression (comparison between "nymphs" and "adults", i.e. males and females)
    38. pvalue_NymphsVsAdults: p-value of the comparison between "nymphs" and "adults" (males and females)

    Fixed column 8 (blastp result on TrEMBL).
