40 datasets found
  1. uniref90

    • huggingface.co
    Updated Mar 14, 2023
    + more versions
    Cite
    Zach Nussbaum (2023). uniref90 [Dataset]. https://huggingface.co/datasets/zpn/uniref90
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 14, 2023
    Authors
    Zach Nussbaum
    Description

    zpn/uniref90 dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. uniref90

    • huggingface.co
    Updated Apr 29, 2022
    + more versions
    Cite
    Ahmed Elnaggar (2022). uniref90 [Dataset]. https://huggingface.co/datasets/agemagician/uniref90
    Explore at: Croissant
    Dataset updated
    Apr 29, 2022
    Authors
    Ahmed Elnaggar
    Description

    agemagician/uniref90 dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. UniRef

    • scicrunch.org
    • dknet.org
    • +1 more
    Updated Mar 22, 2025
    Cite
    (2025). UniRef [Dataset]. http://identifiers.org/RRID:nlx_66133
    Dataset updated
    Mar 22, 2025
    Description

    Databases which provide clustered sets of sequences from UniProt Knowledgebase and selected UniParc records, in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences from view. The UniRef100 database combines identical sequences and sub-fragments with 11 or more residues (from any organism) into a single UniRef entry. The sequence of a representative protein, the accession numbers of all the merged entries, and links to the corresponding UniProtKB and UniParc records are all displayed in the entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences with 11 or more residues such that each cluster is composed of sequences that have at least 90% (UniRef90) or 50% (UniRef50) sequence identity to the longest sequence (UniRef seed sequence). All the sequences in each cluster are ranked to facilitate the selection of a representative sequence for the cluster.
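    The UniRef100 merging rule above can be illustrated with a toy Python sketch. It assumes that a "sub-fragment" means an exact substring of a longer representative, and that the 11-residue minimum applies to sub-fragments only while identical sequences always merge. The function name and the accessions are hypothetical, and UniProt's production pipeline is not reproduced here.

```python
# Toy UniRef100-style merging (hypothetical helper, not UniProt's pipeline).
MIN_FRAGMENT_LEN = 11  # "sub-fragments with 11 or more residues"


def merge_uniref100(records: dict[str, str]) -> dict[str, list[str]]:
    """Map each representative accession to the accessions merged into it."""
    clusters: dict[str, list[str]] = {}
    rep_seqs: dict[str, str] = {}
    # Longest sequences first (ties broken by accession) so that
    # full-length sequences become representatives before their fragments.
    for acc, seq in sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        target = None
        for rep_acc, rep_seq in rep_seqs.items():
            identical = seq == rep_seq
            # Assumption: a sub-fragment is an exact substring of the representative.
            fragment = len(seq) >= MIN_FRAGMENT_LEN and seq in rep_seq
            if identical or fragment:
                target = rep_acc
                break
        if target is None:
            rep_seqs[acc] = seq  # start a new UniRef100-like entry
            clusters[acc] = [acc]
        else:
            clusters[target].append(acc)
    return clusters


example = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",  # hypothetical accessions and sequences
    "P2": "MKTAYIAKQRQISFVKSHFSRQ",  # identical to P1
    "F1": "AKQRQISFVKSH",            # 12-residue fragment of P1
    "S1": "QISF",                    # too short to merge as a fragment
    "Q9": "MSTNPKPQRKTKRNTNRR",
}
print(merge_uniref100(example))
# → {'P1': ['P1', 'P2', 'F1'], 'Q9': ['Q9'], 'S1': ['S1']}
```

Visiting the longest sequences first ensures that a fragment meets its full-length parent before it could become a representative itself.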

  4. UniRef

    • bioregistry.io
    Updated Apr 9, 2022
    + more versions
    Cite
    (2022). UniRef [Dataset]. http://identifiers.org/re3data:r3d100011518
    Dataset updated
    Apr 9, 2022
    Description

    The UniProt Reference Clusters (UniRef) provide clustered sets of sequences from the UniProt Knowledgebase (including isoforms) and selected UniParc records in order to obtain complete coverage of the sequence space at several resolutions while hiding redundant sequences (but not their descriptions) from view.

  5. UniRef at the EBI

    • dknet.org
    • scicrunch.org
    Updated Jan 21, 2025
    Cite
    (2025). UniRef at the EBI [Dataset]. http://identifiers.org/RRID:SCR_004972
    Dataset updated
    Jan 21, 2025
    Description

    Various non-redundant databases with different sequence identity cut-offs created by clustering closely similar sequences to yield a representative subset of sequences. In the UniRef90 and UniRef50 databases no pair of sequences in the representative set has >90% or >50% mutual sequence identity. The UniRef100 database presents identical sequences and sub-fragments as a single entry with protein IDs, sequences, bibliography, and links to protein databases. The two major objectives of UniRef are: (i) to facilitate sequence merging in UniProt, and (ii) to allow faster and more informative sequence similarity searches. Although the UniProt Knowledgebase is much less redundant than UniParc, it still contains a certain level of redundancy because it is not possible to use fully automatic merging without risking merging of similar sequences from different proteins. However, such automatic procedures are extremely useful in compiling the UniRef databases to obtain complete coverage of sequence space while hiding redundant sequences (but not their descriptions) from view. A high level of redundancy results in several problems, including slow database searches and long lists of similar or identical alignments that can obscure novel matches in the output. Thus, a more even sampling of sequence space is advantageous. You may access NREF via the FTP server.
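    The non-redundancy property of UniRef90 and UniRef50 (no two representatives above the identity cut-off) can be sketched with greedy, longest-first clustering. This is a toy Python illustration under stated assumptions, not UniProt's method: identity is computed from a simple Needleman-Wunsch global alignment (match +1, mismatch -1, gap -1) as identical aligned columns divided by the shorter sequence length, and all function names are hypothetical.

```python
from itertools import combinations


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identical aligned columns / shorter length, from one optimal NW alignment."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, preferring diagonal steps.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        same = a[i - 1] == b[j - 1]
        if score[i][j] == score[i - 1][j - 1] + (match if same else mismatch):
            identical += int(same)
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Longest-first greedy clustering; returns representative -> members."""
    clusters: dict[str, list[str]] = {}
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        for rep in clusters:  # representatives, in creation order
            if global_identity(seqs[rep], seqs[sid]) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no representative is close enough
    return clusters


def is_non_redundant(seqs: dict[str, str], reps: list[str], threshold: float) -> bool:
    """True if no pair of representatives reaches the identity threshold."""
    # Same argument order as greedy_cluster (earlier representative first).
    return all(global_identity(seqs[x], seqs[y]) < threshold
               for x, y in combinations(reps, 2))


seqs = {
    "A": "MKTAYIAKQRQISFVKSHFSRQ",  # hypothetical sequences
    "B": "MKTAYIAKQRQISFVKSHFSRE",  # one substitution relative to A
    "C": "MSTNPKPQRKTKRNTNRRPQDV",
}
clusters = greedy_cluster(seqs, threshold=0.9)
print(clusters)                                   # → {'A': ['A', 'B'], 'C': ['C']}
print(is_non_redundant(seqs, list(clusters), 0.9))  # → True
```

Because each new representative is compared against every earlier one before it is accepted, the representative set passes the pairwise check by construction. The quadratic alignment cost makes this suitable only for small examples.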

  6. List of UniRef90 clusters that include mammals and dsDNA viruses (Class I).

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Nadav Rappoport; Michal Linial (2023). List of UniRef90 clusters that include mammals and dsDNA viruses (Class I). [Dataset]. http://doi.org/10.1371/journal.pcbi.1002364.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nadav Rappoport; Michal Linial
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    a Bac: cluster is mixed with bacterial proteins.
    b Length of the cluster's seed protein.
    c Analysis is based on a phylogenetic tree and on analysis of the expanded cluster according to UniRef50.
    d H2V: from host to virus, i.e., sequences acquired by the virus from a metazoan host.
    N.D., unresolved; Cont, contamination; Frag, fragment.

  7. UniProt UniRef90

    • kaggle.com
    zip
    Updated Oct 4, 2022
    Cite
    Darien Schettler (2022). UniProt UniRef90 [Dataset]. https://www.kaggle.com/datasets/dschettler8845/uniprot-uniref90
    Available download formats: zip (36,601,227,955 bytes, about 36.6 GB)
    Dataset updated
    Oct 4, 2022
    Authors
    Darien Schettler
    Description

    Dataset

    This dataset was created by Darien Schettler

    Contents

  8. UniRef90-GPCR-Proteins

    • huggingface.co
    Updated Mar 14, 2025
    Cite
    Michel Nivard (2025). UniRef90-GPCR-Proteins [Dataset]. https://huggingface.co/datasets/MichelNivard/UniRef90-GPCR-Proteins
    Explore at: Croissant
    Dataset updated
    Mar 14, 2025
    Authors
    Michel Nivard
    Description

    All UniRef90 sequences of G-protein-coupled receptor (GPCR) proteins across all species. GPCRs are evolutionarily related cell-surface receptors that detect molecules outside the cell in eukaryotes. The dataset contains both confirmed and putative proteins.

  9. CAT/BAT uniref90+algae proteins from NCBI

    • figshare.unimelb.edu.au
    bin
    Updated Dec 9, 2024
    Cite
    Yuhao Tong (2024). CAT/BAT uniref90+algae proteins from NCBI [Dataset]. http://doi.org/10.26188/27990278.v2
    Available download formats: bin
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    The University of Melbourne
    Authors
    Yuhao Tong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    description

  10. UniRef50_len_0_50

    • huggingface.co
    Updated Jun 20, 2023
    + more versions
    Cite
    Zhenjiao Du (2023). UniRef50_len_0_50 [Dataset]. https://huggingface.co/datasets/dzjxzyd/UniRef50_len_0_50
    Explore at: Croissant
    Dataset updated
    Jun 20, 2023
    Authors
    Zhenjiao Du
    Description

    This dataset was downloaded from the UniRef50 database and contains sequences with lengths ranging from 0 to 50.

    Code for the data mining (downloaded on September 30, 2024):

        import requests
        from io import BytesIO
        import pandas

        # Stream UniRef50 clusters (identity:0.5) with length <= 50 as a
        # gzip-compressed TSV with id, length, identity and sequence fields
        query_url = 'https://rest.uniprot.org/uniref/stream?compressed=true&fields=id%2Clength%2Cidentity%2Csequence&format=tsv&query=%28%28length%3A%5B*+TO+50%5D%29%29+AND+%28identity%3A0.5%29'
        uniprot_request = requests.get(query_url)

        # Wrap the raw (still compressed) response bytes in a file-like object
        bio = BytesIO(uniprot_request.content)

        df =…

    See the full description on the dataset page: https://huggingface.co/datasets/dzjxzyd/UniRef50_len_0_50.
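    The snippet above stops before the response is parsed. As a hedged, standard-library-only alternative to the pandas step, the payload requested with compressed=true can be gunzipped and read as a tab-separated table. The header names (such as "Length") are assumptions to check against the actual file, and the helper names are hypothetical.

```python
import csv
import gzip
import io


def parse_uniprot_tsv(payload: bytes) -> list[dict[str, str]]:
    """Turn a (possibly gzip-compressed) UniProt TSV response into row dicts."""
    # compressed=true in the query asks for gzip; the magic bytes confirm it.
    if payload[:2] == b"\x1f\x8b":
        payload = gzip.decompress(payload)
    text = io.StringIO(payload.decode("utf-8"))
    # First line is assumed to be the header row; quotes are kept literally.
    return list(csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE))


def max_length_filter(rows: list[dict[str, str]], column: str, max_len: int) -> list[dict[str, str]]:
    """Keep rows whose numeric `column` (e.g. a hypothetical "Length") is <= max_len."""
    return [row for row in rows if int(row[column]) <= max_len]


# Usage with the response from the snippet above (network access required):
# rows = parse_uniprot_tsv(uniprot_request.content)
# short = max_length_filter(rows, "Length", 50)
```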

  11. TIGR Plant Transcript Assembly database

    • rrid.site
    • neuinfo.org
    • +2 more
    Updated Mar 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). TIGR Plant Transcript Assembly database [Dataset]. http://identifiers.org/RRID:SCR_005470
    Dataset updated
    Mar 9, 2025
    Description

    The TIGR database is a collection of plant transcript sequences. Transcript assemblies are searchable by BLAST and by accession number. The construction of plant transcript assemblies (TAs) is similar to that of the TIGR gene indices. The sequences used to build the plant TAs are expressed transcripts collected from dbEST (ESTs) and the NCBI GenBank nucleotide database (full-length and partial cDNAs). "Virtual" transcript sequences derived from whole-genome annotation projects are not included. All plant species for which more than 1,000 ESTs or cDNA sequences are available are included in this project. TAs are clustered and assembled using the TGICL tool (Pertea et al., 2003), Megablast (Zhang et al., 2000) and the CAP3 assembler (Huang and Madan, 1999). TGICL is a wrapper script that invokes Megablast and CAP3. Sequences are initially clustered based on an all-against-all comparison using Megablast. The initial clusters are then assembled into consensus sequences using CAP3. Assembly criteria include a 50 bp minimum match, 95% minimum identity in the overlap region, and 20 bp maximum unmatched overhangs. Any EST/cDNA sequences that are not assembled into TAs are included as singletons. All singletons retain their GenBank accession numbers as identifiers. Plant TA identifiers are of the form TAnumber_taxonID, where number is a unique numerical identifier of the transcript assembly and taxonID is the NCBI taxon ID. To provide annotation for the TAs, each TA/singleton was aligned to the UniProt UniRef database. For release 1 TAs, a masked version of the UniRef90 database was used. For release 2 and onwards, a masked version of the UniRef100 database is used. Alignments were required to have at least 20% identity and 20% coverage. The annotation of the protein with the best alignment to each TA or singleton was used as the annotation for that sequence.
Additionally, the relative orientation of each TA/singleton to the best matching protein sequence was used to determine the orientation of each TA/singleton. Some sequences did not have alignments to the protein database that met our quality criteria, and those sequences have neither annotation nor orientation assignments. The release number for the plant TAs refers to the release version for a particular species. For the initial build, all TA sets are of version 1. Subsequent TA updates for new releases will be carried out when the percentage increase of the EST and cDNA counts exceeds 10% of the previous release and when the increase contains more than 1,000 new sequences. New releases will also include additional plant species with more than 1,000 EST or cDNA sequences that have become publicly available.
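    The numeric rules in this description can be written down as small predicates. This Python sketch only encodes the stated thresholds (assembly overlap, annotation alignment, release trigger) and the TAnumber_taxonID identifier format; the class, field, and function names are hypothetical and do not correspond to TGICL or CAP3 internals.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise overlap considered for assembly."""
    match_len: int       # length of the matching overlap, in bp
    identity_pct: float  # percent identity within the overlap region
    overhang: int        # longest unmatched overhang, in bp


def passes_assembly_criteria(o: Overlap) -> bool:
    # 50 bp minimum match, 95% minimum identity, 20 bp maximum overhang.
    return o.match_len >= 50 and o.identity_pct >= 95.0 and o.overhang <= 20


def passes_annotation_criteria(identity_pct: float, coverage_pct: float) -> bool:
    # Protein alignments need at least 20% identity and 20% coverage.
    return identity_pct >= 20.0 and coverage_pct >= 20.0


def needs_new_release(prev_count: int, new_count: int) -> bool:
    # Both conditions: growth above 10% of the previous EST/cDNA count
    # and more than 1,000 new sequences.
    increase = new_count - prev_count
    return increase > 0.10 * prev_count and increase > 1000


def parse_ta_id(ta_id: str) -> tuple[int, int]:
    # Identifiers have the form TA<number>_<taxonID>.
    if not ta_id.startswith("TA") or "_" not in ta_id:
        raise ValueError(f"not a plant TA identifier: {ta_id!r}")
    number, taxon_id = ta_id[2:].split("_", 1)
    return int(number), int(taxon_id)


print(needs_new_release(20000, 21500))  # → False (+1,500 is only 7.5%)
print(parse_ta_id("TA1234_3702"))       # → (1234, 3702); hypothetical TA, taxon 3702
```

For example, a species whose previous release had 20,000 ESTs/cDNAs needs more than 2,000 new sequences to trigger an update, because the 10% condition is then stricter than the 1,000-sequence condition.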

  12. ProteinBERT Trained model

    • dataone.org
    • dataverse.harvard.edu
    Updated Dec 16, 2023
    Cite
    Ofer, Dan; Brandes, Nadav (2023). ProteinBERT Trained model [Dataset]. https://dataone.org/datasets/sha256%3A68633f44e5f1922f727066014055bd3a0afc42596d0113cf3d039c7dc191e49f
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Ofer, Dan; Brandes, Nadav
    Description

    Trained ProteinBERT model weights for the paper "ProteinBERT: A universal deep-learning model of protein sequence and function" (code: https://github.com/nadavbra/protein_bert). The weights are also available via FTP: ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl. ProteinBERT is a protein language model pretrained on ~106M proteins from UniRef90. The pretrained model can be fine-tuned on any protein-related task in a matter of minutes. ProteinBERT achieves state-of-the-art performance on a wide range of benchmarks. ProteinBERT is built on Keras/TensorFlow. Its deep-learning architecture is inspired by BERT but contains several innovations, such as global-attention layers whose cost grows linearly with sequence length (compared with the quadratic, n^2, growth of self-attention). As a result, the model can process protein sequences of almost any length, including extremely long sequences of tens of thousands of amino acids. The model takes protein sequences as inputs and can also take protein GO annotations as additional inputs, which help the model infer the function of the input protein and update its internal representations and outputs accordingly. This pretrained TensorFlow/Keras model was produced by training for 28 days over ~670M records (~6.4 epochs over the entire UniRef90 training dataset of ~106M proteins).
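    The linear-cost idea behind global attention can be illustrated with a simplified, single-head Python sketch in which one global vector queries every position of the local sequence representation. It omits learned projections and multiple heads and is not ProteinBERT's implementation; it only shows why the work grows with L (one score per position) rather than L² (one score per pair of positions). The function name is hypothetical.

```python
import math


def global_attention(global_vec: list[float], local_seq: list[list[float]]) -> list[float]:
    """One global query attends over L local vectors (keys = values here)."""
    d = len(global_vec)
    # One scaled dot-product score per position: O(L * d) work in total.
    scores = [sum(g * x for g, x in zip(global_vec, pos)) / math.sqrt(d)
              for pos in local_seq]
    # Numerically stable softmax over the L positions.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted average of the local vectors -> updated global summary.
    return [sum(w * pos[k] for w, pos in zip(weights, local_seq)) for k in range(d)]


# A zero query gives uniform weights, so the result is (approximately)
# the mean of the local vectors.
print(global_attention([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

Full self-attention would instead compute a score for every pair of the L positions, which is where the quadratic growth comes from.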

  13. Platon RDS dataset

    • zenodo.org
    application/gzip, tsv
    Updated Apr 22, 2020
    Cite
    Oliver Schwengers; Patrick Barth; Linda Falgenhauer; Torsten Hain; Trinad Chakraborty; Alexander Goesmann (2020). Platon RDS dataset [Dataset]. http://doi.org/10.5281/zenodo.3759169
    Available download formats: application/gzip, tsv
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Oliver Schwengers; Patrick Barth; Linda Falgenhauer; Torsten Hain; Trinad Chakraborty; Alexander Goesmann
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was used in the Platon manuscript/publication comprising:

    • chromosome sequences
    • plasmid sequences
    • UniRef90 bacterial representative protein sequences
    • UniRef90 protein / chromosome & plasmid hit counts
    • artificial contigs
    • RDS threshold metrics

  14. UniProt

    • registry.opendata.aws
    Updated Apr 6, 2021
    + more versions
    Cite
    SIB Swiss Institute of Bioinformatics on behalf of the UniProt Consortium (2021). UniProt [Dataset]. https://registry.opendata.aws/uniprot/
    Dataset updated
    Apr 6, 2021
    Dataset provided by
    UniProt (http://www.uniprot.org/)
    Description

    The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host institutions EMBL-EBI, SIB Swiss Institute of Bioinformatics and PIR are committed to the long-term preservation of the UniProt databases.

  15. Average number of hits used for generating PSSM profiles.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar (2023). Average number of hits used for generating PSSM profiles. [Dataset]. http://doi.org/10.1371/journal.pone.0158445.t007
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Average number of hits used for generating PSSM profiles.

  16. Metaclusters by DPCfam clustering of UniRef50 v 2017_07

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 30, 2022
    Cite
    Elena Tea Russo (2022). Metaclusters by DPCfam clustering of UniRef50 v 2017_07 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5877585
    Dataset updated
    Oct 30, 2022
    Dataset provided by
    Elena Tea Russo
    Federico Barone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Metaclusters obtained from the DPCfam clustering of UniRef50, v. 2017_07. Metaclusters represent putative protein families automatically derived using the DPCfam method, as described in "Unsupervised protein family classification by Density Peak clustering" (Russo ET, 2020, PhD thesis, http://hdl.handle.net/20.500.11767/116345). Supervisors: Alessandro Laio, Marco Punta.

    Also visit https://dpcfam.areasciencepark.it/ to navigate the data easily.

    VERSION 1.1 changes:

    Added the DPCfamB database, which includes all small metaclusters with 25<=N<50 seed sequences. DPCfamB files are named with the prefix B_.

    Added an AlphaFold representative for each MC, based on AlphaFoldDB.

    FILES DESCRIPTION:

    1) Standard DPCfam database

    metaclusters_xml.tar.gz: Metaclusters' seeds, unaligned, in an XML table. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 a.a. are reported. Metacluster entries also include some statistical information about each MC (such as size, average length, and low-complexity fraction) and a Pfam comparison (Dominant Architecture). A README file describing the data is included, together with a parser that converts the XML data to space-separated tables and the XML schema.

    metaclusters_msas.tar.gz: Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 a.a. are reported.

    metaclusters_hmms.tar.gz: Metaclusters' profile HMMs, one ".hmm" file per metacluster. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 a.a. are reported.

    all_metaclusters_hmm.tar.gz: Collective metaclusters' profile HMM, a single .hmm file collecting all MCs' profile HMMs. Only MCs whose seeds have 1) more than 50 elements and 2) an average length greater than 50 a.a. are reported.

    uniref50_annotated.xml.gz: UniRef50 v. 2017_07 database annotated with Pfam families and DPCfam metaclusters. A README file describing the data is included, together with a parser that converts the XML data to space-separated tables and the XML schema, which is derived from UniProt's UniRef50 XML schema.

    2) DPCfamB database

    B_metaclusters_xml.tar.gz: Metaclusters' seeds, unaligned, in an XML table. All metaclusters are listed. Metacluster entries also include some statistical information about each MC (such as size, average length, and low-complexity fraction) and a Pfam comparison (Dominant Architecture). A README file describing the data is included, together with a parser that converts the XML data to space-separated tables and the XML schema.

    B_metaclusters_msas.tar.gz: Metaclusters' multiple sequence alignments, in FASTA format. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 a.a. are reported.

    B_metaclusters_hmms.tar.gz: Metaclusters' profile HMMs, one ".hmm" file per metacluster. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 a.a. are reported.

    B_all_metaclusters_hmm.tar.gz: Collective metaclusters' profile HMM, a single .hmm file collecting all MCs' profile HMMs. Only MCs whose seeds have 1) 25<=N<50 elements and 2) an average length greater than 50 a.a. are reported.
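    The selection thresholds quoted for the MSA and HMM files can be summarized in a small Python helper. The function names are hypothetical. Read literally, the text leaves metaclusters with exactly 50 seed elements out of both databases, and the helper reproduces that reading rather than guessing the intended boundary.

```python
from typing import Optional


def dpcfam_release(n_seeds: int, avg_len: float) -> Optional[str]:
    """Which database's MSA/HMM files would report this metacluster (hypothetical)."""
    if avg_len <= 50:        # both databases require average length > 50 a.a.
        return None
    if n_seeds > 50:         # standard DPCfam: more than 50 seed elements
        return "DPCfam"
    if 25 <= n_seeds < 50:   # DPCfamB: 25 <= N < 50
        return "DPCfamB"
    return None              # includes N == 50 under a literal reading


def release_filename(base: str, release: str) -> str:
    """DPCfamB files carry the B_ prefix."""
    return "B_" + base if release == "DPCfamB" else base


print(dpcfam_release(120, 180.0), dpcfam_release(30, 75.0), dpcfam_release(50, 100.0))
# → DPCfam DPCfamB None
```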

  17. Performance comparison using independent tests.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar (2023). Performance comparison using independent tests. [Dataset]. http://doi.org/10.1371/journal.pone.0158445.t005
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yasser EL-Manzalawy; Mostafa Abbas; Qutaibah Malluhi; Vasant Honavar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison using independent tests.

  18. Single-end transcriptome outputs from the forward unpaired reads of Savalia...

    • data.mendeley.com
    Updated Oct 10, 2024
    + more versions
    Cite
    Dany Domínguez Pérez (2024). Single-end transcriptome outputs from the forward unpaired reads of Savalia savaglia RNAseq analyses [Dataset]. http://doi.org/10.17632/pmxwfjyyvy.2
    Dataset updated
    Oct 10, 2024
    Authors
    Dany Domínguez Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains original single-end transcriptome outputs from the forward broken paired-end RNAseq of the false black coral Savalia savaglia (NCBI/BioProject accession: PRJNA1111802). The dataset includes the following files:

    • Assembly_Ss_SE_Trinity.fastq.U.qtrim: Remaining unpaired reads from the forward broken paired-end RNAseq of Savalia savaglia.

    • Assembly_Ss_SE.Trinity.fasta: Single-end transcriptome obtained with the Trinity Assembler from the forward broken paired-end RNAseq of Savalia savaglia.

    • Assembly_Ss_SE_Trinity.fasta_stats.txt: Statistics summary of the single-end transcriptome of Savalia savaglia.

    • Assembly_Ss_SE_Trinity.fasta.gene_trans_map: Transcript-to-gene mapping file generated during the assembly process.

    • quant.sf: Salmon output containing expression values for the assembled transcripts.

    Additionally, the dataset includes the following files from DIAMOND BLASTx analyses:

    • UniRef90_SE.diamond.blastx.outfmt6: BLASTx output file against the UniRef90 database, reporting the top alignment for each query.

    • UniRef90_SE.diamond.blastx.outfmt6.grouped: BLASTx hits grouped to improve sequence coverage by combining multiple high-scoring segment pairs (HSPs).

    • UniRef90_SE.diamond.blastx.outfmt6.hist: Histogram summarizing the distribution of BLASTx hit lengths.

    • UniRef90_SE.diamond.blastx.outfmt6.w_pct_hit_length: File providing percentages of hit lengths, including the top hit's length and the percentage of that length covered by the alignment.
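    The outfmt6 files follow the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). A minimal Python sketch for selecting the best hit per transcript and computing how much of the subject protein it spans is shown here. The subject length must be supplied separately, because the default columns do not include it; the function names and the identifiers in the tests are hypothetical.

```python
import csv

# Default column order of BLAST/DIAMOND tabular output (outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines):
    """Return {query: best hit}, keeping the highest bitscore per query."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"], hit["send"] = int(hit["sstart"]), int(hit["send"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def pct_subject_covered(hit, subject_len):
    """Percent of the subject protein spanned by the alignment."""
    span = abs(hit["send"] - hit["sstart"]) + 1
    return 100.0 * span / subject_len


# Usage (file name from the dataset; subject lengths must be supplied separately):
# with open("UniRef90_SE.diamond.blastx.outfmt6", newline="") as fh:
#     best = top_hits(fh)
```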

  19. Original paired-end transcriptome outputs from the RNAseq analyses of the...

    • data.mendeley.com
    Updated Oct 10, 2024
    Cite
    Dany Domínguez Pérez (2024). Original paired-end transcriptome outputs from the RNAseq analyses of the false black coral Savalia savaglia [Dataset]. http://doi.org/10.17632/7t36p2dvjp.2
    Dataset updated
    Oct 10, 2024
    Authors
    Dany Domínguez Pérez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains outputs generated from the original paired-end transcriptomic analyses of the false black coral Savalia savaglia. The dataset includes the following files:

    • 75282_ID2093_3-SAS_S416_L004_R1_001.fastq.P.qtrim.zip: Preprocessed forward reads used for the de novo assembly, obtained from the paired-end RNA-Seq data of Savalia savaglia and containing high-quality sequences that passed quality trimming and filtering with Trimmomatic.

    • 75282_ID2093_3-SAS_S416_L004_R2_001.fastq.P.qtrim.zip: Preprocessed reverse reads used for the de novo assembly, obtained from the paired-end RNA-Seq data of Savalia savaglia and containing high-quality sequences that passed quality trimming and filtering with Trimmomatic.

    • Assembly_Ss_PE.Trinity.fasta: Original, non-filtered de novo paired-end transcriptome assembly of Savalia savaglia, generated from 68 million PE reads using the Trinity Assembler.

    • Assembly_Ss_PE_Trinity.fasta_stats.txt: Statistical summary of the paired-end transcriptome assembly of Savalia savaglia.

    • Assembly_Ss_PE_Trinity.fasta.gene_trans_map: Transcript-to-gene mapping file generated during the paired-end transcriptome assembly of Savalia savaglia.

    • quant_Ss_PE.sf: Salmon output containing expression values for the assembled transcripts from the paired-end assembly of Savalia savaglia.

    Additionally, the dataset includes the following files from DIAMOND BLASTx analyses, which used the original de novo paired-end transcriptome assembly of the false black coral Savalia savaglia:

    • UniRef90_PE.diamond.blastx.outfmt6: BLASTx output file against the UniRef90 database, reporting the top alignment for each query (assembled transcripts) from the paired-end assembly of Savalia savaglia.

    • UniRef90_PE.diamond.blastx.outfmt6.grouped: Grouped BLASTx hits from the paired-end assembly of Savalia savaglia, designed to improve sequence coverage by combining multiple high-scoring segment pairs (HSPs).

    • UniRef90_PE.diamond.blastx.outfmt6.hist: Histogram summarizing the distribution of BLASTx hit lengths obtained from the paired-end assembly of Savalia savaglia.

    • UniRef90_PE.diamond.blastx.outfmt6.w_pct_hit_length: File providing percentages of hit lengths from BLASTx analyses of the paired-end assembly of Savalia savaglia, including the top hit's length and the percent of the length covered in the alignment.

  20. Uniref Sociedad Anonima Cerrada Company profile with phone, email, buyers,...

    • volza.com
    csv
    Updated Feb 14, 2025
    Cite
    Volza.LLC (2025). Uniref Sociedad Anonima Cerrada Company profile with phone,email, buyers, suppliers, price, export import shipments. [Dataset]. https://www.volza.com/company-profile/uniref-sociedad-anonima-cerrada-14652482
    Available download formats: csv
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Volza.LLC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2014 - Sep 30, 2021
    Variables measured
    Count of exporters, Count of importers, Sum of export value, Sum of import value, Count of export shipments, Count of import shipments
    Description

    Credit report of Uniref Sociedad Anonima Cerrada, containing export-import market intelligence with its phone, email, and LinkedIn details, plus details of each import and export shipment, such as product, quantity, price, buyer and supplier names, country, and date of shipment.
