52 datasets found
  1. Data_Sheet_1_BioVDB: biological vector database for high-throughput gene...

    • frontiersin.figshare.com
    pdf
    Updated Mar 8, 2024
    Cite
    Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren (2024). Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF [Dataset]. http://doi.org/10.3389/frai.2024.1366273.s001
    Available download formats: pdf
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    Frontiers
    Authors
    Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely, publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows for efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis.
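
    The similarity search and label inference described above can be illustrated with a minimal sketch. It assumes samples are stored as plain expression vectors, compared by cosine similarity with a brute-force scan. BioVDB's actual index, distance metric and API are not described here. The sample IDs, vectors and tissue labels are hypothetical toy values. A missing label is inferred by majority vote over the labelled samples among the k nearest neighbours.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length expression vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, samples, k):
    """Return the k (sample_id, similarity) pairs most similar to query, via a linear scan."""
    scored = [(sid, cosine_similarity(query, vec)) for sid, (vec, _) in samples.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def infer_label(query, samples, k):
    """Majority vote over the labels of the labelled samples among the k nearest neighbours."""
    neighbours = top_k(query, samples, k)
    votes = Counter(samples[sid][1] for sid, _ in neighbours if samples[sid][1] is not None)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: sample_id -> (expression vector, tissue label)
samples = {
    "GSM_A": ([5.0, 0.1, 0.2], "liver"),
    "GSM_B": ([4.8, 0.3, 0.1], "liver"),
    "GSM_C": ([0.2, 6.1, 0.4], "brain"),
    "GSM_D": ([0.1, 5.9, 0.6], "brain"),
}
query = [4.9, 0.2, 0.3]  # an unlabelled sample whose tissue we want to infer
print([sid for sid, _ in top_k(query, samples, 2)])  # ['GSM_A', 'GSM_B']
print(infer_label(query, samples, 3))  # liver
```

    A production vector database would typically replace the linear scan with an approximate nearest-neighbour index. The brute-force version is used here only because it is short and exact.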

  2. Data_Sheet_1_BioMaster: An Integrated Database and Analytic Platform to...

    • figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Beibei Wang; Huayi Yang; Jianan Sun; Chuhao Dou; Jian Huang; Feng-Biao Guo (2023). Data_Sheet_1_BioMaster: An Integrated Database and Analytic Platform to Provide Comprehensive Information About BioBrick Parts.PDF [Dataset]. http://doi.org/10.3389/fmicb.2021.593979.s001
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Beibei Wang; Huayi Yang; Jianan Sun; Chuhao Dou; Jian Huang; Feng-Biao Guo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetic biology seeks to create new biological parts, devices, and systems, and to reconfigure existing natural biological systems for custom-designed purposes. Standardized BioBrick parts are the foundation of synthetic biology. However, the incomplete and flawed metadata of BioBrick parts are a major obstacle to designing genetic circuits easily, quickly, and accurately. Here, a database termed BioMaster (http://www.biomaster-uestc.cn) was developed to extensively complement information about BioBrick parts. It includes 47,934 BioBrick parts from the international Genetically Engineered Machine (iGEM) Registry, with more comprehensive information integrated from 10 databases, providing corresponding information about functions, activities, interactions, and related literature. BioMaster is also a user-friendly platform for retrieving and analyzing relevant information on BioBrick parts.

  3. PROSITE profiles

    • ebi.ac.uk
    Updated Feb 5, 2025
    Cite
    (2025). PROSITE profiles [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Feb 5, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs. PROSITE is based at the Swiss Institute of Bioinformatics (SIB), Geneva, Switzerland.

  4. CATH-Gene3D

    • ebi.ac.uk
    Updated Oct 21, 2020
    Cite
    (2020). CATH-Gene3D [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Oct 21, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The CATH-Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Predicted structural and sequence domains are mapped using libraries of hidden Markov models representing CATH and Pfam domains. CATH-Gene3D is based at University College London, UK.
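
    As a rough illustration of the Markov clustering step, the sketch below implements the generic MCL algorithm in pure Python. MCL alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power, then renormalizing columns) until the matrix stops changing. The inflation value, convergence threshold and toy graph are assumptions for illustration. CATH-Gene3D's actual similarity graph, parameters and the subsequent multi-linkage clustering step are not shown.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (in place), making the matrix column-stochastic."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Plain O(n^3) product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic Markov clustering; returns clusters as sorted lists of node indices."""
    n = len(adj)
    # Add self-loops (a common MCL convention), then normalize columns.
    m = [[float(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        inflated = [[v ** inflation for v in row] for row in expanded]  # inflation: boost strong flows
        normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row keeping non-negligible mass is an attractor; its nonzero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} with no edges between them.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(adj))  # [[0, 1, 2], [3, 4, 5]]
```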

  5. Dr. Duke's Phytochemical and Ethnobotanical Databases

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Dr. Duke's Phytochemical and Ethnobotanical Databases [Dataset]. https://catalog.data.gov/dataset/dr-dukes-phytochemical-and-ethnobotanical-databases-0849e
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Of interest to pharmaceutical, nutritional, and biomedical researchers, as well as individuals and companies involved with alternative therapies and herbal products, this database is one of the world's leading repositories of ethnobotanical data. It evolved out of the extensive compilations by the former Chief of USDA's Economic Botany Laboratory in the Agricultural Research Service in Beltsville, Maryland, in particular his popular Handbook of phytochemical constituents of GRAS herbs and other economic plants (CRC Press, Boca Raton, FL, 1992). In addition to Duke's own publications, the database documents phytochemical information and quantitative data collected over many years through research results presented at meetings and symposia, and findings from the published scientific literature. The current Phytochemical and Ethnobotanical databases facilitate plant, chemical, bioactivity, and ethnobotany searches. A large number of plants and their chemical profiles are covered, and data are structured to support browsing and searching in several user-focused ways. For example, users can:

    • get a list of chemicals and activities for a specific plant of interest, using either its scientific or common name
    • download a list of chemicals and their known activities in PDF or spreadsheet form
    • find plants with chemicals known for a specific biological activity
    • display a list of chemicals with their LD toxicity data
    • find plants with potential cancer-preventing activity
    • display a list of plants for a given ethnobotanical use
    • find out which plants have the highest levels of a specific chemical

    References to the supporting scientific publications are provided for each specific result.

    Resources in this dataset:

    • Resource Title: Duke-Source-CSV.zip. File Name: Duke-Source-CSV.zip. Resource Description: Dr. Duke's Phytochemistry and Ethnobotany - raw database tables for archival purposes. Visit https://phytochem.nal.usda.gov/phytochem/search for the interactive web version of the database.
    • Resource Title: Data Dictionary (preliminary). File Name: DrDukesDatabaseDataDictionary-prelim.csv. Resource Description: This Data Dictionary describes the columns for each table. [Note that this is in progress and some variables are yet to be defined or are unused in the current implementation. Please send comments/suggestions to nal-adc-curator@ars.usda.gov ]

  6. PIRSF

    • ebi.ac.uk
    Updated Apr 7, 2020
    Cite
    (2020). PIRSF [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Apr 7, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PIRSF protein classification system is a network with multiple levels of sequence diversity, from superfamilies to subfamilies, that reflects the evolutionary relationships of full-length proteins and domains. PIRSF is based at the Protein Information Resource, Georgetown University Medical Centre, Washington, DC, US.

  7. Data_Sheet_1_Classifying Breast Cancer Molecular Subtypes by Using Deep...

    • figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Narjes Rohani; Changiz Eslahchi (2023). Data_Sheet_1_Classifying Breast Cancer Molecular Subtypes by Using Deep Clustering Approach.PDF [Dataset]. http://doi.org/10.3389/fgene.2020.553587.s001
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Narjes Rohani; Changiz Eslahchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer is a complex disease with a high mortality rate. The characteristics of tumor masses are very heterogeneous; thus, appropriate classification of tumors is critical for effective treatment. A high level of heterogeneity has also been observed in breast cancer. Detecting the molecular subtypes of this disease is therefore an essential medical problem, one that bioinformatics could help address. This study aims to discover the molecular subtypes of breast cancer using somatic mutation profiles of tumors. However, somatic mutation profiles are very sparse, so a network propagation method is applied over a gene interaction network to make the mutation profiles dense. Afterward, the deep embedded clustering (DEC) method is used to classify the breast tumors into four subtypes. Next, the gene signature of each subtype is obtained using Fisher's exact test. Besides the enrichment of gene signatures in numerous biological databases, clinical and molecular analyses verify that the proposed method can efficiently detect the molecular subtypes of breast cancer from mutation profiles. Finally, a supervised classifier is trained on the discovered subtypes to predict the molecular subtype of a new patient. The code and materials for the method are available at: https://github.com/nrohani/MolecularSubtypes.
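
    The network propagation step can be sketched as a random walk with restart, one common propagation scheme. The description does not specify the exact formulation, network or parameters the authors used, so the restart weight alpha, the toy path-shaped gene network and the mutation profile below are assumptions. Each iteration applies F <- alpha * W F + (1 - alpha) * F0, where W is the adjacency matrix with each column divided by that gene's degree and F0 is the original sparse profile.

```python
def propagate(adj, profile, alpha=0.7, tol=1e-10, max_iter=10_000):
    """Random walk with restart: iterate F <- alpha * W @ F + (1 - alpha) * F0.

    W is the adjacency matrix with each column divided by its degree, so every
    step spreads a gene's current score evenly among its neighbours, while the
    (1 - alpha) term keeps pulling scores back toward the original profile F0.
    """
    n = len(adj)
    degree = [sum(adj[i][j] for i in range(n)) for j in range(n)]  # column sums
    f0 = [float(x) for x in profile]
    f = f0[:]
    for _ in range(max_iter):
        new = []
        for i in range(n):
            spread = sum(adj[i][j] * f[j] / degree[j] for j in range(n) if degree[j] > 0)
            new.append(alpha * spread + (1 - alpha) * f0[i])
        converged = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if converged:
            break
    return f


# Hypothetical 4-gene path network 0-1-2-3; only gene 0 is mutated in this tumour.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smoothed = propagate(adj, [1, 0, 0, 0])
print([round(x, 3) for x in smoothed])  # [0.424, 0.354, 0.164, 0.057]
```

    The single mutated gene now lends a decaying score to its network neighbours, turning a sparse binary profile into a dense one that a clustering method such as DEC can work with.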

  8. SUPERFAMILY

    • ebi.ac.uk
    Updated Nov 8, 2010
    Cite
    (2010). SUPERFAMILY [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Nov 8, 2010
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent the entire SCOP superfamily that the domain belongs to. SUPERFAMILY is based at the University of Bristol, UK.

  9. Database of marine invertebrate dispersal parameters and species ranges...

    • search.dataone.org
    • bco-dmo.org
    • +1 more
    Updated Dec 5, 2021
    Cite
    James E. Byers; James M. Pringle; John P. Wares; Paula Pappalardo (2021). Database of marine invertebrate dispersal parameters and species ranges including locations along East Coast of North America (CoastBenthBiogeo project) [Dataset]. https://search.dataone.org/view/sha256%3Ad608fbfc570dc2c1e4af9af35dfd1b07b2f376803cb32af390df3e1d5f7ad184
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    James E. Byers; James M. Pringle; John P. Wares; Paula Pappalardo
    Description

    This is a database of marine invertebrate dispersal parameters and species ranges along the East Coast of North America, with latitude and longitude calculated and added programmatically.

    The raw range data were gathered from occurrence records in the GBIF dataset.

    Life-history data were gathered from a literature review.

    The complete dataset methodology is detailed in Pappalardo P, Pringle J, Wares J, and J Byers (2015): The location, strength, and mechanisms behind marine biogeographic boundaries of the east coast of North America. Ecography 38: 001–010, 2015

    There are two other datasets associated with this coordinate system:
    http://www.bco-dmo.org/dataset/554871: Database of marine invertebrate dispersal parameters and species ranges (NE Coast N. America)
    and
    http://www.bco-dmo.org/dataset/554893: A series of coordinates and ranges from South and North America to which species occurrences are mapped according to a model.

  10. Database of marine invertebrate dispersal parameters and species ranges from...

    • datacart.bco-dmo.org
    • bco-dmo.org
    • +1 more
    csv
    Updated Apr 1, 2015
    Cite
    James E. Byers; James M. Pringle; John P. Wares (2015). Database of marine invertebrate dispersal parameters and species ranges from UNH lab_UNH-model in the Durham, NH from 1999-2011 (CoastBenthBiogeo project) [Dataset]. https://datacart.bco-dmo.org/dataset/554871
    Available download formats: csv (202.01 KB)
    Dataset updated
    Apr 1, 2015
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    James E. Byers; James M. Pringle; John P. Wares
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    class, order, family, phylum, N_occur, range_max, range_min, references, genus_species, type_of_dispersal, and 1 more
    Description

    This is a database of marine invertebrate dispersal parameters and species ranges along the East Coast of North America.

    The raw range data were gathered from occurrence records in the GBIF dataset.

    Life-history data were gathered from a literature review.

    The complete dataset methodology is detailed in Pappalardo P, Pringle J, Wares J, and J Byers (2015): The location, strength, and mechanisms behind marine biogeographic boundaries of the east coast of North America. Ecography 38: 001–010, 2015
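
    A minimal sketch of summarising the CSV with Python's standard csv module. It assumes the header uses the variable names listed above (genus_species, type_of_dispersal, range_min, range_max) and that range_min and range_max are numeric values on the same scale. The inline rows, species names and dispersal categories are hypothetical placeholders. With the real download, pass the file's text instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows: species names, dispersal categories and numbers are placeholders.
SAMPLE_CSV = """genus_species,type_of_dispersal,range_min,range_max
Species_a,planktotrophic,100,900
Species_b,planktotrophic,200,600
Species_c,direct,300,400
"""


def mean_span_by_dispersal(text):
    """Mean (range_max - range_min) per type_of_dispersal, skipping unusable rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        try:
            span = float(row["range_max"]) - float(row["range_min"])
        except (KeyError, ValueError):
            continue  # skip rows with missing or non-numeric range bounds
        totals[row["type_of_dispersal"]] += span
        counts[row["type_of_dispersal"]] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


print(mean_span_by_dispersal(SAMPLE_CSV))  # {'planktotrophic': 600.0, 'direct': 100.0}
```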

  11. Data, Rcode and Supplementary Materials for: Effects of virgin micro- and...

    • zenodo.org
    bin, pdf, txt
    Updated Jul 22, 2024
    Cite
    Marc Besson (2024). Data, Rcode and Supplementary Materials for: Effects of virgin micro- and nano-plastics on fish: Trends, meta-analysis and perspectives [Dataset]. http://doi.org/10.5281/zenodo.3694955
    Available download formats: bin, pdf, txt
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marc Besson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Review_MNNP_Fish

    Data, Rcode and Supplementary Materials for: Effects of virgin micro- and nano-plastics on fish: Trends, meta-analysis and perspectives

    Supplementary_Information_1.pdf:
    This document contains a list of the names used in the database (see Supplementary_Information_2) to describe the biological endpoints investigated in the 46 studies that we reviewed.

    Supplementary_Information_2.xlsx:
    This document contains 6 sheets:

    • 'Database' sheet: contains the full database compiling the 46 studies and 782 biological endpoints analyzed in this review paper.
    • 'INFO' sheet: explains how MP/NP sizes and concentrations were assigned to classes.
    • 'Size_Class_analytics' sheet: contains the analyzed data (from the database) regarding MP/NP size classes and effects on fish biological functions. This sheet is used to build Figure 4.
    • 'Mass_Conc_analytics' sheet: contains the analyzed data (from the database) regarding MP/NP mass concentration classes and effects on fish biological functions. This sheet is used to build Figure 4.
    • 'Part_Conc_analytics' sheet: contains the analyzed data (from the database) regarding MP/NP particle concentration classes and effects on fish biological functions. This sheet is used to build Figure 4.
    • 'Exposure_Path_analytics' sheet: contains the analyzed data (from the database) regarding MP/NP exposure pathway and effects on fish biological functions. This sheet is used to build Figure 3.

    Plabib_Fig2.xlsx:
    This document contains the data needed to build Figure 2 of this review paper. Data were extracted from the Supplementary_Information_2 database.

    Plabib_Fig3_pie.txt:
    This document contains the data needed to build the Figure 3 pie chart. Data were extracted from the Supplementary_Information_2 database.

    PlaBib_Rcode_forFigures.R:
    This document contains the R code needed to build all figures in this manuscript.

  12. File S1 - A System to Automatically Classify and Name Any Individual...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Haitham Marakeby; Eman Badr; Hanaa Torkey; Yuhyun Song; Scotland Leman; Caroline L. Monteil; Lenwood S. Heath; Boris A. Vinatzer (2023). File S1 - A System to Automatically Classify and Name Any Individual Genome-Sequenced Organism Independently of Current Biological Classification and Nomenclature [Dataset]. http://doi.org/10.1371/journal.pone.0089142.s001
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Haitham Marakeby; Eman Badr; Hanaa Torkey; Yuhyun Song; Scotland Leman; Caroline L. Monteil; Lenwood S. Heath; Boris A. Vinatzer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tables S1–S5 report, for each genome used in this article, the most similar genome (on which the provisional genome code was based), the ANIb% value, the percentage of aligned fragments, and the assigned genome code. (PDF)

  13. Jellyfish Database Initiative: Global records on gelatinous zooplankton for...

    • bco-dmo.org
    • search.dataone.org
    csv
    Updated Aug 28, 2014
    Cite
    Robert Condon; Carlos M. Duarte; Cathy Lucas; Kylie Pitt (2014). Jellyfish Database Initiative: Global records on gelatinous zooplankton for the past 200 years, collected from global sources and literature (Trophic BATS project) [Dataset]. http://doi.org/10.1575/1912/7191
    Available download formats: csv (104.11 MB)
    Dataset updated
    Aug 28, 2014
    Dataset provided by
    Biological and Chemical Oceanography Data Management Office (BCO-DMO)
    Authors
    Robert Condon; Carlos M. Duarte; Cathy Lucas; Kylie Pitt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    day, lat, lon, date, year, depth, month, taxon, contact, density, and 28 more
    Description

    The Jellyfish Database Initiative (JeDI) is a scientifically coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence-only records of gelatinous zooplankton spanning the past four centuries (1790–2011), assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level where identified, but taxonomic information on phylum, family and order is reported for all records. Other auxiliary metadata, such as physical, environmental and biometric information relating to the gelatinous zooplankton data, are included with each entry. JeDI has been developed as an open-access research tool for the scientific community to quantitatively define the global baseline of gelatinous zooplankton populations and to describe long-term and large-scale trends in gelatinous zooplankton populations and blooms. It has also been constructed as a repository for future datasets, allowing retrospective analyses of the baseline and trends in global gelatinous zooplankton populations to be conducted in the future.

    References:

    Lucas, C. J., et al. 2014. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecol. Biogeogr. doi:10.1111/geb.12169

    Condon, R. H., et al. 2013. Recurrent jellyfish blooms are a consequence of global oscillations. PNAS 110(3): 1000–1005. doi:10.1073/pnas.1210920110

    Condon, R. H., et al. 2012. Questioning the Rise of Gelatinous Zooplankton in the World’s Oceans. BioScience 62(2): 160–169. doi:10.1525/bio.2012.62.2.9

  14. Status and trends of Macquarie Island Albatrosses and Giant Petrels:...

    • data.aad.gov.au
    • researchdata.edu.au
    • +2 more
    Updated May 27, 2022
    Cite
    ALDERMAN, RACHAEL; GALES, ROSEMARY (2022). Status and trends of Macquarie Island Albatrosses and Giant Petrels: management and conservation of threatened seabirds [Dataset]. http://doi.org/10.26179/5dc9f8c60ab2c
    Dataset updated
    May 27, 2022
    Dataset provided by
    Australian Antarctic Division (https://www.antarctica.gov.au/)
    Australian Antarctic Data Centre
    Authors
    ALDERMAN, RACHAEL; GALES, ROSEMARY
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 3, 1952 - Jun 30, 2019
    Description

    Albatross and petrel populations have declined globally due to interactions with fishing operations. The survival of four albatross and two giant petrel species breeding on Macquarie Island is threatened, and ongoing monitoring is essential to assess their conservation status and mitigate negative influences. Long-term studies are required to obtain reliable information on population size, productivity, and age- and sex-related survival parameters. The birds' oceanic movements are also being investigated so that questions regarding temporal and spatial overlap with fisheries can be addressed.

    Demographic and population data collected during the 2012-13 breeding season on Macquarie Island for four species of albatross and two species of giant petrel are summarised in the annual report (PDF), and all data are contained in the tables therein or in the attached xlsx spreadsheets and Access database. Data collected include breeding census, breeding success, nest location, banding and resight data for the 2012-13 season. The Access database contains data from 1950-2012.

    Information for 2013-2014 is held in the 2013-2014 folder, which includes several Excel spreadsheets, an updated Access database, and a copy of the final report.

    Information for 2014-2015 is held in the 2014-2015 folder, which includes several Excel spreadsheets, a copy of the report, and updated database tables.

    Information for 2015-2016 is held in the 2015-2016 folder, which includes several Excel spreadsheets, a copy of the report, and updated database tables.

    Information for 2016-2017 is held in the 2016-2017 folder, which includes several Excel spreadsheets.

    Information for 2017-2018 is held in the 2017-2018 folder, which includes several Excel spreadsheets and a PDF document showing the location of nesting sites (waypoints are provided in the Excel files).

    Information for 2018-2019 is held in the 2018-2019 folder, which includes several Excel spreadsheets and a PDF document showing the location of nesting sites (waypoints are provided in the Excel files).

    This project has replaced project 2569 (which in turn replaced project 751).

  15. Data from: NMR Database of Lignin and Cell Wall Model Compounds.

    • osti.gov
    Updated Aug 1, 2024
    Cite
    Lu, Fachuang; Ralph, John; Ralph, Sally A (2024). NMR Database of Lignin and Cell Wall Model Compounds. [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2409191
    Dataset updated
    Aug 1, 2024
    Dataset provided by
    United States Department of Energy (http://energy.gov/)
    Office of Science (http://www.er.doe.gov/)
    Department of Energy Biological and Environmental Research Program
    Great Lakes Bioenergy Research Center (GLBRC), Madison, WI (United States); US Forest Products Lab (USFPL)
    Authors
    Lu, Fachuang; Ralph, John; Ralph, Sally A
    Description

    This database was designed to provide a coherent, single source of NMR data for lignin and other plant cell wall model compounds. The database is distributed as a cross-platform Adobe PDF file for viewing and printing. This is the latest public version of the database, version 2024/08, updated from the 2009 version.

  16. PRINTS

    • ebi.ac.uk
    Updated Jun 14, 2012
    Cite
    (2012). PRINTS [Dataset]. https://www.ebi.ac.uk/interpro/
    Dataset updated
    Jun 14, 2012
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family or domain. PRINTS is based at the University of Manchester, UK.

  17. Data from: Integrated Taxonomic Information System (ITIS)

    • gbif.org
    • demo.gbif-test.org
    Updated Mar 27, 2025
    Cite
    National Museum of Natural History, Smithsonian Institution (2025). Integrated Taxonomic Information System (ITIS) [Dataset]. http://doi.org/10.5066/f7kh0kbk
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    National Museum of Natural History, Smithsonian Institution
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The White House Subcommittee on Biodiversity and Ecosystem Dynamics has identified systematics as a research priority that is fundamental to ecosystem management and biodiversity conservation. This primary need identified by the Subcommittee requires improvements in the organization of, and access to, standardized nomenclature. ITIS (originally referred to as the Interagency Taxonomic Information System) was designed to fulfill these requirements. In the future, ITIS will provide taxonomic data and a directory of taxonomic expertise that will support the system. ITIS is the result of a partnership of federal agencies formed to satisfy their mutual needs for scientifically credible taxonomic information. Since its inception, ITIS has gained valuable new partners and undergone a name change; ITIS now stands for the Integrated Taxonomic Information System. The goal is to create an easily accessible database with reliable information on species names and their hierarchical classification. The database will be reviewed periodically to ensure high quality, with valid classifications, revisions, and additions of newly described species. ITIS includes documented taxonomic information on flora and fauna from both aquatic and terrestrial habitats.

    The original ITIS partners include:

    • Department of Commerce: National Oceanic and Atmospheric Administration (NOAA)
    • Department of the Interior (DOI): Geological Survey (USGS)
    • Environmental Protection Agency (EPA)
    • Department of Agriculture (USDA): Agricultural Research Service (ARS) and Natural Resources Conservation Service (NRCS)
    • Smithsonian Institution: National Museum of Natural History (NMNH)

    These agencies signed a Memorandum of Understanding and formed a Steering Committee that directs two technical work groups: the Database Work Group (DWG) and the Taxonomy Work Group (TWG). The DWG is responsible for the database design and for overseeing development of the system to meet the requirements of the ITIS partners. The TWG is responsible for the quality and integrity of the database information. In addition to the database, the working groups have created "Taxonomic Workbench" software designed for easy entry and manipulation of taxonomic data.

    Primary objectives of the TWG include the review of data prior to incorporation into ITIS and the establishment of a process for periodic peer review to ensure data quality. The TWG has evaluated the taxonomic information priorities of the agencies and is locating data sources for the highest-priority groups. Efforts to gather data are helping to identify gaps in taxonomic coverage, in both scientific expertise and available information. The TWG hopes to promote collaboration among, and provide a point of focus for, taxonomists, scientific institutions, and taxonomic information users.

    For each scientific name, ITIS will include the authority (author and date), taxonomic rank, associated synonyms and vernacular names where available, a unique taxonomic serial number, data source information (publications, experts, etc.) and data quality indicators. Expert reviews and changes to taxonomic information in the database will be tracked. Geographic coverage will be worldwide, with initial emphasis on North American taxa. The TWG is coordinating its efforts with several national and international biodiversity programs. ITIS will be a significant contribution to the scientific infrastructure that is fundamental to the description, conservation, and management of the nation's biodiversity. Use of ITIS and its taxonomic serial numbers will facilitate sharing of biological information among researchers and cooperating agencies by providing a common framework for taxonomic data. Agencies that typically cannot afford to maintain taxonomic data will have access to high-quality taxonomic information through ITIS. This project allows the coordination of efforts among federal agencies, thereby increasing productivity and saving resources.

    Status reports on ITIS system development may be found in the What's New section. You can also contact Gerald Guala, Ph.D., Director, Integrated Taxonomic Information System (ITIS), at U.S. Geological Survey, 12201 Sunrise Valley Drive, MS 302, Reston, VA 20192, or via email at itiswebmaster@itis.gov.

  18. Visual observations recorded during two blue whale voyages in the Bonney...

    • researchdata.edu.au
    • data.aad.gov.au
    Updated Sep 13, 2017
    KELLY, NATALIE; DOUBLE, MIKE; MILLER, BRIAN SETH; ANDREWS-GOFF, VIRGINIA (2017). Visual observations recorded during two blue whale voyages in the Bonney Upwelling, south east Australia in 2012 [Dataset]. http://doi.org/10.4225/15/59b8b16e60eb5
    Dataset updated
    Sep 13, 2017
    Dataset provided by
    Australian Antarctic Data Centre
    Authors
    KELLY, NATALIE; DOUBLE, MIKE; MILLER, BRIAN SETH; ANDREWS-GOFF, VIRGINIA
    Time period covered
    Mar 13, 2012 - Mar 30, 2012
    Description

    An outline of the 2012 blue whale voyages can be found at http://www.marinemammals.gov.au/sorp/antarctic-blue-whale-project/bonney-upwelling-acoustic-testing-expeditions, with further information at http://www.marinemammals.gov.au/_data/assets/pdf_file/0005/135617/SC-64-SH11.pdf

    The 'Logger' data entry system was developed by the International Fund for Animal Welfare (IFAW) and is a flexible system to record information during a voyage. This system was the primary data entry system for the voyage and all events were recorded in Logger’s database.

    Blue whale voyage 1 datasets (12-25 January 2012): Sightings from the first blue whale voyage are recorded across three MS-Access databases:

    • 20120117LoggerFinalPart1Updated.mdb
    • 20120121LoggerFinalPart2Updated.mdb
    • 20120125LoggerFinalPart3Updated.mdb

    These databases contain tables describing:

    • Comments: details additional to the sightings entered, or data entry omissions; time stamped (UTC)
    • Observer effort: codes found in the lookup table; date/time in UTC
    • GPS data: time stamped (UTC), with heading
    • Lookup: contains all topic codes used in the other tables
    • Resights: resighting details for sightings already recorded; time/date in UTC, initial sighting number, blow count and notes
    • Cetacean sightings: date/time in UTC, sighting number, observer name, vessel, estimated distance, bearing, heading, species code, sighting cue code, estimated number of individuals (low, best and high), group behaviour, pod compaction, surface synchronicity and comments
    • Weather: date/time in UTC, sightability, glare, sea state, wind strength, swell, weather, cloud cover, cloud height, notes

    Blue whale voyage 2 datasets (13-30 March 2012): GPS data is stored in the file 'gps_meld_data_exp.csv'. This is an amalgamated dataset of two GPS data streams that has been checked and corrected (see 'Quality' for further details). Date/time is stored in two formats. The first is %Y-%m-%d %H:%M:%S, as in "2012-03-16 17:54:32". The second is a concatenated, orderable numeric string, as in 20120316175432.
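Both date/time representations in 'gps_meld_data_exp.csv' can be read with Python's standard `datetime.strptime`. This is a minimal sketch; the helper names are hypothetical and it assumes each value arrives as either the text form or the 14-digit numeric form shown above (as a string or an integer).

```python
from datetime import datetime

TEXT_FMT = "%Y-%m-%d %H:%M:%S"   # e.g. "2012-03-16 17:54:32"
NUMERIC_FMT = "%Y%m%d%H%M%S"     # e.g. 20120316175432


def parse_gps_time(value) -> datetime:
    """Parse either GPS date/time format into a datetime object."""
    s = str(value).strip()
    # An all-digit value is the concatenated numeric form; anything else is the text form.
    fmt = NUMERIC_FMT if s.isdigit() else TEXT_FMT
    return datetime.strptime(s, fmt)


def to_numeric(dt: datetime) -> int:
    """Convert a datetime back to the orderable numeric form."""
    return int(dt.strftime(NUMERIC_FMT))
```

The numeric form sorts chronologically because every component is zero-padded to a fixed width, so comparing two such numbers is equivalent to comparing the times they encode.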

    The small file 'trip_db.csv' gives a quick reference for when each of the four trips of blue whale voyage 2 started, to the minute. These times have been corrected for the minor (2 min 15 s) error (see 'Quality' below).

    The effort database is contained in the file 'VWhale2_database_effort_corrected.csv'. A fair amount of 'correction' has gone on with this data, as there were great variations in the way different people were adding new information into Logger. Furthermore, 'innovations' were made to the Logger system, particularly after the first couple of trips. In particular, effort was entered into Logger on the first trip exactly as it had been on the first voyage (the VL was too seasick to make any amendments). So, under the older effort classification, effort for the first trip started and ended, but no observer rotations were recorded and no notes were taken on which platform the observers were stationed. Given there was quite a bit of seasickness that first day, the only observers likely to be working would have been PE, PO and DD. These observers favoured the Fly Bridge, so all sighting effort for the first trip has been allocated to these observers on the Fly Bridge.

    The subsequent innovations were as follows. Observers were not told how far away a possible calling whale was. If, however, the acousticians thought that we were almost upon the animal(s), they would indicate this to the observing team.

    Acoustic.search == 1 indicates when the acousticians have notified observers that there was a group of blue whales in the area.

    Local.Search == 1 indicates that after an initial sighting was made, sighting effort and boat movement converted into a search to get closer to the animal(s) in order to confirm their species (not usually such a huge issue with blue whales, admittedly), group size and to get photo-ID.

    FD == 1 when effort on the foredeck either started or continued. FB == 1 when effort on the fly bridge either started or continued.

    For each effort type, the effort interval is defined as the time between the row in which the '1' value first appears and the date/time of the next row of the same effort type.
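The interval rule above can be sketched in Python as follows. This assumes the rows for a single effort column (e.g. FB or FD) have already been extracted as (timestamp, flag) pairs sorted by time, with flag 1 where effort started or continued and 0 where it stopped; the function names are hypothetical, and a flagged final row with no following row is left open and skipped.

```python
from datetime import datetime, timedelta


def effort_intervals(rows):
    """Return (start, end) pairs for one effort type.

    rows: (timestamp, flag) tuples for one effort column, sorted by time.
    Each row flagged 1 opens an interval that ends at the next row of the
    same effort type; a flagged last row has no end and is skipped here.
    """
    intervals = []
    for (start, flag), (end, _) in zip(rows, rows[1:]):
        if flag == 1:
            intervals.append((start, end))
    return intervals


def total_effort(rows) -> timedelta:
    """Sum the durations of all effort intervals for one effort type."""
    return sum((end - start for start, end in effort_intervals(rows)), timedelta())
```

Consecutive '1' rows (effort continuing) produce adjacent intervals that abut, so their durations add up to the same total as one merged interval.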

    • Index.new: Because two databases were merged to form the one effort dataset (the first trip had its own Logger MS-Access database), an overall index, Index.new, was created for continuity.
    • Index: Effort index as it appears in the original Logger MS-Access databases.
    • GpsIndex: In Logger, each Effort (or sighting) row is tagged with the accompanying GPS index number. This ties an effort event to the date/time and geographical location information in the GPS data.
    • GPSIndex.cor: As with GpsIndex but, again, because the databases were merged, a new GPSIndex value was created (.cor == corrected) to account for this, and for the added BPM GPS data.
    • GpsTime: Date (only), as derived from GPS. This has been abbreviated to the date only, due to the joys of how Microsoft packages deal with date/time objects; the full date/time value for each effort row can be derived from the GPS data via the GPSIndex.cor value.
    • EffortNo: Each effort row has been assigned a unique number within each respective MS-Access Logger file. This is somewhat redundant with the Index value.
    • Local time: When Logger records an event, it also takes a date/time value from the local computer. It's not really clear to me what this value actually represents.
    • Observer: The head observer at the time the effort event was logged. Basically, this is the person driving the Logger computer (i.e., physically entering values and making weather observations).
    • Event: Each event has a unique descriptor number. See the 'Lookup' table in the MS-Access database.
    • Event.cor: This column should be ignored entirely.
    • Notes: Any comments that accompanied particular effort entries. See also the Comments table for notes not specifically related to any Effort entries.
    • Platform: Which sighting platforms observers started or stopped effort on, or rotated through. Unfortunately, this information wasn't always consistently recorded. See the FB and FD columns for a more accurate record of when sighting effort was on and off.
    • Platform.cor: This column should be ignored.
    • Observers: All observers on rotation.
    • Sonobuoy: Sonobuoy numbers, where the launch of a sonobuoy was noted in Logger (this is not a complete list).
    • Trip: Which trip the row belongs to.
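Recovering the full date/time for each effort row via GPSIndex.cor amounts to a lookup join against the GPS data. The sketch below works on lists of dicts such as those produced by `csv.DictReader`; it assumes, without confirmation from the dataset, that the GPS file carries a matching 'GPSIndex.cor' column and a date/time column named 'datetime', so check the actual header of 'gps_meld_data_exp.csv' and adjust the `key` and `time_col` arguments. The function name is hypothetical.

```python
def attach_gps_time(effort_rows, gps_rows, key="GPSIndex.cor", time_col="datetime"):
    """Add the GPS date/time to each effort row by matching GPS index values.

    Column names on the GPS side are assumptions; rows without a GPS match
    get None. Input dicts are copied, not modified in place.
    """
    time_by_index = {g[key]: g[time_col] for g in gps_rows}
    joined = []
    for row in effort_rows:
        merged = dict(row)
        merged["gps_datetime"] = time_by_index.get(row[key])  # None if unmatched
        joined.append(merged)
    return joined
```

With `csv.DictReader` every value is a string, so both files' index columns are compared as text; convert them to `int` first if one file pads or formats the numbers differently.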

    Sightings for all species are given in 'sightings.csv'.

    Weather observations are in 'weather.csv'. Recording of glare angles (i.e., start and end bearing) started on the third trip.

    Comments are in 'comments.csv'. Please note there were no comments recorded during the first trip.

  19. Location of Ryanodine Receptor Type 2 Associated Catecholaminergic...

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Alexander Chang; Halil Beqaj; Leah Sittenfeld; Marco Miotto; Haikel Dridi; Gloria Willson; Carolyn Jorge Martinez; Jaan Altosaar Li; Steven Reiken; Yang Liu; Zonglin Dai; Andrew Marks (2024). Location of Ryanodine Receptor Type 2 Associated Catecholaminergic Polymorphic Ventricular Tachycardia Variants Dataset [Dataset]. http://doi.org/10.5281/zenodo.12786084
    Dataset updated
    Jul 19, 2024
    Authors
    Alexander Chang; Halil Beqaj; Leah Sittenfeld; Marco Miotto; Haikel Dridi; Gloria Willson; Carolyn Jorge Martinez; Jaan Altosaar Li; Steven Reiken; Yang Liu; Zonglin Dai; Andrew Marks
    Description

    Location of RYR2 Associated CPVT Variants Dataset

    Catecholaminergic polymorphic ventricular tachycardia (CPVT) is a rare inherited arrhythmia caused by pathogenic RYR2 variants. CPVT is characterized by exercise/stress-induced syncope and cardiac arrest in the absence of resting ECG and structural cardiac abnormalities. Here, we present a database collected from 225 clinical papers about CPVT-associated RYR2 variants, published from 2001 to October 2020. The database contains 1355 patients with RYR2 variants, both with and without CPVT; 968 of them are CPVT patients or suspected CPVT patients. The database includes information on genetic diagnosis, location of the RYR2 variant(s), clinical history and presentation, and treatment strategies for each patient. The depth of information in each field varies between patients.

    Database website: https://cpvtdb.port5000.com/

    Dataset Information

    This dataset includes:

    • eTable2.xlsx: Tabular version of the database.
      - The most relevant tables in the PostgreSQL database regarding patient sex, conditions, treatments, family history, and variant information were joined to create this table.
      - Views calculating the affected RYR2 exons, domains and subdomains have been joined to patient information.
      - m-n tables for patients' conditions and treatments have been converted to pivot tables: every condition and treatment that at least 1 person has is a column.
      - NOTE: This was created using a LEFT JOIN of the individuals and individual_variants tables. Individuals with more than 1 recorded variant will be listed on multiple rows. As of the current version, only 1 person in the database has multiple recorded variants.
    • _.gz.sql: PostgreSQL database dump. It expands to about 4.1 GB after loading. The database includes two schemas:
      - public: includes all information on patients and variants, and also all RYR2 variants in ClinVar.
      - uta: contains the biocommons/uta database, which the hgvs Python package requires to work locally. See https://github.com/biocommons/uta for more information.
      - NOTE: It is recommended to use this version of the database only for development or analysis purposes.
    • database_tables.pdf: Contains information on most of the database tables in the public schema.
    • 00_globals.sql: Required to load the PostgreSQL database dump. Creates a user named anonymous for the uta schema.

    How To Load Database Using Docker

    First, download the 00_globals.sql and _.gz.sql files and move them into a directory. The default postgres image will load files from the /docker-entrypoint-initdb.d directory if the database is empty. See Docker Hub for more information.

    Example using docker compose with pgadmin and a volume to persist the data:

        # Use postgres/example user/password credentials
        version: '3.9'
        volumes:
          mydatabasevolume: null
        services:
          db:
            image: postgres:16
            restart: always
            environment:
              POSTGRES_PASSWORD: mysecretpassword
              POSTGRES_USER: postgres
            volumes:
              - ':/docker-entrypoint-initdb.d/'
              - 'mydatabasevolume:/var/lib/postgresql/data'
          pgadmin:
            image: dpage/pgadmin4
            environment:
              PGADMIN_DEFAULT_EMAIL: user@domain.com
              PGADMIN_DEFAULT_PASSWORD: SuperSecret

    Creating the Database from Scratch

    See https://github.com/alexdaiii/cpvt-database-loader for source code to create the database from scratch.
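Because eTable2 was built with a LEFT JOIN, its row count is not the patient count: a person with two recorded variants contributes two rows, and a person with no variant still contributes one row with an empty variant field. The sketch below reproduces those LEFT JOIN semantics in plain Python; the column names 'individual_id' and 'variant' and the function names are hypothetical placeholders, not the actual eTable2 or PostgreSQL column names.

```python
def left_join(individuals, individual_variants, key="individual_id"):
    """Mimic SQL LEFT JOIN: one row per matching variant, or one row with
    variant None when an individual has no recorded variant."""
    variants_by_person = {}
    for v in individual_variants:
        variants_by_person.setdefault(v[key], []).append(v)
    rows = []
    for person in individuals:
        matches = variants_by_person.get(person[key], [])
        if not matches:
            rows.append({**person, "variant": None})
        for v in matches:
            rows.append({**person, "variant": v["variant"]})
    return rows


def count_individuals(rows, key="individual_id"):
    """Count distinct people rather than rows."""
    return len({r[key] for r in rows})
```

When tallying patients from the spreadsheet, counting distinct identifiers (as `count_individuals` does) avoids double-counting anyone listed on multiple rows.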

  20. Protein Structural Domain Classification

    • cathdb.info
    • ec.i4cologne.com
    Updated Sep 30, 2024
    (2024). Protein Structural Domain Classification [Dataset]. http://identifiers.org/MIR:00100005
    Dataset updated
    Sep 30, 2024
    Description

    CATH Domain Classification List (latest release): protein structural domains classified into the CATH hierarchy.
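A CATH code is a dot-separated path through the four levels of the hierarchy: Class, Architecture, Topology and Homologous superfamily. The sketch below parses one line of a domain list into that code and its level prefixes. It assumes the whitespace-separated layout of CATH's CathDomainList files, in which the first five columns are the domain name and the C, A, T and H numbers; the remaining columns are ignored, the function names are hypothetical, and the test line is illustrative, so verify the layout against the release you download.

```python
def parse_domain_line(line):
    """Return (domain name, CATH code, list of hierarchy prefixes) for one line.

    Assumes columns: domain name, class, architecture, topology,
    homologous superfamily, then further columns that are ignored here.
    """
    fields = line.split()
    domain = fields[0]
    levels = fields[1:5]              # C, A, T, H numbers
    code = ".".join(levels)           # e.g. "1.10.8.10"
    # Each prefix of the code identifies one level of the hierarchy.
    hierarchy = [".".join(levels[:i]) for i in range(1, 5)]
    return domain, code, hierarchy


def read_domains(lines):
    """Yield parsed entries, skipping blank lines and '#' comment lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_domain_line(line)
```

Grouping domains by one of the prefixes (for example, the first three levels) collects all domains that share a topology.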

Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren (2024). Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF [Dataset]. http://doi.org/10.3389/frai.2024.1366273.s001

Data_Sheet_1_BioVDB: biological vector database for high-throughput gene expression meta-analysis.PDF

Available download formats: pdf
Dataset updated
Mar 8, 2024
Dataset provided by
Frontiers
Authors
Michał J. Winnicki; Chase A. Brown; Hunter L. Porter; Cory B. Giles; Jonathan D. Wren
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

High-throughput sequencing has created an exponential increase in the amount of gene expression data, much of which is freely, publicly available in repositories such as NCBI's Gene Expression Omnibus (GEO). Querying this data for patterns such as similarity and distance, however, becomes increasingly challenging as the total amount of data increases. Furthermore, vectorization of the data is commonly required in Artificial Intelligence and Machine Learning (AI/ML) approaches. We present BioVDB, a vector database for storage and analysis of gene expression data, which enhances the potential for integrating biological studies with AI/ML tools. We used a previously developed approach called Automatic Label Extraction (ALE) to extract sample labels from metadata, including age, sex, and tissue/cell-line. BioVDB stores 438,562 samples from eight microarray GEO platforms. We show that it allows for efficient querying of data using similarity search, which can also be useful for identifying and inferring missing labels of samples, and for rapid similarity analysis.
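The similarity search and label inference described above can be illustrated with a brute-force cosine-similarity nearest-neighbour query followed by a majority vote over the neighbours' labels. This is a minimal sketch of the general technique, not BioVDB's implementation: a vector database would use an indexed (typically approximate) search, and the function names, vectors and labels here are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query, vectors, k=3):
    """Exact top-k search: (sample_id, score) pairs, most similar first."""
    scored = [(sid, cosine(query, vec)) for sid, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def infer_label(query, vectors, labels, k=3):
    """Infer a missing label by majority vote over the k most similar labelled samples."""
    labelled = {sid: v for sid, v in vectors.items() if sid in labels}
    votes = Counter(labels[sid] for sid, _ in nearest(query, labelled, k))
    return votes.most_common(1)[0][0] if votes else None
```

Brute force costs one similarity computation per stored sample for every query, which is the scaling problem that indexed vector search is meant to avoid at the scale of hundreds of thousands of samples.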
