100+ datasets found
  1. Database of homology-derived secondary structure of proteins

    • bioregistry.io
    Updated Jan 16, 2022
    Cite
    (2022). Database of homology-derived secondary structure of proteins [Dataset]. https://bioregistry.io/hssp
    Description

    HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence information (1-D). For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein.

  2. Data from: Postfire Debris-Flow Database (Literature Derived)

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Postfire Debris-Flow Database (Literature Derived) [Dataset]. https://catalog.data.gov/dataset/postfire-debris-flow-database-literature-derived
    Dataset provided by
    U.S. Geological Survey
    Description

    The data presented in this data release represent observations of postfire debris flows that have been collected from publicly available datasets. Data originate from 13 different countries: the United States, Australia, China, Italy, Greece, Portugal, Spain, the United Kingdom, Austria, Switzerland, Canada, South Korea, and Japan. The data are located in the file called “PFDF_database_sortedbyReference.txt” and a description of each column header can be found in both the file “column_headers.txt” and the metadata file (“Post-fire Debris-Flow Database (Literature Derived).xml”). The observations are derived from areas that have been burned by wildfire and are global in nature. However, this dataset is synthesized from information collected by many different researchers for different purposes, and therefore not all fields are available for each of the observations. Missing information is indicated by the value “-9999” in the “PFDF_database_sortedbyReference.txt” file. Note that the text file contains special characters and a mix of date-time formats that reflect the original data provided by the authors. The text may not be displayed correctly if it is opened in proprietary software such as Microsoft Excel, but it will appear correctly when opened in a text editor.
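
A minimal sketch of reading a file that follows the missing-value convention described above. It assumes a tab-delimited layout and uses hypothetical column names (`site`, `area_km2`, `volume_m3`); the actual delimiter and headers are documented in “column_headers.txt” and are not reproduced here. The sentinel “-9999” is mapped to `None` so that missing fields are not mistaken for measurements.

```python
import csv
import io

MISSING = "-9999"  # missing-value sentinel stated in the dataset description


def read_rows(text, delimiter="\t"):
    """Yield one dict per data row, replacing the sentinel with None.

    The tab delimiter is an assumption; adjust it to match the real file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        yield {key: (None if value == MISSING else value)
               for key, value in row.items()}


# Hypothetical sample with made-up column names and values.
sample = "site\tarea_km2\tvolume_m3\nA\t1.5\t-9999\nB\t-9999\t820\n"
rows = list(read_rows(sample))
```

For the real file, the text would be read from disk first (for example with `open(path, encoding="utf-8")`, where the encoding is also an assumption) because the description notes that the file contains special characters.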

  3. ASTEROID LIGHTCURVE DERIVED DATA V13.0 - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). ASTEROID LIGHTCURVE DERIVED DATA V13.0 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/asteroid-lightcurve-derived-data-v13-0-e7aaa
    Dataset provided by
    NASA (http://nasa.gov/)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This is a compilation of published rotational parameters derived from lightcurve data for asteroids, based on the Warner et al. (2009) Asteroid Lightcurve Database. This is the version released in March 2012. In addition to reported rotational parameters by individual paper, there is a summary file with the values adopted by Harris, Warner, and Pravec as the most likely correct values for each asteroid. The data set also contains files listing known binary asteroids and 'tumbling' asteroids.

  4. Data from: Global Data Set of Derived Soil Properties, 0.5-Degree Grid...

    • catalog.data.gov
    • s.cnmilf.com
    • +4more
    Updated Sep 19, 2025
    Cite
    ORNL_DAAC (2025). Global Data Set of Derived Soil Properties, 0.5-Degree Grid (ISRIC-WISE) [Dataset]. https://catalog.data.gov/dataset/global-data-set-of-derived-soil-properties-0-5-degree-grid-isric-wise-bdb54
    Dataset provided by
    Oak Ridge National Laboratory Distributed Active Archive Center
    Description

    The World Inventory of Soil Emission Potentials (WISE) database currently contains data for over 4300 soil profiles collected mostly between 1950 and 1995. This database has been used to generate a series of uniform data sets of derived soil properties for each of the 106 soil units considered in the Soil Map of the World (FAO-UNESCO, 1974). These data sets were then linked to a 1/2 degree longitude by 1/2 degree latitude version of the edited and digital Soil Map of the World (FAO, 1995) to generate GIS raster image files for the following variables:

    • Total available water capacity (mm water per 1 m soil depth)
    • Soil organic carbon density (kg C/m^2 for 0-30 cm depth range)
    • Soil organic carbon density (kg C/m^2 for 0-100 cm depth range)
    • Soil carbonate carbon density (kg C/m^2 for 0-100 cm depth range)
    • Soil pH (0-30 cm depth range)
    • Soil pH (30-100 cm depth range)

    Data Citation: The data set should be cited as follows: Batjes, N. H. (ed). 2000. Global Data Set of Derived Soil Properties, 0.5-Degree Grid (ISRIC-WISE). Available on-line from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A.

  5. Minimal dataset derived from the database.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated May 14, 2020
    Cite
    Molnár, Tamás; Szántó, Kata; Takács, Péter; Szamosi, Tamás; Bálint, Anita; Kunovszki, Péter; Borsi, András; Milassin, Ágnes; Farkas, Klaudia; Gimesi-Országh, Judit; Lakatos, Péter L. (2020). Minimal dataset derived from the database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000583359
    Authors
    Molnár, Tamás; Szántó, Kata; Takács, Péter; Szamosi, Tamás; Bálint, Anita; Kunovszki, Péter; Borsi, András; Milassin, Ágnes; Farkas, Klaudia; Gimesi-Országh, Judit; Lakatos, Péter L.
    Description

    • Sheet “Prevalence, Incidence”: Prevalent and incident patient numbers.
    • Sheet “Demographics”: Patient counts in age groups, median, mean and standard deviation of age.
    • Sheet “Deaths”: Raw death counts and median age at death.
    • Sheet “Malignancies distribution”: Patient counts with malignancy diagnoses based on 3-digit ICD-10 code.
    • Sheet “CRC epidemiology”: Patient counts with new CRC diagnoses, median age at CRC diagnosis and death, total death counts.
    • Sheet “Survival1”: Survival curve data, OS, UC patients and controls.
    • Sheet “Survival2”: Survival curve data, OS, CRC-UC patients, whole group and age stratified.

    (XLSX)

  6. Addressing statistical biases in nucleotide-derived protein databases for...

    • omicsdi.org
    Updated Jul 10, 2023
    Cite
    (2023). Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC3703792
    Variables measured
    Unknown
    Description

    Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. The source of this error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
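
To make the target:decoy idea above concrete, here is a minimal sketch of one widely used estimator: the FDR at a score threshold is approximated by the number of decoy PSMs divided by the number of target PSMs that pass it. This is a generic illustration, not necessarily the procedure used by the authors, and the scores below are made-up values.

```python
def decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    psms: list of (score, is_decoy) pairs, where a higher score is better.
    Uses the simple decoys / targets estimate.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    return min(1.0, decoys / targets)


# Hypothetical PSM scores: (score, is_decoy)
psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
        (7.5, False), (6.0, True), (5.5, False)]

fdr_strict = decoy_fdr(psms, 7.0)  # 1 decoy over 4 targets
fdr_loose = decoy_fdr(psms, 5.0)   # 2 decoys over 5 targets
```

Under this estimator, enlarging the search space with sequences that are unlikely to be real changes how many decoys pass a given threshold, which is one way database choice can shift the resulting confidence estimates.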

  7. Literature-derived human gene-disease network

    • rrid.site
    • neuinfo.org
    • +2more
    Updated Nov 30, 2025
    Cite
    (2025). Literature-derived human gene-disease network [Dataset]. http://identifiers.org/RRID:SCR_005653
    Description

    A text-mining-derived database with a focus on extracting and classifying gene-disease associations with respect to several biomolecular conditions. It uses a machine-learning-based algorithm to extract semantic gene-disease relations from a textual source of interest. The semantic gene-disease relations were extracted with F-measures of 78. More specifically, the textual source utilized here originates from Entrez Gene's GeneRIF (Gene Reference Into Function) database (Mitchell, et al., 2003). LHGDN was created based on a GeneRIF version from March 31st, 2009, consisting of 414241 phrases. These phrases were further restricted to the organism Homo sapiens, which resulted in a total of 178004 phrases. We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph. We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining.

  8. Combined data set derived from new data generated herein and publicly...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Joana Matzen da Silva; Simon Creer; Antonina dos Santos; Ana C. Costa; Marina R. Cunha; Filipe O. Costa; Gary R. Carvalho (2023). Combined data set derived from new data generated herein and publicly available DNA barcoding projects from the Barcode of Life Database. [Dataset]. http://doi.org/10.1371/journal.pone.0019449.t001
    Available download formats: xls
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joana Matzen da Silva; Simon Creer; Antonina dos Santos; Ana C. Costa; Marina R. Cunha; Filipe O. Costa; Gary R. Carvalho
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Combined data set derived from new data generated herein and publicly available DNA barcoding projects from the Barcode of Life Database.

  9. DataSheet_1_MFPPDB: a comprehensive multi-functional plant peptide...

    • frontiersin.figshare.com
    pdf
    Updated Oct 16, 2023
    Cite
    Yaozu Yang; Hongwei Wu; Yu Gao; Wei Tong; Ke Li (2023). DataSheet_1_MFPPDB: a comprehensive multi-functional plant peptide database.pdf [Dataset]. http://doi.org/10.3389/fpls.2023.1224394.s001
    Available download formats: pdf
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Yaozu Yang; Hongwei Wu; Yu Gao; Wei Tong; Ke Li
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Plants produce a wide range of bioactive peptides as part of their innate defense mechanisms. With the explosive growth of plant-derived peptides, verifying their therapeutic function using traditional experimental methods is resource- and time-consuming. Therefore, it is necessary to predict the therapeutic function of plant-derived peptides more effectively and accurately with reduced waste of resources and thus expedite the development of plant peptides. We herein developed a repository of plant peptides predicted to have multiple therapeutic functions, named MFPPDB (multi-functional plant peptide database). MFPPDB includes 1,482,409 single or multiple functional plant origin therapeutic peptides derived from 121 fundamental plant species. The functional categories of these therapeutic peptides include 41 different features such as anti-bacterial, anti-fungal, anti-HIV, anti-viral, and anti-cancer. The detailed physicochemical information of these peptides is presented in the functional search and physicochemical property search modules, which help users easily access the peptide information by plant peptide species, ID, and functions, or by peptide ID, isoelectric point, peptide sequence, and molecular weight through a web-friendly interface. We further matched the predicted peptides to nine state-of-the-art curated functional peptide databases and found that at least 293,408 of the peptides possess functional potentials. Overall, MFPPDB integrates a massive number of plant peptides that have single or multiple therapeutic functions, which will facilitate comprehensive research in plant peptidomics. MFPPDB can be freely accessed through http://124.223.195.214:9188/mfppdb/index.

  10. Working definition derived from the NHIS claims database.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Sep 27, 2023
    Cite
    Son, Sumin; Ahn, Hyeong Sik; Kim, Seung Ho; Lee, Ki-Il; Kim, Ikhee; Mo, Ji-Hun; Kim, Hyun Jung (2023). Working definition derived from the NHIS claims database. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000942808
    Authors
    Son, Sumin; Ahn, Hyeong Sik; Kim, Seung Ho; Lee, Ki-Il; Kim, Ikhee; Mo, Ji-Hun; Kim, Hyun Jung
    Description

    Working definition derived from the NHIS claims database.

  11. MER2 Pancam Science Derived IOF Data Bundle - Dataset - NASA Open Data...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). MER2 Pancam Science Derived IOF Data Bundle - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/mer2-pancam-science-derived-iof-data-bundle
    Dataset provided by
    NASA (http://nasa.gov/)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    This bundle contains derived IOF data from the Panoramic Cameras (Pancam) on Mars Exploration Rover 2 (Spirit). These data were produced by the science team.

  12. BridgeDb: pathway identifier mapping database derived from Wikidata

    • data.niaid.nih.gov
    Updated May 7, 2023
    Cite
    Willighagen, Egon (2023). BridgeDb: pathway identifier mapping database derived from Wikidata [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7902768
    Dataset provided by
    Maastricht University
    Authors
    Willighagen, Egon
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Release of a BridgeDb gene identifier mapping database between Wikidata and Ensembl.

    [INFO]: Database finished.
    INFO: old database is Wikidata 1.0.0 (build: 20230506)
    INFO: new database is Wikidata 1.0.0 (build: 20230506)
    INFO: Number of ids in Wd (Wikidata): 153715 (unchanged)
    INFO: Number of ids in En (Ensembl): 153637 (unchanged)
    INFO: new size is 95 Mb (changed +0.0%)
    INFO: total number of identifiers is 307352
    INFO: total number of mappings is 307430

  13. HUN AssetList Database v1p2 20150128

    • data.wu.ac.at
    • researchdata.edu.au
    • +1more
    Updated Oct 9, 2018
    Cite
    Bioregional Assessment Programme (2018). HUN AssetList Database v1p2 20150128 [Dataset]. https://data.wu.ac.at/schema/data_gov_au/ZmRiMzFhNGEtYTZjNC00ZDYwLWJhNDQtY2IxMjYwY2I1YTdl
    Dataset provided by
    Bioregional Assessment Programme
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Superseded by HUN AssetList v1.3 20150212 (GUID: dcf8349e-aaed-4d30-80ab-1c8cbad8fe68) on 2/12/2015

    This dataset contains the spatial and non-spatial (attribute) components of the Hunter subregion Asset List as an .mdb file, which is readable as an MS Access database or as an ESRI Personal Geodatabase.

    Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.

    Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.

    Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "AnR_database_HUN_v1p2_20150128.doc", located in the zip file as part of this dataset.

    The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.

    Detailed information describing the database structure and content can be found in the document "AnR_database_HUN_v1p2_20150128.doc" located in the zip file.

    Some of the source data used in the compilation of this dataset is restricted.

    Purpose

    The Asset List Database was developed to identify water dependent assets located within the Hunter subregion.

    Dataset History

    Superseded by HUN AssetList v1.3 20150212 (GUID: dcf8349e-aaed-4d30-80ab-1c8cbad8fe68) on 2/12/2015

    This dataset is an update of the previous version of the Hunter asset list database: "Asset list for Hunter - CURRENT"; ID: 51b1e021-2958-4cd3-8daa-ba46ece09d1c, which was updated with the inclusion of data from NSW Department of Primary Industries - Office of Water: HIGH PROBABILITY GROUNDWATER DEPENDENT VEGETATION WITH HIGH ECOLOGICAL VALUE (Hunter-Central Rivers).

    Dataset Citation

    Bioregional Assessment Programme (2015) HUN AssetList Database v1p2 20150128. Bioregional Assessment Derived Dataset. Viewed 09 October 2018, http://data.bioregionalassessments.gov.au/dataset/64ecd565-bb7c-4f21-951e-f35966b91c99.

    Dataset Ancestors

  14. MBC Impact and Risk Analysis Database v01

    • data.wu.ac.at
    • researchdata.edu.au
    • +1more
    Updated Oct 25, 2017
    Cite
    Bioregional Assessment Programme (2017). MBC Impact and Risk Analysis Database v01 [Dataset]. https://data.wu.ac.at/schema/data_gov_au/NGZlMDQxZmItNzUzNy00MjY5LWFiYzctMTI5NDNhNWNlYmM1
    Dataset provided by
    Bioregional Assessment Programme
    Description

    Abstract

    The Maranoa-Balonne-Condamine Impact and Risk Analysis Database (Analysis Database) is a fit-for-purpose geospatial information system developed for the Impact and Risk Analysis (Component 3-4) products of the Bioregional Assessment Technical Programme (BATP).

    The version provided here for public download has been slightly modified to remove restricted material such as the co-ordinates of protected or threatened species. This version was used to populate BA Explorer.

    The Analysis Database brings together many of the data sets used in Components 1 and 2 of the assessments and includes hydrology and hydrogeology modelling results, landscape classes and economic, sociocultural and ecological assets. These data sets are listed in the Component 1 and 2 products under the Assessments tab in http://www.bioregionalassessments.gov.au/.

    An Analysis Database of common design and schema was implemented for each subregion where a full Impact and Risk Analysis was completed. To populate each database, input datasets were transformed, normalised and inserted into their respective Analysis Databases in accord with the common design and schema. The approach enabled the universal treatment of data analysis across all bioregions despite data being of different specifications and origins.

    The Analysis Database includes all the data used for the assessment of the subregion with the exception of those datasets that were not provided to the program with an open access licence. The database is constructed using the Open Source platform PostgreSQL coupled with PostGIS. This technology was considered to better enable the provenance and transparency requirements of the Programme. The files provided here have been prepared using the PostgreSQL version 9.5 SQL Dump function - pg_dump.

    A detailed description of the Analysis Database, its design, structure and application is provided in the supporting documentation: http://data.bioregionalassessments.gov.au/dataset/05e851cf-57a5-4127-948a-1b41732d538c

    Purpose

    The Maranoa-Balonne-Condamine Impact and Risk Analysis Database (Analysis Database) is the geospatial database for completing the Impact and Risk Analysis component of the Maranoa-Balonne-Condamine Bioregional Assessment. This includes the creation of results, tables and maps that appear in the relevant Products of each assessment. The database also manages the data used by the BA Explorer.

    An individual instance of the Analysis Database was developed for each subregion where a component 3-4 Impact and Risks Assessment was conducted. With the exception of the subregion-specific data contained within it and the removal of restricted data records, each analysis database is of identical design and structure.

    Dataset History

    This Analysis Database is an instance of PostgreSQL version 9.5 hosted on Linux Red Hat Enterprise Linux version 4.8.5-4. PostgreSQL geospatial capabilities are provided by POSTGIS version 2.2.

    Data pre-processing and upload into each PostgreSQL database was completed using FME Desktop (Oracle Edition) version 2016.1.2.1. Analysis data and results are provided to users and systems via the geospatial services of Geoserver version 2.9.1. Scientific analysis and mapping was undertaken by connecting a range of data using a combination of Microsoft Excel, QGIS and ArcMap systems.

    During the Programme and for its working life, the Analysis Database was hosted and managed on instances of Amazon Web Services managed by Geoscience Australia and the Bureau of Meteorology.

    Dataset Citation

    Bioregional Assessment Programme (2017) MBC Impact and Risk Analysis Database v01. Bioregional Assessment Derived Dataset. Viewed 25 October 2017, http://data.bioregionalassessments.gov.au/dataset/69075f3e-67ba-405b-8640-96e6cb2a189a.

    Dataset Ancestors

  15. NASA Earthdata

    • earthdata.nasa.gov
    • s.cnmilf.com
    • +3more
    Updated Jul 24, 1994
    Cite
    ORNL_CLOUD (1994). NASA Earthdata [Dataset]. http://doi.org/10.3334/ORNLDAAC/117
    Dataset authored and provided by
    ORNL_CLOUD
    Description

    During the 1989 FIFE field campaign, measurements were made of soil moisture release parameters and hydraulic conductivity. Bulk density and soil moisture release data were collected at five FIFE sites representing the major soil types in the FIFE study area. These data were used to model the porosity, saturated water potential, and the b-factor (the exponent of the power curve function) following the method of Clapp and Hornberger (1978). These soil moisture characteristics can be used to describe plant-available water and water movement through soils.
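
The power-curve relation referred to above is commonly written as psi = psi_sat * (theta / theta_sat)^(-b), where theta is the volumetric water content, theta_sat its saturated value (the porosity), psi_sat the saturated water potential, and b the exponent. A minimal sketch follows; the parameter values are illustrative only and are not taken from the FIFE measurements.

```python
def water_potential(theta, theta_sat, psi_sat, b):
    """Soil water potential from the power-curve relation.

    psi = psi_sat * (theta / theta_sat) ** (-b)

    psi and psi_sat share the same units and sign convention
    (negative values here denote suction).
    """
    if not 0 < theta <= theta_sat:
        raise ValueError("theta must satisfy 0 < theta <= theta_sat")
    return psi_sat * (theta / theta_sat) ** (-b)


# Illustrative values, not FIFE data.
psi_at_saturation = water_potential(0.4, 0.4, -10.0, 5.0)  # equals psi_sat
psi_half_saturated = water_potential(0.2, 0.4, -10.0, 5.0)  # 32 times psi_sat
```

Because the exponent is negative, the magnitude of the potential rises steeply as the soil dries, which is why b is a useful single-number summary of a soil's moisture release behavior.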

  16. Data from: ASTEROID LIGHTCURVE DERIVED DATA V14.0

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Aug 22, 2025
    Cite
    National Aeronautics and Space Administration (2025). ASTEROID LIGHTCURVE DERIVED DATA V14.0 [Dataset]. https://catalog.data.gov/dataset/asteroid-lightcurve-derived-data-v14-0-53bb4
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This is a compilation of published rotational parameters derived from lightcurve data for asteroids, based on the Warner et al. (2009) Asteroid Lightcurve Database. This is the version as of March 1, 2014. In addition to reported rotational parameters by individual paper, there is a summary file with the values adopted by Harris, Warner, and Pravec as the most likely correct values for each asteroid. The data set also contains files listing known binary asteroids and 'tumbling' asteroids.

  17. BIOSYSMOdb: Curated Database for Biodegradation and Bioremediation

    • data.europa.eu
    unknown
    Updated Feb 25, 2025
    Cite
    Zenodo (2025). BIOSYSMOdb: Curated Database for Biodegradation and Bioremediation [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14795254?locale=et
    Available download formats: unknown
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    BIOSYSMOdb is a comprehensive and integrative database developed as part of the BIOSYSMO project. This resource centralizes data on metabolic pathways, reactions, enzymes, and degradative organisms to address soil contamination caused by industrial, agricultural, and urban activities. BIOSYSMOdb serves as a bridge between computational and experimental research, offering a unified platform to accelerate bioremediation solutions.

    Dataset Description
    BIOSYSMOdb integrates curated and synthesized data from major public repositories: EAWAG BBD, MibPOPdb, MetaCyc, UniProt, and KEGG. The database includes:
    - Chemical level: Details on compounds relevant for biodegradation.
    - Metabolic level: Data on pathways, reactions, enzymes, and organisms associated with degradation.
    - Organism level: Information on degradative organisms and their genomic data.
    - Protein level: Information on the enzymes responsible for each reaction and their associated sequence data (if available).

    Data Structure
    The following files are included in the dataset:
    - BIOSYSMOdb_Compounds_chemical_iden_v1.0.csv: Compound identifiers inferred from other databases
    - BIOSYSMOdb_Compounds_chemical_info_v1.0.csv: Compound information collected from public sources
    - BIOSYSMOdb_Compounds_onthology_cod_v1.0.csv: Compound ontology codes derived from ClassyFire
    - BIOSYSMOdb_Compounds_onthology_term_v1.0.csv: Compound ontology terms derived from ClassyFire
    - BIOSYSMOdb_Pathways_v1.0.csv: Pathways dataset
    - BIOSYSMOdb_Reactions_v1.0.csv: Reactions dataset (containing associated substrates, products, enzymes, and pathways)
    - BIOSYSMOdb_Enzymes_v1.0.csv: Enzymes dataset (containing associated reactions)
    - BIOSYSMOdb_Compounds_v1.0.csv: Principal compounds dataset
    - BIOSYSMOdb_Organisms_v1.0.csv: Principal organisms dataset (containing associated pathways and the NCBI Genome ID when available)

    CSV Descriptions
    - Compound ID: Unique identifier for each compound.
    - Pathway Name: Name of the metabolic pathway.
    - Reaction ID: Identifier for individual reactions.
    - Enzyme/Protein ID: Unique identifier for associated enzymes.
    - Organism Name: Name of the degradative organism.

    Jupyter Notebook for querying BIOSYSMOdb
    To facilitate data exploration and connections within the CSV files, a Jupyter Notebook, BIOSYSMO_database_queries, has been created. This notebook enables users to analyze relationships between different datasets and execute relevant queries efficiently.

    Data Sources & Licenses
    This database includes data derived from diverse databases:
    - EAWAG BBD: Data on biodegradation of persistent organic pollutants. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
    - MibPOPdb: Focused on microbial degradation of xenobiotics. Creative Commons Attribution 4.0 International (CC BY 4.0) license.
    - MetaCyc: Comprehensive metabolic pathway database. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
    - KEGG: Genomic integration and metabolic networks. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
    - UniProt: Protein sequences. Creative Commons Attribution 4.0 International (CC BY 4.0) license.
    - NCBI Genome: Organism genomes. This database is public.
    - PubChem: Chemical compounds. This database is public.
    - ChEBI: Chemical compounds. Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Licensing and Attribution
    This dataset is shared under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Please credit BIOSYSMOdb and the original sources (EAWAG BBD, MibPOPdb, MetaCyc, ChEBI, PubChem, and KEGG) in any use or derivative works. BIOSYSMOdb was developed as part of the BIOSYSMO project, which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101060211.

    Acknowledgments
    - MetaCyc, KEGG, EAWAG BBD, UniProt, NCBI Genome, PubChem, ChEBI, and MibPOPdb: for providing essential data that supported the curation of BIOSYSMOdb.
    - BIOSYSMO consortium: for their contributions to the database’s design and development.
    We extend our gratitude to the Horizon Europe programme and the European Union for their support in advancing research on bioremediation and biodegradation.

    Contact
    For inquiries, please contact:
    - Contact Name: Main Researcher: Marta Franco de Benito, MSc, or Project Coordinator: Sara Gil Guerrero, PhD
    - Email: marta.franco@idener.ai // sara.gil@idener.ai
    - Institution: IDENER.AI
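
The description above says the CSV files can be linked through shared identifiers such as Reaction ID. The following is a minimal sketch of one such lookup, assuming the reactions and enzymes tables both carry a "Reaction ID" column. The sample rows, the ID values, and the exact column layout are hypothetical and are not taken from the published files or from the BIOSYSMO_database_queries notebook.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real files' columns and ID formats may differ.
REACTIONS_CSV = """Reaction ID,Pathway Name
R1,Toluene degradation
R2,Toluene degradation
R3,Atrazine degradation
"""

ENZYMES_CSV = """Enzyme/Protein ID,Reaction ID
E1,R1
E2,R2
E3,R3
E4,R3
E5,R9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def enzymes_by_pathway(reactions, enzymes):
    """Join enzymes to pathways through the shared "Reaction ID" column."""
    pathway_of = {row["Reaction ID"]: row["Pathway Name"] for row in reactions}
    result = defaultdict(set)
    for row in enzymes:
        pathway = pathway_of.get(row["Reaction ID"])
        if pathway is not None:  # skip enzymes whose reaction is not listed
            result[pathway].add(row["Enzyme/Protein ID"])
    return {name: sorted(ids) for name, ids in result.items()}


links = enzymes_by_pathway(read_rows(REACTIONS_CSV), read_rows(ENZYMES_CSV))
```

To try this against the downloaded files, the in-memory strings could be replaced by the contents of BIOSYSMOdb_Reactions_v1.0.csv and BIOSYSMOdb_Enzymes_v1.0.csv, after checking the actual header names in each file.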

  18. s

    Comprehensive Systems-Biology Database

    • scicrunch.org
    • neuinfo.org
    • +2 more
    Updated Jan 29, 2022
    + more versions
    Cite
    (2022). Comprehensive Systems-Biology Database [Dataset]. http://identifiers.org/RRID:SCR_008185
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    CSB.DB presents the results of bio-statistical analysis on gene expression data in association with additional biochemical and physiological knowledge. The main aim of this database platform is to provide tools that support insight into life's complexity pyramid, with a special focus on the integration of data from transcript and metabolite profiling experiments. The main focus of the CSB project is the generation of new, easily accessible knowledge about the relationship and the hierarchy of cellular components, and thus progress towards understanding life's complexity pyramid. To this end, statistical and computational algorithms are applied to organism-specific data derived from publicly available multi-parallel technologies, currently expression profiles. The underlying data are derived from various research activities. The CSB project thus provides an integrated and centralized public resource, CSB.DB: A Comprehensive Systems-Biology Database, allowing universal access to the generated knowledge. The derived knowledge should support the formulation of new hypotheses about the respective functional involvement of genes beyond their (inter-)relationships. Another major goal of the CSB project is to supply researchers with the information necessary to formulate these new hypotheses without demanding any a priori statistical knowledge from the user. The CSB project mainly focuses on applying the required statistical tests and on assisting the user during exploration of results with information and help files to support hypothesis generation.

  19. f

    Data from: Tissue Usage Preference and Intrinsically Disordered Region...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Mar 8, 2024
    Cite
    Lam, Maggie P. Y.; Brenman, Stella; Lau, Edward; Ng, Dominic C. M.; Black, Alexander; Pandi, Boomathi (2024). Tissue Usage Preference and Intrinsically Disordered Region Remodeling of Alternative Splicing Derived Proteoforms in the Heart [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001322234
    Explore at:
    Dataset updated
    Mar 8, 2024
    Authors
    Lam, Maggie P. Y.; Brenman, Stella; Lau, Edward; Ng, Dominic C. M.; Black, Alexander; Pandi, Boomathi
    Description

    A computational analysis of mass spectrometry data was performed to uncover alternative splicing derived protein variants across chambers of the human heart. Evidence for 216 non-canonical isoforms was apparent in the atrium and the ventricle, including 52 isoforms not documented on SwissProt and recovered using an RNA sequencing derived database. Among non-canonical isoforms, 29 show signs of regulation based on statistically significant preferences in tissue usage, including a ventricular enriched protein isoform of tensin-1 (TNS1) and an atrium-enriched PDZ and LIM Domain 3 (PDLIM3) isoform 2 (PDLIM3-2/ALP-H). Examined variant regions that differ between alternative and canonical isoforms are highly enriched with intrinsically disordered regions. Moreover, over two-thirds of such regions are predicted to function in protein binding and RNA binding. The analysis here lends further credence to the notion that alternative splicing diversifies the proteome by rewiring intrinsically disordered regions, which are increasingly recognized to play important roles in the generation of biological function from protein sequences.

  20. f

    Data from: Proteogenomic Gene Structure Validation in the Pineapple Genome

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Apr 23, 2024
    Cite
    Norazrin Ariffin; David Wells Newman; Michael G. Nelson; Ronan O’cualain; Simon J. Hubbard (2024). Proteogenomic Gene Structure Validation in the Pineapple Genome [Dataset]. http://doi.org/10.1021/acs.jproteome.3c00675.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 23, 2024
    Dataset provided by
    ACS Publications
    Authors
    Norazrin Ariffin; David Wells Newman; Michael G. Nelson; Ronan O’cualain; Simon J. Hubbard
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    MD2 pineapple (Ananas comosus) is the second most important tropical crop that preserves crassulacean acid metabolism (CAM), which has high water-use efficiency and is fast becoming the most consumed fresh fruit worldwide. Despite the significance of environmental efficiency and popularity, until very recently, its genome sequence has not been determined and a high-quality annotated proteome has not been available. Here, we have undertaken a pilot proteogenomic study, analyzing the proteome of MD2 pineapple leaves using liquid chromatography-mass spectrometry (LC–MS/MS), which validates 1781 predicted proteins in the annotated F153 (V3) genome. In addition, a further 603 peptide identifications are found that map exclusively to an independent MD2 transcriptome-derived database but are not found in the standard F153 (V3) annotated proteome. Peptide identifications derived from these MD2 transcripts are also cross-referenced to a more recent and complete MD2 genome annotation, resulting in 402 nonoverlapping peptides, which in turn support 30 high-quality gene candidates novel to both pineapple genomes. Many of the validated F153 (V3) genes are also supported by an independent proteomics data set collected for an ornamental pineapple variety. The contigs and peptides have been mapped to the current F153 genome build and are available as bed files to display a custom gene track on the Ensembl Plants region viewer. These analyses add to the knowledge of experimentally validated pineapple genes and demonstrate the utility of transcript-derived proteomics to discover both novel genes and genetic structure in a plant genome, adding value to its annotation.

Cite
(2022). Database of homology-derived secondary structure of proteins [Dataset]. https://bioregistry.io/hssp

Database of homology-derived secondary structure of proteins

Explore at:
16 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jan 16, 2022
Description

HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence information (1-D). For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein.
