HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence (1-D) information. For each protein of known 3D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein.
The data in this release are observations of postfire debris flows collected from publicly available datasets. Data originate from 13 different countries: the United States, Australia, China, Italy, Greece, Portugal, Spain, the United Kingdom, Austria, Switzerland, Canada, South Korea, and Japan. The data are located in the file “PFDF_database_sortedbyReference.txt”, and a description of each column header can be found in both the file “column_headers.txt” and the metadata file (“Post-fire Debris-Flow Database (Literature Derived).xml”). The observations are derived from areas that have been burned by wildfire and are global in nature. However, this dataset is synthesized from information collected by many different researchers for different purposes, so not all fields are available for every observation. Missing information is indicated by the value “-9999” in the “PFDF_database_sortedbyReference.txt” file. Note that the text file contains special characters and a mix of date-time formats that reflect the original data provided by the authors. The text may not display correctly if it is opened in proprietary software such as Microsoft Excel, but it will appear correctly when opened in a text editor.
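As an illustration of the missing-value convention described above, here is a minimal sketch of reading such a file while mapping the “-9999” sentinel to `None`. The tab delimiter, the column names, and the sample rows are assumptions made for the example; they are not taken from the data release, so check “column_headers.txt” before relying on them.

```python
import csv
import io

MISSING = "-9999"  # sentinel for missing information, as stated in the description


def clean(value):
    """Return None for the missing-value sentinel, otherwise the value unchanged."""
    if isinstance(value, str) and value.strip() == MISSING:
        return None
    return value


def read_observations(handle, delimiter="\t"):
    """Yield one dict per row; the delimiter is an assumption, not a documented fact."""
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield {key: clean(value) for key, value in row.items()}


# Tiny in-memory stand-in for the real file; columns and values are invented.
sample = io.StringIO("site\tvolume_m3\tcountry\nA\t1200\tItaly\nB\t-9999\tJapan\n")
rows = list(read_observations(sample))
```

Values are kept as strings on purpose: the description notes mixed date-time formats and special characters, so any type conversion is best done column by column. For the real file, opening it with an explicit encoding (for example `open(path, encoding="utf-8", newline="")`) is a reasonable starting point, though the actual encoding is not stated in the description.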
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This is a compilation of published rotational parameters derived from lightcurve data for asteroids, based on the Warner et al. (2009) Asteroid Lightcurve Database. This is the version released in March 2012. In addition to reported rotational parameters by individual paper, there is a summary file with the values adopted by Harris, Warner, and Pravec as the most likely correct values for each asteroid. The data set also contains files listing known binary asteroids and 'tumbling' asteroids.
The World Inventory of Soil Emission Potentials (WISE) database currently contains data for over 4300 soil profiles collected mostly between 1950 and 1995. This database has been used to generate a series of uniform data sets of derived soil properties for each of the 106 soil units considered in the Soil Map of the World (FAO-UNESCO, 1974). These data sets were then linked to a 1/2 degree longitude by 1/2 degree latitude version of the edited and digital Soil Map of the World (FAO, 1995) to generate GIS raster image files for the following variables:
Total available water capacity (mm water per 1 m soil depth)
Soil organic carbon density (kg C/m² for 0-30 cm depth range)
Soil organic carbon density (kg C/m² for 0-100 cm depth range)
Soil carbonate carbon density (kg C/m² for 0-100 cm depth range)
Soil pH (0-30 cm depth range)
Soil pH (30-100 cm depth range)
Data Citation: The data set should be cited as follows: Batjes, N. H. (ed). 2000. Global Data Set of Derived Soil Properties, 0.5-Degree Grid (ISRIC-WISE). Available on-line from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A.
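To make the half-degree grid concrete, the sketch below maps a latitude/longitude pair to a row and column in a 720 by 360 global raster. It assumes the grid origin is the upper-left corner at (90°N, 180°W) with rows running north to south; that layout is an assumption for illustration and should be checked against the documentation shipped with the raster files.

```python
import math

CELL = 0.5      # cell size in degrees, from the description above
N_COLS = 720    # 360 degrees of longitude / 0.5
N_ROWS = 360    # 180 degrees of latitude / 0.5


def grid_cell(lat, lon):
    """Return (row, col) of the half-degree cell containing the point.

    Assumes row 0 starts at 90N and column 0 starts at 180W (hypothetical layout).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude or longitude out of range")
    # Clamp so that points exactly on the southern or eastern edge fall in the last cell.
    col = min(int(math.floor((lon + 180.0) / CELL)), N_COLS - 1)
    row = min(int(math.floor((90.0 - lat) / CELL)), N_ROWS - 1)
    return row, col
```

For example, under this assumed layout `grid_cell(90.0, -180.0)` is the first cell and `grid_cell(-90.0, 180.0)` is the last.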
Sheet “Prevalence, Incidence”: Prevalent and incident patient numbers.
Sheet “Demographics”: Patient counts in age groups; median, mean and standard deviation of age.
Sheet “Deaths”: Raw death counts and median age at death.
Sheet “Malignancies distribution”: Patient counts with malignancy diagnoses based on 3-digit ICD-10 code.
Sheet “CRC epidemiology”: Patient counts with new CRC diagnoses, median age at CRC diagnosis and death, total death counts.
Sheet “Survival1”: Survival curve data, OS, UC patients and controls.
Sheet “Survival2”: Survival curve data, OS, CRC-UC patients, whole group and age stratified.
(XLSX)
Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five "incorrect" targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported.
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
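The target:decoy idea mentioned in this abstract can be sketched in a few lines. The example below uses the simple decoy-to-target ratio as the FDR estimate at a score threshold; this is one common estimator, chosen here for illustration, and not necessarily the procedure the authors applied. The scores and labels are invented.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate FDR among target PSMs scoring at or above the threshold.

    psms is a list of (score, is_decoy) tuples, with higher scores being better.
    Uses the simple ratio: decoy hits / target hits above the threshold.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    return decoys / targets


# Invented PSM scores for illustration only.
example_psms = [
    (9.1, False), (8.7, False), (8.2, True),
    (7.9, False), (7.5, False), (6.0, True),
]

fdr_loose = target_decoy_fdr(example_psms, 7.0)   # 1 decoy / 4 targets
fdr_strict = target_decoy_fdr(example_psms, 8.5)  # 0 decoys / 2 targets
```

Because the estimate depends on how decoy and target scores are distributed, enlarging the search space (as a six-frame translation does) can shift where a fixed FDR threshold falls, which is the effect the abstract describes.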
A text mining derived database with focus on extracting and classifying gene-disease associations with respect to several biomolecular conditions. It uses a machine learning based algorithm to extract semantic gene-disease relations from a textual source of interest. The semantic gene-disease relations were extracted with F-measures of 78. More specifically, the textual source utilized here originates from Entrez Gene's GeneRIF (Gene Reference Into Function) database (Mitchell, et al., 2003). LHGDN was created based on a GeneRIF version from March 31st, 2009, consisting of 414241 phrases. These phrases were further restricted to the organism Homo sapiens, which resulted in a total of 178004 phrases. We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph. We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches.
The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Combined data set derived from new data generated herein and publicly available DNA barcoding projects from the Barcode of Life Database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Plants produce a wide range of bioactive peptides as part of their innate defense mechanisms. With the explosive growth in the number of known plant-derived peptides, verifying their therapeutic function using traditional experimental methods is resource- and time-consuming. Therefore, it is necessary to predict the therapeutic function of plant-derived peptides more effectively and accurately, with less waste of resources, and thus expedite the development of plant peptides. We herein developed a repository of plant peptides predicted to have multiple therapeutic functions, named MFPPDB (multi-functional plant peptide database). MFPPDB includes 1,482,409 single- or multiple-function therapeutic peptides of plant origin, derived from 121 fundamental plant species. The functional categories of these therapeutic peptides include 41 different features such as anti-bacterial, anti-fungal, anti-HIV, anti-viral, and anti-cancer. Detailed physicochemical information on these peptides is presented in the functional search and physicochemical property search modules, which help users access peptide information by plant species, ID, and function, or by peptide ID, isoelectric point, peptide sequence, and molecular weight through a web-friendly interface. We further matched the predicted peptides to nine state-of-the-art curated functional peptide databases and found that at least 293,408 of the peptides possess functional potential. Overall, MFPPDB integrates a massive number of plant peptides that have single or multiple therapeutic functions, which will facilitate comprehensive research in plant peptidomics. MFPPDB can be freely accessed through http://124.223.195.214:9188/mfppdb/index.
Working definition derived from the NHIS claims database.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This bundle contains derived IOF data from the Panoramic Cameras (Pancam) on Mars Exploration Rover 2 (Spirit). These data were produced by the science team.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Release of a BridgeDb gene identifier mapping database between Wikidata and Ensembl.
[INFO]: Database finished.
INFO: old database is Wikidata 1.0.0 (build: 20230506)
INFO: new database is Wikidata 1.0.0 (build: 20230506)
INFO: Number of ids in Wd (Wikidata): 153715 (unchanged)
INFO: Number of ids in En (Ensembl): 153637 (unchanged)
INFO: new size is 95 Mb (changed +0.0%)
INFO: total number of identifiers is 307352
INFO: total number of mappings is 307430
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
Superseded by HUN AssetList v1.3 20150212 (GUID: dcf8349e-aaed-4d30-80ab-1c8cbad8fe68) on 2/12/2015
This dataset contains the spatial and non-spatial (attribute) components of the Hunter subregion Asset List as an .mdb file, which is readable as an MS Access database or as an ESRI Personal Geodatabase.
Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.
Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.
Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "AnR_database_HUN_v1p2_20150128.doc", located in the zip file as part of this dataset.
The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.
Detailed information describing the database structure and content can be found in the document "AnR_database_HUN_v1p2_20150128.doc" located in the zip file.
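The element-to-asset relationship held in the "Element_to_Asset" table can be pictured as a many-to-one mapping. The sketch below groups element identifiers under the asset they belong to; the identifiers and the two-column layout are hypothetical and are not taken from the database documentation.

```python
from collections import defaultdict


def elements_by_asset(pairs):
    """Group element IDs under their asset ID.

    pairs is an iterable of (element_id, asset_id) tuples, standing in for
    rows of a relationship table (column layout assumed for illustration).
    """
    grouped = defaultdict(list)
    for element_id, asset_id in pairs:
        grouped[asset_id].append(element_id)
    return dict(grouped)


# Invented identifiers; the real table uses its own keys.
element_to_asset = [("E1", "A1"), ("E2", "A1"), ("E3", "A2")]
assets = elements_by_asset(element_to_asset)
```

The actual field names and keys are described in "AnR_database_HUN_v1p2_20150128.doc".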
Some of the source data used in the compilation of this dataset is restricted.
The Asset List Database was developed to identify water dependent assets located within the Hunter subregion.
This dataset is an update of the previous version of the Hunter asset list database: "Asset list for Hunter - CURRENT"; ID: 51b1e021-2958-4cd3-8daa-ba46ece09d1c, which was updated with the inclusion of data from NSW Department of Primary Industries - Office of Water: HIGH PROBABILITY GROUNDWATER DEPENDENT VEGETATION WITH HIGH ECOLOGICAL VALUE (Hunter-Central Rivers).
Bioregional Assessment Programme (2015) HUN AssetList Database v1p2 20150128. Bioregional Assessment Derived Dataset. Viewed 09 October 2018, http://data.bioregionalassessments.gov.au/dataset/64ecd565-bb7c-4f21-951e-f35966b91c99.
Derived From NSW Office of Water Surface Water Entitlements Locations v1_Oct2013
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas
Derived From Birds Australia - Important Bird Areas (IBA) 2009
Derived From Hunter CMA GDEs (DRAFT DPI pre-release)
Derived From NSW Office of Water Surface Water Licences Processed for Hunter v1 20140516
Derived From NSW Office of Water Surface Water Offtakes - Hunter v1 24102013
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From Asset list for Hunter - CURRENT
Derived From Ramsar Wetlands of Australia
Derived From Commonwealth Heritage List Spatial Database (CHL)
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From New South Wales NSW Regional CMA Water Asset Information WAIT tool databases, RESTRICTED Includes ALL Reports
Derived From National Heritage List Spatial Database (NHL) (v2.1)
Derived From Groundwater Entitlement Hunter NSW Office of Water 20150324
Derived From NSW Office of Water combined geodatabase of regulated rivers and water sharing plan regions
Derived From Australia World Heritage Areas
Derived From NSW Office of Water GW licence extract linked to spatial locations for NorthandSouthSydney v3 13032014
Derived From Groundwater Economic Elements Hunter NSW 20150520 PersRem v02
Derived From Directory of Important Wetlands in Australia (DIWA) Spatial Database (Public)
Derived From New South Wales NSW - Regional - CMA - Water Asset Information Tool - WAIT - databases
Derived From Operating Mines OZMIN Geoscience Australia 20150201
Derived From NSW Office of Water - National Groundwater Information System 20141101v02
Derived From Groundwater Economic Assets Hunter NSW 20150331 PersRem
Derived From Australia - Species of National Environmental Significance Database
Derived From Monitoring Power Generation and Water Supply Bores Hunter NOW 20150514
Derived From Northern Rivers CMA GDEs (DRAFT DPI pre-release)
Derived From Australia, Register of the National Estate (RNE) - Spatial Database (RNESDB) Internal
Derived From NSW Office of Water Groundwater Entitlements Spatial Locations
Derived From NSW Office of Water Groundwater Licence Extract, North and South Sydney - Oct 2013
Derived From NSW Office of Water - GW licence extract linked to spatial locations for North and South Sydney v2 20140228
Derived From Collaborative Australian Protected Areas Database (CAPAD) 2010 (Not current release)
The Maranoa-Balonne-Condamine Impact and Risk Analysis Database (Analysis Database) is a fit-for-purpose geospatial information system developed for the Impact and Risk Analysis (Component 3-4) products of the Bioregional Assessment Technical Programme (BATP).
The version provided here for public download has been slightly modified to remove restricted material such as the co-ordinates of protected or threatened species. This version was used to populate BA Explorer.
The Analysis Database brings together many of the data sets used in Components 1 and 2 of the assessments and includes hydrology and hydrogeology modelling results, landscape classes and economic, sociocultural and ecological assets. These data sets are listed in the Component 1 and 2 products under the Assessments tab in http://www.bioregionalassessments.gov.au/.
An Analysis Database of common design and schema was implemented for each subregion where a full Impact and Risk Analysis was completed. To populate each database, input datasets were transformed, normalised and inserted into their respective Analysis Databases in accord with the common design and schema. The approach enabled the universal treatment of data analysis across all bioregions despite data being of different specifications and origins.
The Analysis Database includes all the data used for the assessment of the subregion with the exception of those datasets that were not provided to the program with an open access licence. The database is constructed using the Open Source platform PostgreSQL coupled with PostGIS. This technology was considered to better enable the provenance and transparency requirements of the Programme. The files provided here have been prepared using the PostgreSQL version 9.5 SQL Dump function - pg_dump.
A detailed description of the Analysis Database, its design, structure and application is provided in the supporting documentation: http://data.bioregionalassessments.gov.au/dataset/05e851cf-57a5-4127-948a-1b41732d538c
The Maranoa-Balonne-Condamine Impact and Risk Analysis Database (Analysis Database) is the geospatial database for completing the Impact and Risk Analysis component of the Maranoa-Balonne-Condamine Bioregional Assessment. This includes the creating of results, tables and maps that appear in the relevant Products of each assessment. The database also manages the data used by the BA Explorer.
An individual instance of the Analysis Database was developed for each subregion where a component 3-4 Impact and Risks Assessment was conducted. With the exception of the subregion-specific data contained within it and the removal of restricted data records, each analysis database is of identical design and structure.
This Analysis Database is an instance of PostgreSQL version 9.5 hosted on Red Hat Enterprise Linux version 4.8.5-4. PostgreSQL geospatial capabilities are provided by PostGIS version 2.2.
Data pre-processing and upload into each PostgreSQL database was completed using FME Desktop (Oracle Edition) version 2016.1.2.1. Analysis data and results are provided to users and systems via the geospatial services of Geoserver version 2.9.1. Scientific analysis and mapping was undertaken by connecting a range of data using a combination of Microsoft Excel, QGIS and ArcMap systems.
During the Programme and for its working life, the Analysis Database was hosted and managed on instances of Amazon Web Services managed by Geoscience Australia and the Bureau of Meteorology.
Bioregional Assessment Programme (2017) MBC Impact and Risk Analysis Database v01. Bioregional Assessment Derived Dataset. Viewed 25 October 2017, http://data.bioregionalassessments.gov.au/dataset/69075f3e-67ba-405b-8640-96e6cb2a189a.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Surface Geology of Australia, 1:1 000 000 scale, 2012 edition
Derived From Asset database for the Maranoa-Balonne-Condamine subregion on 16 June 2015
Derived From South East Queensland GDE (draft)
Derived From Geofabric Surface Cartography - V2.1
Derived From Environmental Asset Database - Commonwealth Environmental Water Office
Derived From QLD Dept of Natural Resources and Mines, Surface Water Entitlements 131204
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From Catchment Scale Land Use of Australia - 2014
Derived From Surface water preliminary assessment extent for the Maranoa-Balonne-Condamine subregion - v02
Derived From MBC Groundwater model domain boundary
Derived From Key Environmental Assets - KEA - of the Murray Darling Basin
Derived From Bioregional Assessment areas v03
Derived From MBC Groundwater model ACRD 5th to 95th percentile drawdown
Derived From Permanent and Semi-Permanent Waterbodies of the Lake Eyre Basin (Queensland and South Australia) (DRAFT)
Derived From Receptors for the Maranoa-Balonne-Condamine subregion
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From MBC Assessment Units 20160714 v01
Derived From Victoria - Seamless Geology 2014
Derived From Matters of State environmental significance (version 4.1), Queensland
Derived From Communities of National Environmental Significance Database - RESTRICTED - Metadata only
Derived From Bioregional Assessment areas v06
Derived From Asset database for the Maranoa-Balonne-Condamine subregion on 9 June 2015
Derived From Queensland wetland data version 3 - wetland areas.
Derived From Groundwater Preliminary Assessment Extent (PAE) for the Maranoa Balonne Condamine (MBC) subregion - v02
Derived From National Groundwater Dependent Ecosystems (GDE) Atlas (including WA)
Derived From Asset database for the Maranoa-Balonne-Condamine subregion on 05 February 2016
Derived From MBC Groundwater model layer boundaries
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From Baseline drawdown Layer 1 - Condamine Alluvium
Derived From MBC Assessment unit codified by regional watertable
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores and NGIS v4 28072014
Derived From MBC Assessment Units 20160714 v02
Derived From MBC Groundwater model water balance areas
Derived From Asset database for the Maranoa-Balonne-Condamine subregion on 25 February 2015
Derived From Australia - Species of National Environmental Significance Database
Derived From MBC Groundwater model uncertainty analysis
Derived From Spring vents assessed for the Surat Underground Water Impact Report 2012
Derived From Collaborative Australian Protected Areas Database (CAPAD) 2010 (Not current release)
Derived From Queensland QLD - Regional - NRM - Water Asset Information Tool - WAIT - databases
Derived From [NSW Office of Water GW licence extract linked to spatial
During the 1989 FIFE field campaign, measurements were made of soil moisture release parameters and hydraulic conductivity. Bulk density and soil moisture release data were collected at five FIFE sites representing the major soil types in the FIFE study area. These data were used to model the porosity, saturated water potential, and the b-factor (the exponent of the power curve function) following the method of Clapp and Hornberger (1978). These soil moisture characteristics can be used to describe plant-available water and water movement through soils.
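The power-curve relationship referred to above is commonly written as ψ = ψ_s (θ/θ_s)^(-b), where θ_s is the saturated water content (porosity), ψ_s is the saturated water potential, and b is the exponent. The sketch below evaluates that form; the parameter values are invented for illustration and are not FIFE measurements.

```python
def water_potential(theta, theta_sat, psi_sat, b):
    """Soil water potential from the power-curve form psi = psi_sat * (theta/theta_sat)**(-b).

    theta     volumetric water content (same units as theta_sat)
    theta_sat saturated water content, i.e. porosity
    psi_sat   water potential at saturation (negative in suction units)
    b         empirical exponent (the b-factor)
    """
    if not 0.0 < theta <= theta_sat:
        raise ValueError("theta must be in the interval (0, theta_sat]")
    return psi_sat * (theta / theta_sat) ** (-b)


# Invented parameters: porosity 0.5, saturated potential -0.2 m, b = 5.
psi_half = water_potential(0.25, 0.5, -0.2, 5.0)
psi_full = water_potential(0.5, 0.5, -0.2, 5.0)
```

At saturation the function returns ψ_s itself, and the potential becomes rapidly more negative as the soil dries, with b controlling how steep that change is.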
This is a compilation of published rotational parameters derived from lightcurve data for asteroids, based on the Warner et al. (2009) Asteroid Lightcurve Database. This is the version as of March 1, 2014. In addition to reported rotational parameters by individual paper, there is a summary file with the values adopted by Harris, Warner, and Pravec as the most likely correct values for each asteroid. The data set also contains files listing known binary asteroids and 'tumbling' asteroids.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BIOSYSMOdb is a comprehensive and integrative database developed as part of the BIOSYSMO project. This resource centralizes data on metabolic pathways, reactions, enzymes, and degradative organisms to address soil contamination caused by industrial, agricultural, and urban activities. BIOSYSMOdb serves as a bridge between computational and experimental research, offering a unified platform to accelerate bioremediation solutions.

Dataset Description
BIOSYSMOdb integrates curated and synthesized data from major public repositories: EAWAG BBD, MibPOPdb, MetaCyc, UniProt, and KEGG. The database includes:
- Chemical level: Details on compounds relevant for biodegradation.
- Metabolic level: Data on pathways, reactions, enzymes, and organisms associated with degradation.
- Organism level: Information on degradative organisms and their genomic data.
- Protein level: Information on the enzymes responsible for each reaction and their associated sequence data (if available).

Data Structure
The following files are included in the dataset:
- BIOSYSMOdb_Compounds_chemical_iden_v1.0.csv: Compound identifiers inferred from other databases
- BIOSYSMOdb_Compounds_chemical_info_v1.0.csv: Compound information collected from public sources
- BIOSYSMOdb_Compounds_onthology_cod_v1.0.csv: Compound ontology codes derived from ClassyFire
- BIOSYSMOdb_Compounds_onthology_term_v1.0.csv: Compound ontology terms derived from ClassyFire
- BIOSYSMOdb_Pathways_v1.0.csv: Pathways dataset
- BIOSYSMOdb_Reactions_v1.0.csv: Reactions dataset (containing associated substrates, products, enzymes and pathways)
- BIOSYSMOdb_Enzymes_v1.0.csv: Enzymes dataset (containing associated reactions)
- BIOSYSMOdb_Compounds_v1.0.csv: Compounds principal dataset
- BIOSYSMOdb_Organisms_v1.0.csv: Organisms principal dataset (containing associated pathways and NCBI Genome ID when available)

CSV Descriptions
- Compound ID: Unique identifier for each compound.
- Pathway Name: Name of the metabolic pathway.
- Reaction ID: Identifier for individual reactions.
- Enzyme/Protein ID: Unique identifier for associated enzymes.
- Organism Name: Name of the degradative organism.

Jupyter Notebook for querying BIOSYSMOdb
To facilitate data exploration and connections within the CSV files, a Jupyter Notebook, BIOSYSMO_database_queries, has been created. This notebook enables users to analyze relationships between different datasets and execute relevant queries efficiently.

Data Sources & Licenses
This database includes data derived from diverse databases:
- EAWAG BBD: Data on biodegradation of persistent organic pollutants. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
- MibPOPdb: Focused on microbial degradation of xenobiotics. Creative Commons Attribution 4.0 International (CC BY 4.0) license.
- MetaCyc: Comprehensive metabolic pathway database. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
- KEGG: Genomic integration and metabolic networks. Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
- UniProt: Protein sequences. Creative Commons Attribution 4.0 International (CC BY 4.0) license.
- NCBI Genome: Organism genomes. This database is public.
- PubChem: Chemical compounds. This database is public.
- ChEBI: Chemical compounds. Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Licensing and Attribution
This dataset is shared under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). Please credit BIOSYSMOdb and the original sources (EAWAG BBD, MibPOPdb, MetaCyc, ChEBI, PubChem and KEGG) in any use or derivative works. BIOSYSMOdb was developed as part of the BIOSYSMO project, which has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101060211.

Acknowledgments
- MetaCyc, KEGG, EAWAG BBD, UniProt, NCBI Genome, PubChem, ChEBI, and MibPOPdb: For providing essential data that supported the curation of BIOSYSMOdb.
- BIOSYSMO consortium: For their contributions to the database’s design and development.
We extend our gratitude to the Horizon Europe programme and the European Union for their support in advancing research on bioremediation and biodegradation.

Contact
For inquiries, please contact:
- Contact Name: Main Researcher: Marta Franco de Benito, MSc or Project Coordinator: Sara Gil Guerrero, PhD
- Email: marta.franco@idener.ai // sara.gil@idener.ai
- Institution: IDENER.AI
CSB.DB presents the results of bio-statistical analysis of gene expression data in association with additional biochemical and physiological knowledge. The main aim of this database platform is to provide tools that support insight into life's complexity pyramid, with a special focus on the integration of data from transcript and metabolite profiling experiments. The main focus of the CSB project is the generation of new, easily accessible knowledge about the relationship and the hierarchy of cellular components. Thus new progress towards understanding life's complexity pyramid is made. For this aim, statistical and computational algorithms are applied to organism-specific data derived from publicly available multi-parallel technologies, currently such as expression profiles. The underlying data are derived from various research activities. Thus the CSB project provides an integrated and centralized public resource allowing universal access to the generated knowledge. CSB.DB: A Comprehensive Systems-Biology Database. The derived knowledge should support the formulation of new hypotheses about the respective functional involvement of genes beyond their (inter-)relationships. Another major goal of the CSB project is to supply researchers with the information necessary to formulate these new hypotheses without demanding any a priori statistical knowledge of the user. The CSB project mainly focuses on applying the required statistical tests and on assisting the user during exploration of results with information and help files that support hypothesis generation.
A computational analysis of mass spectrometry data was performed to uncover alternative splicing derived protein variants across chambers of the human heart. Evidence for 216 non-canonical isoforms was apparent in the atrium and the ventricle, including 52 isoforms not documented on SwissProt and recovered using an RNA sequencing derived database. Among non-canonical isoforms, 29 show signs of regulation based on statistically significant preferences in tissue usage, including a ventricular enriched protein isoform of tensin-1 (TNS1) and an atrium-enriched PDZ and LIM Domain 3 (PDLIM3) isoform 2 (PDLIM3-2/ALP-H). Examined variant regions that differ between alternative and canonical isoforms are highly enriched with intrinsically disordered regions. Moreover, over two-thirds of such regions are predicted to function in protein binding and RNA binding. The analysis here lends further credence to the notion that alternative splicing diversifies the proteome by rewiring intrinsically disordered regions, which are increasingly recognized to play important roles in the generation of biological function from protein sequences.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
MD2 pineapple (Ananas comosus) is the second most important tropical crop that preserves crassulacean acid metabolism (CAM), which has high water-use efficiency and is fast becoming the most consumed fresh fruit worldwide. Despite the significance of environmental efficiency and popularity, until very recently, its genome sequence has not been determined and a high-quality annotated proteome has not been available. Here, we have undertaken a pilot proteogenomic study, analyzing the proteome of MD2 pineapple leaves using liquid chromatography-mass spectrometry (LC–MS/MS), which validates 1781 predicted proteins in the annotated F153 (V3) genome. In addition, a further 603 peptide identifications are found that map exclusively to an independent MD2 transcriptome-derived database but are not found in the standard F153 (V3) annotated proteome. Peptide identifications derived from these MD2 transcripts are also cross-referenced to a more recent and complete MD2 genome annotation, resulting in 402 nonoverlapping peptides, which in turn support 30 high-quality gene candidates novel to both pineapple genomes. Many of the validated F153 (V3) genes are also supported by an independent proteomics data set collected for an ornamental pineapple variety. The contigs and peptides have been mapped to the current F153 genome build and are available as bed files to display a custom gene track on the Ensembl Plants region viewer. These analyses add to the knowledge of experimentally validated pineapple genes and demonstrate the utility of transcript-derived proteomics to discover both novel genes and genetic structure in a plant genome, adding value to its annotation.