Database that aggregates exome and genome sequencing data from large-scale sequencing projects. The gnomAD data set contains individuals sequenced using multiple exome capture methods and sequencing chemistries. Raw data from the projects have been reprocessed through the same pipeline, and jointly variant-called to increase consistency across projects.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Protein aggregation occurs when misfolded or unfolded proteins physically bind together, and it can promote the development of various amyloid diseases. This study aimed to construct surrogate models for predicting protein aggregation via data-driven methods using two types of databases. First, an aggregation propensity score database was constructed by calculating scores for protein structures in the Protein Data Bank using Aggrescan3D 2.0. Feature- and graph-based models for predicting protein aggregation were then developed using this database. The graph-based model outperformed the feature-based model, achieving an R2 of 0.95, although it intrinsically requires protein structures as input. Second, for experimental data, a feature-based model was built using the Curated Protein Aggregation Database 2.0 to predict aggregation intensity curves. In summary, this study suggests which approaches are more effective in predicting protein aggregation, depending on the type of descriptor and the database.
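As a minimal illustration of the feature-based route described above, the sketch below fits a regressor that maps per-protein descriptors to an aggregation propensity score; the descriptors and scores are synthetic stand-ins, not values from the Aggrescan3D-derived database.

```python
# Minimal feature-based surrogate sketch: descriptors -> propensity score.
# Features and scores are synthetic stand-ins, not Aggrescan3D data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))  # hypothetical sequence/structure descriptors
y = 0.7 * X[:, 0] - 0.3 * X[:, 3] + rng.normal(scale=0.1, size=500)  # stand-in scores

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"test R^2 = {r2_score(y_te, model.predict(X_te)):.2f}")
```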
https://bioregistry.io/spdx:CC0-1.0
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community (from https://gnomad.broadinstitute.org).
Metabolic pathway map that collects metabolic data gathered from multiple public databases and organizes them in one central location.
https://paper.erudition.co.in/terms
Question paper solutions for the chapter "Aggregate Functions, Joins, and Set Operations" of Database Management with SQL, 4th Semester, Bachelor in Business Administration (Hons.), 2023-2024
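For readers outside the course, a compact example of the chapter's three topics (an aggregate function, a JOIN, and a set operation) is sketched below using Python's built-in sqlite3 module; the schema and data are invented for illustration.

```python
# Aggregate function + JOIN + set operation, sketched with Python's
# built-in sqlite3 module; the schema below is invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (id INTEGER PRIMARY KEY, dept_id INTEGER, salary REAL);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO emp  VALUES (1, 1, 50000), (2, 1, 60000), (3, 2, 45000);
""")

# Aggregate functions (COUNT, AVG) over a JOIN, grouped by department:
for row in con.execute("""
        SELECT d.name, COUNT(*) AS n, AVG(e.salary) AS avg_salary
        FROM emp e JOIN dept d ON e.dept_id = d.id
        GROUP BY d.name ORDER BY avg_salary DESC"""):
    print(row)

# A set operation: department ids seen in either table, deduplicated.
print(con.execute("SELECT id FROM dept UNION SELECT dept_id FROM emp").fetchall())
```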
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-scale human sequencing projects. This dataset was derived from summary data from gnomAD release 3.1, available on the Registry of Open Data on AWS, for ready enrollment into the Data Lake as Code.
Python 2 Jupyter notebook that aggregates sub-daily time series observations up to a daily time scale. The code was originally written to aggregate data stored in the SQLite database in this resource: https://www.hydroshare.org/resource/9e1b23607ac240588ba50d6b5b9a49b5/
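The same sub-daily-to-daily aggregation can be sketched in a few lines of modern (Python 3) pandas; the series below is synthetic, and the notebook's actual column names and database schema may differ.

```python
# Aggregate 15-minute observations up to a daily time scale with pandas.
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=192, freq="15min")  # two days
obs = pd.Series(np.random.default_rng(0).random(192), index=idx, name="stage")

daily = obs.resample("D").agg(["mean", "min", "max"])  # one row per day
print(daily)
```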
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Riverine silicon (Si) plays a vital role in governing primary production, water quality, and carbon sequestration. The Global Aggregation of Stream Silica (GlASS) database was constructed to assess changes in riverine Si concentrations and fluxes, their relationship to available nutrients, and the mechanisms driving these patterns. GlASS includes dissolved Si (DSi), dissolved inorganic nitrogen, and dissolved inorganic phosphorus concentrations at daily to quarterly time steps, daily discharge, and watershed characteristics for rivers with drainage areas ranging from less than 1 square kilometer to more than 4 million square kilometers and spanning nine climate zones. Chemistry and discharge data span the years 1963 to 2024; watershed and climate data span 1948 to 2024. GlASS uses publicly available datasets, ensuring transparency and reproducibility. Original data sources are cited, data quality assurance workflows are public, and input files to a common load ...
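To make the concentration-and-flux framing concrete, here is a hedged sketch of a daily load calculation of the kind such a database supports; the values and column names are illustrative, not taken from GlASS.

```python
# Hedged sketch of a daily dissolved-silicon load: concentration times
# discharge with a unit conversion. Values and column names are
# illustrative, not taken from GlASS.
import pandas as pd

df = pd.DataFrame({
    "dsi_mg_per_L": [4.2, 3.9, 4.5],      # dissolved Si concentration
    "q_m3_per_s": [120.0, 150.0, 90.0],   # daily mean discharge
})
# mg/L equals g/m^3, so g/m^3 * m^3/s = g/s; times 86400 s/day,
# divided by 1e6 g/tonne, gives tonnes per day.
df["dsi_t_per_day"] = df["dsi_mg_per_L"] * df["q_m3_per_s"] * 86400 / 1e6
print(df)
```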
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The database contains physical and microwave single scattering properties of horizontally aligned frozen hydrometeors as large as 11 cm in diameter.
A description of the aggregation model used for particle generation can be found in:
Leinonen, J., and Szyrmer, W. (2015), Radar signatures of snowflake riming: A modeling study, Earth and Space Science, 2, 346–358, doi:10.1002/2015EA000102.
The code used for particle generation is freely available at: https://github.com/jleinonen/aggregation
The scattering properties of the particles were computed with the discrete dipole approximation, using the ADDA software package (https://github.com/adda-team/adda).
Terminal velocities of the snowflakes were computed using four hydrodynamic models implemented as part of the snowScatt library (https://github.com/OPTIMICe-team/snowScatt).
Approximately half of the snowflake structure files and a quarter of the scattering properties (for the X, Ku, Ka, and W bands) were generated for the publication of Leinonen and Szyrmer (2015). The remainder of the dataset was generated using the ALICE High Performance Computing Facility at the University of Leicester.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Advancements in phenotyping technology have enabled plant science researchers to gather large volumes of information from their experiments, especially those that evaluate multiple genotypes. To fully leverage these complex and often heterogeneous data sets (i.e., those that differ in format and structure), scientists must invest considerable time in data processing, and data management has emerged as a considerable barrier to downstream application. Here, we propose a pipeline to enhance data collection, processing, and management from plant science studies, comprising two newly developed open-source programs. The first, called AgTC, is a series of programming functions that generates comma-separated values file templates to collect data in a standard format using either a lab-based computer or a mobile device. The second series of functions, AgETL, executes steps for an Extract-Transform-Load (ETL) data-integration process in which data are extracted from heterogeneously formatted files, transformed to meet standard criteria, and loaded into a database. There, data are stored and can be accessed for data-analysis processes, including dynamic data visualization through web-based tools. Both AgTC and AgETL are flexible for application across plant science experiments without requiring programming knowledge on the part of the domain scientist, and their functions are executed on Jupyter Notebook, a browser-based interactive development environment. Additionally, all parameters are easily customized from central configuration files written in the human-readable YAML format. Using three experiments from research laboratories in university and non-government organization (NGO) settings as test cases, we demonstrate the utility of AgTC and AgETL to streamline critical steps from data collection to analysis in the plant sciences.
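The Extract-Transform-Load pattern described here, driven by a human-readable YAML configuration, can be sketched as follows; the configuration keys, file names, and database are hypothetical placeholders, not AgETL's actual parameters.

```python
# A minimal Extract-Transform-Load sketch driven by a YAML config.
# All names here are hypothetical; AgETL's real parameters differ.
import sqlite3

import pandas as pd
import yaml  # pip install pyyaml

config = yaml.safe_load("""
input_file: trial_2023.csv
rename_columns:
  PlotID: plot_id
  Yield_kg: yield_kg
table: observations
""")

# Stand-in input file so the sketch runs end to end.
pd.DataFrame({"PlotID": [1, 2], "Yield_kg": [3.2, 2.9]}).to_csv(
    config["input_file"], index=False)

df = pd.read_csv(config["input_file"])             # extract
df = df.rename(columns=config["rename_columns"])   # transform
with sqlite3.connect("experiments.db") as con:     # load
    df.to_sql(config["table"], con, if_exists="append", index=False)
print(df)
```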
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A database of physical and IR-spectroscopic characteristics (including nitrogen content and aggregation state) and stable isotopic compositions (δ13C and δ15N) of principally inclusion-bearing diamonds.
The National Mortgage Database (NMDB®) is a nationally representative five percent sample of residential mortgages in the United States.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Genome assembly
https://spdx.org/licenses/CC0-1.0.html
Aim: Addressing how woody plant species are distributed in space can reveal inconspicuous drivers that structure plant communities. The spatial structure of conspecifics varies not only at local scales across co-existing plant species but also at larger biogeographical scales with climatic parameters and habitat properties. The possibility that biogeographical drivers shape the spatial structure of plants, however, has not received sufficient attention.
Location: Global synthesis.
Time period: 1997-2022.
Major taxa studied: Woody angiosperms and conifers.
Methods: We carried out a quantitative synthesis to capture the interplay between local-scale and larger-scale drivers. We modelled conspecific spatial aggregation as a binary response through logistic models, and Ripley's L statistics and the distance at which the point process was least random with mixed-effects linear models. Our predictors covered a range of plant traits, climatic predictors, and descriptors of the habitat.
Results: We hypothesized that plant traits, when summarized by local-scale predictors, exceed biogeographical drivers in importance in determining the spatial structure of conspecifics across woody systems. This was only the case in relation to the frequency with which we observed aggregated distributions. The probability of observing spatial aggregation and its intensity were higher for plant species with large leaves but further depended on climatic parameters and mycorrhiza.
Main conclusions: Compared to climatic variables, plant traits perform poorly in explaining the spatial structure of woody plant species, even though leaf area is a decisive plant trait related to whether we observe homogeneous spatial aggregation and its intensity. Despite the limited variance explained by our models, we found that the spatial structure of woody plants is subject to consistent biogeographical constraints, and that these extend beyond descriptors of individual species, which we captured here through leaf area.
Methods: On 8 September 2022 we carried out a search in the Web of Science with the search string "(Ripley's K function) AND (forest)". The search yielded 356 hits. We screened those 356 studies for eligibility, first based on the suitability of their article titles and second based on their abstracts (Figure S1). The 240 eligible studies were subsequently screened manually upon reading the entire article, based on the following inclusion criteria: (1) the study reported univariate Ripley's K or L statistics, or it was possible to extract those from figures or maps; (2) the study had been carried out in a woody ecosystem or a rangeland; (3) the univariate Ripley's K statistics described the distribution of individuals from a single plant species; (4) the authors named the plant species for which the univariate Ripley's K statistics had been described; (5) the landscape (for example, a logging area) did not induce conspicuous point processes that could not be corrected within the analysis. Reading the main text of the remaining 240 studies reduced the final number of eligible studies to 69; a list of those data sources can be found in Appendix Three. From those studies we extracted the following moderators and fitted them as predictors in subsequent models:
Mean annual temperature: continuous variable. When unreported, we extracted the variable based on coordinates from WorldClim (Fick & Hijmans, 2017).
Total annual precipitation: continuous variable. When unreported, we extracted the variable based on coordinates from WorldClim (Fick & Hijmans, 2017).
Latitude of the study location: continuous variable. When unreported, we extracted the information based on the closest location reported.
Longitude of the study location: continuous variable. When unreported, we extracted the information based on the closest location reported.
Site area: continuous variable. We extracted the site area from the studies and converted it into a unified unit, square meters.
Tree species: categorical variable.
Plant traits: we collected data on seven traits: leaf area (i.e., the size of the leaves), seed mass, wood density, leaf mass per area, tree height, plant species biomass, and stem specific density. We first gathered data on tree height, seed mass, and leaf area for the subset of common species in TRY (Díaz et al., 2022). We subsequently searched the SID database (Royal Botanic Gardens Kew, 2023) for seed mass data and the ICRAF database (Ketterings et al., 2001) for wood density data. Where we found no records in those databases, we checked the EOL database (http://eol.org). Leaf area, leaf mass per area, tree height, plant species biomass, and stem specific density were otherwise extracted from the EOL database (http://eol.org). We chose these traits to cover as many trait syndromes as possible, but the main criterion for selecting them was the feasibility of acquiring them for the plant species in our database.
Woody system age: categorical variable. We classified non-woody habitats, plantations, and systems that had recently experienced serious disturbances as "young", and natural forests or woody stands that had reached maturity as "old".
Mycorrhiza type: categorical variable. We extracted mycorrhizal types for each species from Wang and Qiu (2006). Where we could find no species-level mycorrhizal classification in that database, we instead searched the genus-level database compiled by Delavaux et al. (2021). We only extracted mycorrhizal classifications if these supported a single mycorrhizal type at a minimum probability of 85%; otherwise, we left the plant species unclassified with respect to mycorrhiza.
Ripley's L effect size: continuous variable. For each distance, we calculated the ratio between (1) the difference between the Ripley's L statistic and the midpoint of the 95% CI envelope and (2) half the width of the envelope (the difference between its upper and lower bounds divided by two). A large absolute value suggests a strong deviation from randomness, whereas any absolute value below 1 suggests a random process. We identified the distance at which the absolute value of this ratio was at its maximum. An example calculation is sketched after this list.
Ripley's L statistic: continuous variable. We transformed Ripley's K statistics (where they had not already been transformed) into Ripley's L statistics. We only used the value at the distance where the absolute Ripley's L effect size peaked.
Distance at which Ripley's L peaked: continuous variable describing the distance at which we observed the maximum absolute Ripley's L effect size.
Köppen climate zone: categorical variable with four levels describing the main climatic zones of the Köppen classification: A (tropical climates); B (arid climates); C (temperate climates); D (continental climates). We extracted these from the raster files published by Beck et al. (2018).
Where databases contained multiple values per species (mainly plant trait values), we used the median value. Where we had to digitize plots to extract data, we did so with Plot Digitizer v2.6.8.
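The effect-size construction above can be made concrete in a few lines; the sketch below follows one reading of the description (deviation of Ripley's L from the envelope midpoint, scaled by the envelope half-width, so absolute values above 1 fall outside the envelope), with toy numbers in place of extracted statistics.

```python
# Ripley's L effect size under one reading of the description above:
# (L - envelope midpoint) / (envelope half-width). |ratio| > 1 means L
# lies outside the 95% envelope (non-random); |ratio| < 1 means inside.
import numpy as np

def ripley_L(K):
    """Transform Ripley's K into L: L(r) = sqrt(K(r) / pi)."""
    return np.sqrt(np.asarray(K) / np.pi)

def peak_effect_size(L, lo, hi, r):
    """Return the max-|ratio| effect size and the distance where it occurs."""
    mid = (np.asarray(hi) + np.asarray(lo)) / 2
    half = (np.asarray(hi) - np.asarray(lo)) / 2
    ratio = (np.asarray(L) - mid) / half
    i = int(np.argmax(np.abs(ratio)))
    return ratio[i], r[i]

r = np.array([1.0, 2.0, 3.0])                       # toy distances
L = ripley_L([4.0, 18.0, 40.0])                     # toy K values
es, r_peak = peak_effect_size(L, lo=[0.8, 1.7, 2.8],
                              hi=[1.2, 2.3, 3.2], r=r)
print(f"effect size {es:.2f} at distance {r_peak}")
```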
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This material is part of the free Environmental Performance in Construction (EPiC) Database. The EPiC Database contains embodied environmental flow coefficients for 250+ construction materials using a comprehensive hybrid life cycle inventory approach.
Recycled aggregate is a cheap and readily available product made from recycled construction materials. It is strong and durable, with excellent drainage properties. It is typically composed of concrete, stone, brick, ceramics, mortar, and other common construction materials. It is produced using waste materials collected from the demolition of building and infrastructure projects. Impurities such as metal, wood, and timber are removed via magnets and other sorting techniques. The remaining materials are sorted by size and crushed into a coarse aggregate.
Recycled aggregate is becoming increasingly popular as a replacement for natural aggregates. It is commonly used for bulk fill, road construction, and gravel, and as an aggregate in concrete. When used in concrete, it is typically combined with fly ash or other additives to ensure improved strength and reliability.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Mortgage Database (NMDB®) is a nationally representative five percent sample of residential mortgages in the United States. Publication of aggregate data from NMDB is a step toward implementing the statutory requirements of section 1324(c) of the Federal Housing Enterprises Financial Safety and Soundness Act of 1992, as amended by the Housing and Economic Recovery Act of 2008. The statute requires FHFA to conduct a monthly mortgage market survey to collect data on the characteristics of individual mortgages, both Enterprise and non-Enterprise, and to make the data available to the public while protecting the privacy of the borrowers.
Notes:
1) All CSV file headers are now standardized as described in the Data Dictionary and Technical Notes, and all CSV files are zipped.
2) Alternate wide-format CSV files are available. The wide format may be more easily opened by MS Excel.
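For users working from the long-format files, the wide format in note 2 corresponds to a simple pivot; the sketch below uses invented column names (the real headers are in the Data Dictionary), and pandas can read the zipped CSVs directly.

```python
# Reshape an aggregate-statistics table from long to wide format.
# Column names are invented; real NMDB headers are in the Data Dictionary.
import pandas as pd

# pd.read_csv("some_nmdb_file.zip") would read a zipped CSV directly.
long_df = pd.DataFrame({
    "geography": ["US", "US", "CA", "CA"],
    "period": ["2020Q1", "2020Q2", "2020Q1", "2020Q2"],
    "value": [3.1, 3.4, 3.0, 3.2],
})
wide_df = long_df.pivot(index="geography", columns="period", values="value")
print(wide_df)
```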
This dataset is a thematic reaggregated version of the original national Africover land cover multipurpose database. It contains all natural vegetation with a woody component. The original full-resolution land cover was produced from visual interpretation of digitally enhanced LANDSAT TM images (bands 4, 3, 2) acquired mainly in the period 2000-2001 (see the "Multipurpose Landcover Database" metadata for more details). This dataset is intended for free public access. Thematic aggregation is the way the end user customizes the Africover database to fulfil their specific requirements. The Africover database gives an equal level of detail to agriculture, natural vegetation, bare areas, and so on. Generally, a single user does not need this level of detail for every class type; they will instead enhance the information for one land cover type and generalize or erase the information related to other land cover aspects. The most powerful way to conduct an aggregation exercise is to use the classifiers as the basic elements of the exercise, which gives the user maximum flexibility in the use of the data. The shapefile's main attributes correspond to the following fields: ID, HECTARES, WOODY_ID, WOODY_DESC. You can download a zip archive containing: the drc-cult-agg shapefile (.shp); the DR Congo Classifiers Used (.pdf); the DR Congo legend (.pdf and .xls); the DR Congo Legend - LCCS Import file (.xls); the LCCSglossary_drcongo (.pdf); the thematic-aggregation-procedure (.pdf); the thematic-aggregation-annex1 (.pdf); the thematic-aggregation-annex2 (.pdf); and the Userlabel Definitions (.pdf).
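As a hedged sketch of one aggregation step on the attribute fields listed above, the snippet below dissolves polygons that share a WOODY_ID and sums their HECTARES; it assumes the drc-cult-agg shapefile has been downloaded and that geopandas is installed.

```python
# Dissolve polygons by woody-vegetation class and sum their areas.
import geopandas as gpd  # pip install geopandas

gdf = gpd.read_file("drc-cult-agg.shp")  # fields: ID, HECTARES, WOODY_ID, WOODY_DESC
woody = gdf.dissolve(by="WOODY_ID", aggfunc={"HECTARES": "sum"})
print(woody["HECTARES"].sort_values(ascending=False).head())
```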
In order to estimate the climate impact of highly absorbing black carbon (BC) aerosols, it is necessary to know their optical properties. The Lorentz-Mie theory, often used to calculate the optical properties of BC under the spherical morphological assumption, produces discrepancies when compared to measurements. In light of this, researchers are currently investigating the possibility of computing the optical properties of BC using a realistic fractal aggregate morphology. To determine the optical properties of such BC fractal aggregates, the Multiple Sphere T-Matrix method (MSTM) is used, which can take more than 24 hours for a single simulation, depending on the aggregate properties. This study provides a highly accurate benchmark machine-learning algorithm that can be used to generate the optical properties of BC fractal aggregates in a fraction of a second. The algorithm was trained on an extensive database of physicochemical and optical properties of BC fractal aggregates, which helped it accurately predict the optical properties of BC fractal aggregates with an average deviation of less than one percent from their actual values. Specifically, the algorithm provides the option to generate the optical properties in the visible spectrum using either kernel ridge regression (KRR) or artificial neural networks (ANN) for a BC fractal aggregate of desired physicochemical properties, such as size, morphology, and organic coating. The dataset of physicochemical and optical properties of BC fractal aggregates is provided here. The developed ML algorithm for predicting the optical properties of BC fractal aggregates (https://github.com/jaikrishnap/Machine-learning-for-prediction-of-BCFAs) is highly useful for real-world applications due to its wide parameter range, high accuracy, and low computational cost.
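As a hedged illustration of the kernel ridge regression route mentioned above, the sketch below fits a KRR model mapping a few hypothetical physicochemical descriptors to a scalar optical property; the data are synthetic stand-ins, not the published database.

```python
# Illustrative KRR fit: (hypothetical) aggregate descriptors -> one
# optical property. Synthetic stand-in data, not the published database.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(size=(400, 3))  # e.g. size, fractal dimension, coating fraction
y = 0.5 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.2 * X[:, 2]  # stand-in property

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
krr = KernelRidge(kernel="rbf", alpha=1e-3, gamma=2.0).fit(X_tr, y_tr)
print("test R^2:", round(krr.score(X_te, y_te), 3))
```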
Contents
database_optical_properties_black_carbon_fractal_aggregtates.csv, data file, comma-separated values
database_header.txt, metadata, text
Citation for the database:
Romshoo, B., Müller, T., Patil, J., Michels, T., Kloft, M., and Pöhlker, M.: Database of physicochemical and optical properties of black carbon fractal aggregates, Dataset, https://doi.org/10.5281/zenodo.7523058, 2023.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Ulcerative colitis (UC) is a chronic, complicated, inflammatory disease with an increasing incidence and prevalence worldwide. However, the intrinsic molecular mechanisms underlying the pathogenesis of UC have not yet been fully elucidated.
Methods: All UC datasets published in the GEO database were analyzed and summarized. Subsequently, the robust rank aggregation (RRA) method was used to identify differentially expressed genes (DEGs) between UC patients and controls. Gene functional annotation and protein-protein interaction (PPI) network analysis were performed to illustrate the potential functions of the DEGs. Important functional modules from the PPI network were identified by molecular complex detection (MCODE), and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed on them. The results of CytoHubba, a plugin implementing integrated algorithms for biomolecular interaction networks, combined with the RRA analysis, were used to identify the hub genes. Finally, a mouse model of UC was established with dextran sulfate sodium salt (DSS) solution to verify the expression of the hub genes.
Results: A total of 6 datasets met the inclusion criteria (GSE38713, GSE59071, GSE73661, GSE75214, GSE87466, GSE92415). The RRA integrated analysis revealed 208 significant DEGs (132 upregulated and 76 downregulated). After constructing the PPI network with the MCODE plugin, the modules with the top three scores were listed. The CytoHubba app and RRA identified six hub genes: LCN2, CXCL1, MMP3, IDO1, MMP1, and S100A8. Enrichment analysis showed that these functional modules and hub genes were mainly related to cytokine secretion, immune response, and cancer progression. In the mouse model, the expression of all six hub genes was higher in the UC group than in the control group (P < 0.05).
Conclusion: The hub genes identified by the RRA method are highly reliable. These findings improve the understanding of the molecular mechanisms of UC pathogenesis.
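For intuition, here is a minimal sketch of the rho score at the core of robust rank aggregation (following Kolde et al., 2012): the minimum, over positions k, of the probability that the k-th smallest normalized rank would be that small under a purely random ordering. The gene ranks below are toy stand-ins, not values from these datasets.

```python
# Minimal rho-score sketch for robust rank aggregation (Kolde et al., 2012):
# rho = min over k of P(Beta(k, n-k+1) <= r_(k)), where r_(k) is the k-th
# smallest normalized rank of a gene across n datasets. Toy ranks below.
import numpy as np
from scipy.stats import beta

def rra_rho(normalized_ranks):
    r = np.sort(np.asarray(normalized_ranks))  # order the normalized ranks
    n = len(r)
    k = np.arange(1, n + 1)
    # CDF of the k-th order statistic of n uniforms at each observed rank
    return float(np.min(beta.cdf(r, k, n - k + 1)))

print(rra_rho([0.01, 0.03, 0.40]))  # small rho: consistently top-ranked gene
print(rra_rho([0.50, 0.60, 0.70]))  # larger rho: unremarkable gene
```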
The Species of Greatest Conservation Need National Database is an aggregation of lists from State Wildlife Action Plans. Species of Greatest Conservation Need (SGCN) are wildlife species that need conservation attention, as listed in action plans. In this database, we have validated scientific names from the original documents against taxonomic authorities to increase consistency among names, enabling aggregation and summary. This database does not replace the information contained in the original State Wildlife Action Plans. The database includes SGCN lists from 56 states, territories, and districts, encompassing action plans spanning from 2005 to 2022. State Wildlife Action Plans are updated at least once every 10 years by the respective wildlife agencies. The SGCN list data from these action plans have been compiled in partnership with individual wildlife management agencies, the United States Fish and Wildlife Service, and the Association of Fish and Wildlife Agencies. The SGCN National Database consists of three data tables: "source_data", "process_data", and "validated_data". Most users will likely find the "sgcn_species_all_records" table, which combines all three tables, most useful for comparing "source_" names with "validated_" names and for aggregating and summarizing using validated names. The "source_data" table provides an archive of all SGCN records listed by conservation authorities over multiple action plans, including the scientific names, common names, locations, and year of the action plan. The "process_data" table incorporates processing information, including the archiving and processing dates along with persistent identifiers used for record documentation, while the "validated_data" table provides the taxonomic identities from the matched taxonomic source, including the standardized scientific name, common name, and taxonomic ranks, as well as links to supplementary taxonomic information.
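As a hedged sketch of how the three tables could be combined into something like the "sgcn_species_all_records" view, the snippet below joins toy versions of them on a shared record identifier; the column names are illustrative, not the database's actual schema.

```python
# Toy versions of the three SGCN tables joined on a shared record id;
# column names are illustrative, not the database's actual schema.
import pandas as pd

source = pd.DataFrame({"record_id": [1, 2],
                       "source_scientific_name": ["Rana sp.", "Lontra canadensis"]})
process = pd.DataFrame({"record_id": [1, 2],
                        "processed_date": ["2023-01-05", "2023-01-05"]})
validated = pd.DataFrame({"record_id": [1, 2],
                          "validated_scientific_name": ["Rana", "Lontra canadensis"]})

all_records = (source.merge(process, on="record_id")
                     .merge(validated, on="record_id"))
print(all_records)  # compare source_ names against validated_ names
```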