Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Qualitative food frequency questionnaires (Q-FFQ) omit portion size information from dietary assessment. This restricts researchers to consumption frequency data, limiting investigations of dietary composition (i.e., energy-adjusted intakes) and misreporting. To support such researchers, we provide an instructive example of Q-FFQ energy intake estimation that derives typical portion size information from a reference survey population and evaluates misreporting. A sample of 1,919 Childhood Determinants of Adult Health Study (CDAH) participants aged 26–36 years completed a 127-item Q-FFQ. We assumed sex-specific portion sizes for Q-FFQ items using 24-h dietary recall data from the 2011–2012 Australian National Nutrition and Physical Activity Survey (NNPAS) and compiled energy density values primarily using the Australian Food Composition Database. Total energy intake estimation was daily equivalent frequency × portion size (g) × energy density (kJ/g) for each Q-FFQ item, summed. We benchmarked energy intake estimates against a weighted sample of age-matched NNPAS respondents (n = 1,383). Median (interquartile range) energy intake was 9,400 (7,580–11,969) kJ/day in CDAH and 9,055 (6,916–11,825) kJ/day in weighted NNPAS. Median energy intake to basal metabolic rate ratios were 1.43 (1.15–1.78) in CDAH and 1.35 (1.03–1.74) in weighted NNPAS, indicating notable underreporting in both samples, with increased levels of underreporting among the overweight and obese. Using the Goldberg and predicted total energy expenditure methods for classifying misreporting, 65 and 41% of CDAH participants had acceptable/plausible energy intake estimates, respectively. Excluding suspected CDAH misreporters improved the plausibility of energy intake estimates, concordant with expected body weight associations. This process can assist researchers wanting an estimate of energy intake from a Q-FFQ and to evaluate misreporting, broadening the scope of diet–disease investigations that depend on consumption frequency data.
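For readers who want a concrete sense of the stated calculation, the sketch below (with made-up item values, not CDAH data) shows how total daily energy intake is assembled per Q-FFQ item from frequency, portion size and energy density, and then summed.

```python
# Illustrative only: per-item energy intake = daily equivalent frequency
# x portion size (g) x energy density (kJ/g), summed over all Q-FFQ items.
items = [
    # hypothetical items with made-up frequencies, portions and energy densities
    {"name": "bread", "freq_per_day": 2.0, "portion_g": 40.0,  "kj_per_g": 10.4},
    {"name": "milk",  "freq_per_day": 1.5, "portion_g": 250.0, "kj_per_g": 2.7},
    {"name": "apple", "freq_per_day": 0.5, "portion_g": 150.0, "kj_per_g": 2.2},
]

total_kj = sum(i["freq_per_day"] * i["portion_g"] * i["kj_per_g"] for i in items)
print(f"Estimated energy intake: {total_kj:.0f} kJ/day")
```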
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Deposited are three data sets containing compounds with multi- or single-target activity, which were assembled from the PubChem BioAssay database for promiscuity predictions [1]. The design and composition of these data sets are described in the original publication [1] and a forthcoming data note detailing the deposition. A brief summary of the data structure is provided in the readme.txt file accompanying the data sets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains two core asset types: Data Files and Model Files.

1. Data Files
The dataset is provided in two separate .xlsx files:
Raw-nir-spectra-data: This file contains the raw near-infrared spectral dataset. It records the spectral information for all 347 tobacco samples and includes metadata such as each sample's unique ID, cultivation year, and country of origin.
13-Chemical-Components-data: This file contains the reference dataset for the chemical constituents. It includes the quantitative analysis results for the 13 key chemical components for all 347 samples, corresponding one-to-one with the spectral data.

2. Model Files
The database provides 99 pre-trained prediction and classification models in .joblib format. All models were built in a Python 3.9 environment and can be loaded and called directly. To facilitate easy identification and use, the model files adhere to the following naming conventions:

A. Quantitative Models (Chemical Prediction)
This naming format is used for the quantitative prediction models of the 13 chemical constituents.
Format: [Chemical_Component]_[Preprocessing_Method]_[Modeling_Method].joblib
Example: TotalSugars_MSC_PLS.joblib represents a PLS model for predicting Total Sugars using MSC preprocessing.

B. Classification Models (Origin Prediction)
This naming format is used for classification models built with different types of input data.
Format (based on spectral data): [Preprocessing_Method]_[Modeling_Method].joblib
Example: SecondDerivative_RF.joblib represents a Random Forest (RF) classification model built using second-derivative spectral data.
Special Note: The file Thirteen_chemical_components-RF.joblib is a special classification model. It does not use spectral data; instead, it is built using the quantitative results of the 13 chemical components directly as its input features.
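As a hedged illustration (not part of the deposited files), the sketch below shows how one of the pre-trained models named above could be loaded in Python with joblib; the spectral column layout, and the fact that new spectra must first receive the matching preprocessing (MSC in this example), are assumptions to verify against the data.

```python
# Hedged sketch: load a quantitative model and predict Total Sugars from spectra.
# File names follow the naming convention above; column handling is assumed.
import joblib
import numpy as np
import pandas as pd

model = joblib.load("TotalSugars_MSC_PLS.joblib")          # PLS model, MSC preprocessing

spectra = pd.read_excel("Raw-nir-spectra-data.xlsx", index_col=0)
X = spectra.select_dtypes(include=[np.number]).to_numpy()  # keep only numeric spectral columns

# NOTE: apply the same MSC preprocessing used in training before predicting;
# that step is omitted here and depends on the original pipeline.
predictions = model.predict(X)
print(predictions[:5])
```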
This database has been created to collect and distribute both historic and newly developed datasets for utilization in other work packages of the LIFECO programme: LInking hydrographic Frontal activity to ECOsystem dynamics in the North Sea and Skagerrak - importance to fish stock recruitment.
This work package will: 1) build and maintain a database site with consistent and validated data; 2) define a data exchange system; 3) develop a data request system; and 4) make a connection between the database, the GIS tools (WP8) and other work packages.
The data collected within the project cover many different spatial and temporal locations and periods, including data from historical databases. The data comprise hydrological data and water column characteristics such as salinity, temperature and nutrient concentrations, as well as plankton and fish distribution and diet composition data from field programs and 3-D modelling exercises. These data will be processed using mathematical models in order to produce spatial time slices of oceanographic regimes.
All data will be validated prior to inclusion in the database.
https://creativecommons.org/publicdomain/zero/1.0/
📘 Description
This dataset is derived from the USDA FoodData Central – Foundation Foods (April 2025 Release). It provides detailed nutrient composition data for a wide range of commonly consumed foods, focusing on accurate, research-based nutrient profiles.
The Foundation Foods database is developed and maintained by the U.S. Department of Agriculture (USDA). It contains carefully curated information about food items, including values for macronutrients (protein, fat, carbohydrates) and micronutrients (vitamins and minerals). Each food entry is scientifically analyzed to support nutritional research, dietary assessment, and food composition studies.
📊 Data Overview
The dataset includes several linked tables (CSV files) such as:
food.csv – basic food descriptions and identifiers
nutrient.csv – list of nutrients with names, units, and IDs
food_nutrient.csv – nutrient values for each food item
food_category.csv – food group and classification
These can be joined using common keys like fdc_id or nutrient_id.
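As an illustrative (unofficial) example, the tables can be combined with pandas roughly as follows; the exact column names are assumed from the standard FoodData Central CSV export and should be checked against the April 2025 release.

```python
# Hedged sketch: join the Foundation Foods tables on their shared keys.
import pandas as pd

food = pd.read_csv("food.csv")                    # fdc_id, description, ...
nutrient = pd.read_csv("nutrient.csv")            # id, name, unit_name, ...
food_nutrient = pd.read_csv("food_nutrient.csv")  # fdc_id, nutrient_id, amount, ...

merged = (
    food_nutrient
    .merge(nutrient, left_on="nutrient_id", right_on="id", suffixes=("", "_nutrient"))
    .merge(food, on="fdc_id")
)

# Example: average amount of protein per food (the nutrient name is an assumption).
protein = merged[merged["name"] == "Protein"]
print(protein.groupby("description")["amount"].mean().head())
```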
🔍 Example Use Cases
Exploratory Data Analysis (EDA) of nutrient distributions
Building recommendation systems for healthy diets
Predicting calorie or nutrient content from food descriptions
Visualizing macronutrient ratios across food categories
Training ML models for diet planning and nutrition tracking
📅 Version
Release Date: April 2025
Source: USDA FoodData Central – Foundation Foods
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository describes a database used to predict whole-body bone mineral content of pigs from the bone mineral content of the head, collected at slaughter. A description of the method and the resulting equation is provided in the Word document, and the database used is provided in the Excel file.
During hydrocarbon production, water is typically co-produced from the geologic formations producing oil and gas. Understanding the composition of these produced waters is important to help investigate the regional hydrogeology, the source of the water, the efficacy of water treatment and disposal plans, potential economic benefits of mineral commodities in the fluids, and the safety of potential sources of drinking or agricultural water. In addition to waters co-produced with hydrocarbons, geothermal development or exploration brings deep formation waters to the surface for possible sampling. This U.S. Geological Survey (USGS) Produced Waters Geochemical Database, which contains geochemical and other information for 114,943 produced water and other deep formation water samples of the United States, is a provisional, updated version of the 2002 USGS Produced Waters Database (Breit and others, 2002). In addition to the major element data presented in the original, the new database contains trace elements, isotopes, and time-series data, as well as nearly 100,000 additional samples that provide greater spatial coverage from both conventional and unconventional reservoir types, including geothermal. The database is a compilation of 40 individual databases, publications, or reports. The database was created in a manner to facilitate addition of new data and correct any compilation errors, and is expected to be updated over time with new data as provided and needed. Table 1, USGSPWDBv2.3 Data Sources.csv, shows the abbreviated ID of each input database (IDDB), the number of samples from each, and its reference. Table 2, USGSPWDBv2.3 Data Dictionary.csv, defines the 190 variables contained in the database and their descriptions. The database variables are organized first with identification and location information, followed by well descriptions, dates, rock properties, physical properties of the water, and then chemistry. The chemistry is organized alphabetically by elemental symbol. Each element is followed by any associated compounds (e.g. H2S is found after S). After Zr, molecules containing carbon, organic compounds and dissolved gases follow. Isotopic data are found at the end of the dataset, just before the culling parameters.
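A minimal sketch of getting started with the two descriptive tables named above; the file names come from the description, everything else is illustrative.

```python
# Hedged sketch: inspect the data sources and data dictionary tables.
import pandas as pd

sources = pd.read_csv("USGSPWDBv2.3 Data Sources.csv")       # IDDB, sample counts, references
data_dict = pd.read_csv("USGSPWDBv2.3 Data Dictionary.csv")  # 190 variables and descriptions

print(sources.head())
print(len(data_dict), "variables described in the data dictionary")
```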
The Daily Food & Nutrition Dataset provides a detailed record of everyday food consumption paired with essential nutritional values. It is designed to support data analysis, health monitoring, and machine-learning applications related to diet, wellness, and personalized nutrition.
This dataset captures a variety of food items along with their macronutrient and micronutrient composition, enabling users to explore dietary patterns, build predictive health models, and perform nutritional optimization. It is suitable for projects involving calorie tracking, nutrient recommendation systems, diet classification, or exploratory data analysis within the field of nutrition science.
Food Item & Category: Identifies each food entry and its general classification (e.g., fruit, vegetable, grain, beverage, snack, etc.).
Nutritional Components: Includes major nutrients that influence health and energy intake.
Meal Context: The Meal_Type column specifies whether the food was consumed during breakfast, lunch, dinner, or as a snack, which is useful for temporal or behavioral pattern analysis.
Hydration Tracking: Water_Intake (ml) allows hydration monitoring alongside nutritional consumption, enabling more holistic dietary assessments.
This dataset aims to serve health researchers, data scientists, nutritionists, and enthusiasts who want to analyze or model dietary behavior in a structured, meaningful way.
This dataset is synthetic and should not be treated as real dietary records. It has been generated to simulate real-world dietary data and reflects diverse food intake patterns through a randomized data generation process. It includes food categories, meal types, and nutritional values based on general nutritional guidelines and publicly available food databases.
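As a hedged starting point for exploring the synthetic records, the sketch below groups entries by meal context; apart from Meal_Type and Water_Intake (ml), which are named above, the file name and calorie column are hypothetical placeholders.

```python
# Hedged EDA sketch; verify column names against the actual file.
import pandas as pd

df = pd.read_csv("daily_food_nutrition.csv")  # hypothetical file name

# Average hydration and (assumed) calorie load per meal context
summary = df.groupby("Meal_Type")[["Water_Intake (ml)", "Calories (kcal)"]].mean()
print(summary)
```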
https://www.technavio.com/content/privacy-notice
Software Composition Analysis Market Size 2024-2028
The software composition analysis market size is forecast to increase by USD 871.7 million at a CAGR of 24.07% between 2023 and 2028.
The market is experiencing significant growth due to several key factors. Firstly, the increasing adoption of open source software (OSS) in enterprise applications has led to a greater need for SCA solutions to identify and manage the associated risks. Secondly, improved security and compliance standards, such as the European Union's General Data Protection Regulation (GDPR) and the Secure Configuration in Open Source Software (SC-OSS) project, have heightened the importance of SCA in ensuring the security and integrity of software components. Lastly, data security and cybersecurity concerns continue to be a major driver for SCA adoption, as organizations seek to mitigate risks associated with vulnerabilities in third-party libraries and dependencies.
What will be the Size of the Market During the Forecast Period?
The market is witnessing significant growth due to the increasing adoption of open-source software, IoT, and cloud-based services. SCA solutions help organizations identify and manage vulnerabilities in their software components, including those from the National Vulnerability Database, Universal Payments Interface, and others. SCA tools analyze source code, manifest files, binary files, container images, and Bill of Materials (BOMs) to identify known vulnerabilities in third-party libraries and dependencies. CSPs, such as Prisma Cloud, Flexera, WhiteSource, Diffend, and others, offer SCA solutions to help organizations secure their software supply chain. President Biden's recent executive order on improving the nation's cybersecurity underscores the importance of securing software supply chains.
SCA solutions can help organizations comply with this order by providing real-time visibility into their software components and vulnerabilities. SCA tools are essential for DevOps and DevSecOps teams, as they enable continuous integration and delivery while ensuring security. In the cloud-based software era, SCA solutions have become indispensable for securing software compositions in cloud environments. SCA solutions can be integrated with package managers and manifest files to provide real-time vulnerability scanning and remediation.
How is this market segmented and which is the largest segment?
The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Component
Solution
Services
Geography
North America
US
Europe
Germany
UK
APAC
China
Japan
Middle East and Africa
South America
By Component Insights
The solution segment is estimated to witness significant growth during the forecast period.
Software Composition Analysis (SCA) is a critical aspect of modern software development, particularly in the context of Open-source software, IoT, and Cloud-based services. SCA solutions help identify and manage risks associated with the use of third-party components, such as those found in the National Vulnerability Database, Universal Payments Interface, and Reserve Bank. SCA tools like Black Duck KnowledgeBase, Prisma Cloud, Flexera, WhiteSource, Diffend, and others, enable CSPs to ensure licensing compliance, improve code quality, and secure their DevOps and DevSecOps pipelines. These tools analyze manifest files, source code, binary files, and container images to identify vulnerabilities and generate alerts and reports.
The solution segment was valued at USD 185.80 million in 2018 and showed a gradual increase during the forecast period.
Regional Analysis
North America is estimated to contribute 35% to the growth of the global market during the forecast period.
Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.
Software Composition Analysis (SCA) is a critical security practice that identifies and addresses vulnerabilities in open-source components used in applications. With the increasing adoption of IoT, cloud-based services, and Universal Payments Interface, the usage of open-source software has grown significantly. The National Vulnerability Database and financial institutions like the Reserve Bank have emphasized the importance of SCA for licensing compliance and code quality. Black Duck KnowledgeBase, Prisma Cloud, Flexera, WhiteSource, Diffend, and other leading providers offer SCA solutions. These tools help scan and analyze manifest files.
https://www.bco-dmo.org/dataset/526852/license
The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order are reported for all records. Other auxiliary metadata, such as physical, environmental and biometric information relating to the gelatinous zooplankton metadata, are included with each respective entry. JeDI has been developed and designed as an open access research tool for the scientific community to quantitatively define the global baseline of gelatinous zooplankton populations and to describe long-term and large-scale trends in gelatinous zooplankton populations and blooms. It has also been constructed as a future repository of datasets, thus allowing retrospective analyses of the baseline and trends in global gelatinous zooplankton populations to be conducted in the future.

Acquisition description: This information has been synthesized by members of the Global Jellyfish Group from online databases, unpublished and published datasets. More specific details may be found in the methods section of Lucas, C.J., et al. 2014. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecol. Biogeogr. (DOI: 10.1111/geb.12169; http://dmoserv3.bco-dmo.org/data_docs/JeDI/Lucas_et_al_2014_GEB.pdf).

Dataset DOI: 10.1575/1912/7191. Version: 2015.01.08 (duplicate records were removed on 2015.01.08; the displayed view of this dataset is subject to updates). Access formats: .htmlTable, .csv, .json, .mat, .nc, .tsv, .esriCsv, .geoJson. Coverage: latitude -78.5 to 88.74, longitude -180 to 180, depth -10191.48 to 7632 m. Funding: NSF Division of Ocean Sciences (NSF OCE) award OCE-1030149, program manager David L. Garrison. Metadata source: https://www.bco-dmo.org/api/dataset/526852.

People: Robert Condon (University of North Carolina - Wilmington), Principal Investigator; Carlos M. Duarte (University of Western Australia), Cathy Lucas (National Oceanography Centre), and Kylie Pitt (Griffith University), Co-Principal Investigators; Danie Kinkade (Woods Hole Oceanographic Institution, BCO-DMO), Data Manager.

Project: Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea (Trophic BATS), Sargasso Sea, BATS site, October 2010 to September 2014. Project description: Fluxes of particulate carbon from the surface ocean are greatly influenced by the size, taxonomic composition and trophic interactions of the resident planktonic community. Large and/or heavily-ballasted phytoplankton such as diatoms and coccolithophores are key contributors to carbon export due to their high sinking rates and direct routes of export through large zooplankton. The potential contributions of small, unballasted phytoplankton, through aggregation and/or trophic re-packaging, have been recognized more recently. This recognition comes as direct observations in the field show unexpected trends. In the Sargasso Sea, for example, shallow carbon export has increased in the last decade but the corresponding shift in phytoplankton community composition during this time has not been towards larger cells like diatoms. Instead, the abundance of the picoplanktonic cyanobacterium Synechococcus has increased significantly. The trophic pathways that link the increased abundance of Synechococcus to carbon export have not been characterized. These observations helped to frame the overarching research question, "How do plankton size, community composition and trophic interactions modify carbon export from the euphotic zone?" Since small phytoplankton are responsible for the majority of primary production in oligotrophic subtropical gyres, the trophic interactions that include them must be characterized in order to achieve a mechanistic understanding of the function of the biological pump in the oligotrophic regions of the ocean.

This requires a complete characterization of the major organisms and their rates of production and consumption. Accordingly, the research objectives are: 1) to characterize (qualitatively and quantitatively) trophic interactions between major plankton groups in the euphotic zone and rates of, and contributors to, carbon export and 2) to develop a constrained food web model, based on these data, that will allow us to better understand current and predict near-future patterns in export production in the Sargasso Sea. The investigators will use a combination of field-based process studies and food web modeling to quantify rates of carbon exchange between key components of the ecosystem at the Bermuda Atlantic Time-series Study (BATS) site. Measurements will include a novel DNA-based approach to characterizing and quantifying planktonic contributors to carbon export. The well-documented seasonal variability at BATS and the occurrence of mesoscale eddies will be used as a natural laboratory in which to study ecosystems of different structure. This study is unique in that it aims to characterize multiple food web interactions and carbon export simultaneously and over similar time and space scales. A key strength of the proposed research is also the tight connection and feedback between the data collection and modeling components. Characterizing the complex interactions between the biological community and export production is critical for predicting changes in phytoplankton species dominance, trophic relationships and export production that might occur under scenarios of climate-related changes in ocean circulation and mixing. The results from this research may also contribute to understanding of the biological mechanisms that drive current regional to basin scale variability in carbon export in oligotrophic gyres.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Quantitative data on product chemical composition is a necessary parameter for characterizing near-field exposure. This data set comprises reported and predicted information on >75,000 chemicals contained in >15,000 consumer products. The data’s primary intended use is for exposure, risk, and safety assessments. The data set includes specific products with quantitative or qualitative ingredient information, which has been publicly disclosed through material safety data sheets (MSDS) and ingredient lists. A single product category from a refined and harmonized set of categories has been assigned to each product. The data set also contains information on the functional role of chemicals in products, which can inform predictions of the concentrations in which they occur. These data will be useful to exposure and risk assessors evaluating chemical and product safety.
The data set presented here is in the form of a MySQL relational database, which mimics CPDat data available under the ‘Exposure’ tab of the CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard) as of August 2017.
Engineered nanomaterials (generally defined as being 1-100 nm in size) take on unique activities due to their small size, reactivity, and high surface area to mass ratio. While these properties can be highly desirable for the intended function of the materials and the products produced from them, they may also cause undesirable activity in environmental or biological systems. EPA program offices, faced with applications for novel engineered nanomaterials, need access to relevant data to help predict potential environmental/biological interactions of nanomaterials based on their physico-chemical properties and the intended uses of novel materials. We have developed a relational database containing the results from the Office of Research and Development (ORD) regarding the actions of engineered nanomaterials in environmental and biological systems. The database captures the chemical and physical parameters of the materials tested, the assays in which they were tested, and the measured results. The database is designed to enable selective searches of nanomaterials based on physical/chemical parameters such as composition, size, coatings, etc., the system in which they were tested, and the quantitative results obtained. For example, a user would be able to access information on materials of similar composition, size range, reactivity, or other measured properties, and see what results were observed. Users can then apply the observed results to modeling and prediction for similar nanomaterials using quantitative structure-activity relationships and other sophisticated modeling approaches.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It is predicted that as early as 2040, 35% of meat demand will be met by in vitro production. Recreating the course of myogenesis in vitro, and thus reproducing the structure of muscle tissue, is the basis for research focused on obtaining cultured meat, and it requires providing relevant factors that support the proliferation of satellite cells, the precursors of skeletal muscle. The present work aimed to develop the composition of the medium that would most effectively stimulate the proliferation of bovine satellite cells (BSCs). The modeling and optimization methods included measurements of the synergistic, co-stimulatory effect of three medium components: the amount of glucose, the type of serum (bovine or horse), and the amount of the mitogenic factor bFGF. Additionally, qPCR analyses determined the expression of genes involved in myogenesis, such as Pax7 and the Myogenic Regulatory Factors, depending on the level of the tested factor. The results showed significant positive effects of serum type (bovine serum) and mitogenic factor (addition of 10 ng/mL bFGF) on the proliferation rate. In turn, qPCR analysis showed no significant differences in the relative expression level of Pax7 genes and MRF factors for either factor. However, a statistically higher Pax7 and Myf5 gene expression level was revealed when a low-glucose medium was used (p < 0.05). In conclusion, medium components such as bovine serum and the addition of the mitogenic factor at 10 ng/mL ensure a higher proliferation rate of BSCs, while lower glucose content ensured the expression of genes crucial for the self-renewal of the satellite cell population.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction:
This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.
Motivation:
We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju's sensitivity and classification accuracy depend on the composition of the reference database, and the highest sensitivity is achieved when the reference database contains a comprehensive representation of expected taxa from an environment/sample of interest. However, the speed of the algorithm decreases as database size increases. Therefore, we aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes, while minimizing (but not excluding) the sequences from clinical, industrial, and terrestrial host-associated samples.
Results/Description:
MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes, optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative “specI” species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. In order to annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, the Kaiju-formatted Burrows-Wheeler index, the translated genes, the custom taxonomy hierarchy, an interactive Krona plot of the taxonomic composition, and scripts and instructions for how to use or rebuild the resource are available from doi:10.5281/zenodo.3520509.
Methods:
The curation and quality control of MARMICRODB single cell, metagenome assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using checkM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome assembled genomes, especially those from Tara Oceans Mediterranean Sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a checkM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed taxonomic assignment using blast matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins, we created a custom set of 730 single copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with checkM. For Synechococcus, we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded any genome bins from downstream analysis that had an estimated quality < 30 (defined as %completeness - 5 x %contamination), resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22] and excluded protein sequences with lengths less than 20 or greater than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23]. The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
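The quality cut-off and protein filters described above can be expressed compactly; the sketch below uses the thresholds from the text, but the function names and data handling are illustrative rather than the authors' actual code.

```python
# Minimal sketch of the genome-quality and protein filters described above.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def genome_quality(completeness_pct, contamination_pct):
    """Estimated quality = %completeness - 5 x %contamination."""
    return completeness_pct - 5.0 * contamination_pct

def keep_genome(completeness_pct, contamination_pct, threshold=30.0):
    """Genome bins with estimated quality below the threshold are excluded."""
    return genome_quality(completeness_pct, contamination_pct) >= threshold

def clean_protein(seq):
    """Drop proteins shorter than 20 or longer than 20000 aa and strip
    non-standard residues from the remainder."""
    if not 20 <= len(seq) <= 20000:
        return None
    return "".join(aa for aa in seq.upper() if aa in STANDARD_AA)

print(keep_genome(92.0, 3.0))   # True  (estimated quality 77)
print(keep_genome(60.0, 7.0))   # False (estimated quality 25)
print(clean_protein("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```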
The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high-quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes for the fixed reference phylogeny are estimated > 90% complete, and SAR11 genomes are estimated > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the hmmer software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimal [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to only allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce the phylogenetic redundancy in the tree [27]. We then "placed" genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree. We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
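The column-masking rules described above (at least 50% taxon occupancy, and at least one residue exceeding 25% frequency) can be illustrated with a small sketch; how the frequency is computed relative to gaps is an assumption here, not taken from the original pipeline.

```python
# Illustrative column filter for a list of equal-length aligned sequences.
from collections import Counter

def filter_columns(alignment, min_occupancy=0.5, min_consensus=0.25):
    n_taxa = len(alignment)
    keep = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        residues = [aa for aa in column if aa not in "-."]  # ignore gap characters
        if len(residues) / n_taxa < min_occupancy:
            continue  # fewer than 50% of taxa represented
        top_frac = Counter(residues).most_common(1)[0][1] / n_taxa
        if top_frac <= min_consensus:
            continue  # no residue exceeds 25% frequency
        keep.append(j)
    return ["".join(seq[j] for j in keep) for seq in alignment]

# The mostly-gap third column is removed in this toy alignment.
aln = ["MK-LV", "MKALV", "M--LV", "MK-LI"]
print(filter_columns(aln))
```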
Software/databases used:
checkM v1.0.11[16]
HMMERv3.1b2 (http://hmmer.org/)
prodigal v2.6.3 [22]
trimAl v1.4.rev22 [24]
AliView v1.18.1 [33] [34]
Phyx v0.1 [35]
RAxML v8.2.12 [36]
Pplacer v1.1alpha [28]
GTDB-Tk v0.1.3 [19]
Kaiju v1.6.0 [34]
GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
NCBI Taxonomy (accessed 2018-07-02) [23]
TIGRFAM v14.0 [37]
PFAM v31.0 [38]
Discussion/Caveats:
MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases, MARMICRODB can probably be used unmodified, but if the user's goal is to focus on a particular organism/clade that we did not curate in the database, then the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently, the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but if users wish to modify MARMICRODB by adding or removing genomes, or reconfiguring taxonomic ranks, the names.dmp and nodes.dmp files can easily be modified, as can the fasta file of protein sequences. However, the Kaiju index will need to be rebuilt, and the user will require a high
The Coupling, Energetics, and Dynamics of Atmospheric Regions (CEDAR) at the National Center for Atmospheric Research/High Altitude Observatory (NCAR/HAO) incoherent-radar data base was established in 1985 by the institutions that operate the incoherent scatter radars in Jicamarca/Peru (12S,284E,DIP=1N), Arecibo/Puerto Rico (18N,293E,DIP=51N), St. Santin/France (45N,2E, DIP=61N), Millstone Hill/Mass. USA (43N,288E,DIP=73N), Chatanika/Alaska USA (65N,213E), and Eiscat/Scandinavia (70N,19E). The Chatanika radar was moved to Sondrestrom/Greenland (67N,309E) in 1982. A radar operating from Shigaraki, Japan began operating in 1986.
Incoherent scatter radars transmit very high power pulses at frequencies above 50 MHz. The scattering occurs at small-scale plasma fluctuations. The back-scattered power is proportional to the electron density in the scattering volume (in many cases an on-site ionosonde is used for calibration). The shape and Doppler broadening of the received spectrum allow determination of electron and ion temperature and ion composition, and the shift relative to the transmitter frequency indicates the line-of-sight ion drift. Multi-receiver facilities like St. Santin and Eiscat allow measurements of all velocity vector components. In addition, simple aeronomic theory together with a geomagnetic field model is often used to derive neutral wind, neutral temperature, atomic oxygen density, and electric field.
Below about 100 km and above about 800 km the ionospheric electron densities become so low that the signal-to-noise ratio is no longer acceptable for reliable data reduction.
Measurements are usually conducted during 2 to 3 days each month. The temporal and spatial resolution depends on the mode used: long integration times provide high sensitivity but low time resolution; large backscatter volumes provide good signal-to-noise ratio but poor altitude resolution. Typically the time resolution ranges from 1 to 30 minutes and the altitude resolution from a few to 100 km.
The Incoherent Scatter Radar (ISR) data base at CEDAR consists of the following:
Jicamarca ISR data from the Jicamarca Radio Observatory in Peru which has operated since 1963. The contact person is Wesley Swartz. Faraday rotation data is available from David Hysell. The Jicamarca Radio Observatory is operated by the Geophysical Institute of Peru, Ministry of Education, with support from the National Science Foundation through Cornell University.
Arecibo ISR data from the National Astronomy and Ionosphere Center in Arecibo, Puerto Rico. The radar has been in operation since 1963. The contact person is Qihou Zhou. Arecibo Observatory is operated by Cornell University under the National Science Foundation.
ISR data from the Middle and Upper atmosphere (MU) radar at Shigaraki, Japan. The radar has been in operation since 1986. The contact person is Shoichiro Fukao. The MU radar belongs to the Radio Atmospheric Science Center of Kyoto University.
ISR data from the fixed zenith antenna and the steerable antenna at Millstone Hill, Haystack Observatory. The radar has been in operation since 1960. The contact person is John Holt. The Millstone Hill ISR is supported by the National Science Foundation.
ISR data from the quadristatic system in France, operated between 1963 and 1987. The contact person is Christine Amory-Mazaudier. The ISR is supported by the Institut d'Astronomie et de Geophysique and by the Direction des Recherches et Moyens d'Essais.
ISR data from the Chatanika radar in Alaska, which was operated by SRI International between 1971 and 1982, when the radar was moved to Sondrestrom, Greenland. The contact person is John Kelly. The radar is supported by the NSF.
ISR data from the tristatic European Incoherent Scatter (EISCAT) system in Scandinavia, which has been in operation since 1981. Sites are located in Tromso, Norway; Kiruna, Sweden; and Sodankyla, Finland. The contact person is Peter Collis. EISCAT is supported by organizations in Finland, France, Germany, Japan, Norway, Sweden, and the UK.
ISR data from the Sondrestrom radar at Sondre Stromfjord, Greenland, which has been in operation since 1983. The radar was moved from Chatanika, Alaska by SRI International. The contact person is John Kelly. The radar is supported by the NSF.
The CEDAR Data Base is accessible through the WWW and ftp, but users must have a valid access form, available from the WWW or ftp (see Access and Use constraints), or contact Barbara Emery (emery@hao.ucar.edu). See the WWW site for additional information on accessing the data and Rules of the Road procedures.
http://cedarweb.hao.ucar.edu/wiki/index.php/Data_Services:Rules_of_the_Road
https://creativecommons.org/publicdomain/zero/1.0/
Predicting the final properties of new (composite) materials. A composite material is a multicomponent material made from two or more components with significantly different physical and/or chemical properties that, when combined, result in a new material with characteristics that differ from those of the individual components and are not a simple superposition of them. It is customary to distinguish a matrix and fillers in the composition of a composite, the latter performing the function of reinforcement (by analogy with reinforcement in a composite building material such as reinforced concrete). The fillers of composites are usually carbon or glass fibers, and the role of the matrix is played by the polymer. The combination of different components improves the characteristics of the material and makes it both light and durable. At the same time, the individual components remain as such in the structure of the composite, which distinguishes composites from mixtures and hardened solutions. By varying the composition of the matrix and filler, their ratio, and the filler orientation, a wide range of materials with the required set of properties can be obtained. Many composites are superior to traditional materials and alloys in their mechanical properties while being lighter. The use of composites usually makes it possible to reduce the weight of a structure while maintaining or improving its mechanical characteristics. Modern composites are made from different materials: polymers, ceramics, glass and carbon fibers, but the basic principle remains the same.

This approach also has a drawback: even if we know the characteristics of the original components, determining the characteristics of the composite made from them is quite problematic. There are two ways to solve this problem: physical testing of material samples, or predicting the characteristics. The essence of forecasting is to simulate a representative volume element of the composite, based on data about the characteristics of the constituent components (binder and reinforcing component). The relevance of this work therefore lies in the fact that the created predictive models will help reduce the number of tests performed, replenish the materials database with possible new material characteristics, and provide digital twins of new composites. In addition, an adequately functioning prediction model can significantly reduce the time, financial and other costs of testing. Therefore, it is necessary to develop models that predict tensile modulus and tensile strength, as well as a model that recommends the matrix-filler ratio.

Initial data on the properties of composite materials are presented in two data sets, X_bp and X_nup.
The X_bp data set contains:
Matrix-filler ratio
Density, kg/m3
Modulus of elasticity, GPa
Amount of hardener, m.%
Content of epoxy groups, %
Flash point, °C
Surface density, g/m2
Tensile modulus of elasticity, GPa
Tensile strength, MPa
Resin consumption, g/m2
The X_nup dataset contains:
Patch angle, degrees
Patch pitch
Patch density
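As a hedged sketch of the first modeling task (predicting tensile modulus and tensile strength from the remaining X_bp features), the example below uses a random forest as a placeholder learner; the file name and exact column headers are assumptions to check against the actual spreadsheets.

```python
# Illustrative sketch: multi-output regression on the X_bp features.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X_bp = pd.read_excel("X_bp.xlsx", index_col=0)  # hypothetical file name

targets = ["Tensile modulus of elasticity, GPa", "Tensile strength, MPa"]
features = [c for c in X_bp.columns if c not in targets]

X_train, X_test, y_train, y_test = train_test_split(
    X_bp[features], X_bp[targets], test_size=0.3, random_state=0
)

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```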
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
GRACEnet (Greenhouse gas Reduction through Agricultural Carbon Enhancement network) is a research program initiated in the early 2000s. Goals are to better quantify greenhouse gas (GHG) emissions from cropped and grazed soils under current management practices and to identify and further develop improved management practices that will enhance carbon (C) sequestration in soils, decrease GHG emissions, promote sustainability and provide a sound scientific basis for carbon credits and GHG trading programs. This program generates information that is needed by agro-ecosystem modelers, producers, program managers and policy makers. Coordinated multi-location field studies follow standardized protocols to compare net GHG emissions (carbon dioxide, nitrous oxide, methane), C sequestration, crop/forage yields, and broad environmental benefits under different management systems that:
Typify existing production practices
Maximize C sequestration
Minimize net GHG emissions
Meet sustainable production and broad environmental benefit goals (including C sequestration, net GHG emissions, water, air and soil quality, etc.)
Resources in this dataset:
Resource Title: GRACEnet Brochure 2016. File Name: GRACENET brochure REVISED June 2017.pdf
Resource Title: Data Entry Template 2017. File Name: DET_GRACEnet_REAP.zip. Resource Description: Includes Excel templates for experiment description worksheets, site characterization worksheets, management worksheets, and measurement worksheets where experimental unit data are reported, plus information that may be useful to the user, including drop-down lists of treatment-specific information and ranges of expected values. General and introductory instructions, as well as a data validation check, are also included.
Resource Title: GRACEnet Brochure 2017. File Name: GRACENET brochure REVISED July 2017 final.pdf
Resource Title: GRACEnet-NUOnet Data Dictionary. File Name: GRACEnet-NUOnet_DD.csv
Resource Title: GRACEnet Data Search. File Name: natres.zip. Resource Description: The attached file contains data from all sites as of February 9, 2022. For an interactive and up-to-date version of the data, visit https://usdaars.maps.arcgis.com/apps/MapSeries/index.html?appid=b66de747da394ed5aeab07dc9f50e516
https://spdx.org/licenses/CC0-1.0.html
Spiders are dominant predators in terrestrial ecosystems and feed on prey from the herbivore and detritivore subsystems (dual subsystem omnivory) as well as on other predators (intraguild predation). Little is known about how global change potentially affects the importance of different prey groups in predator diets. In this meta-analysis we identify the impact of climatic conditions, land-use types and functional traits of spider species on the relative importance of Hemiptera, Araneae and Collembola prey in spider diets. We use a dataset including 78 publications with 149 observational records of the diet composition of 96 spider species in agricultural and non-agricultural habitats in 24 countries worldwide. The importance of Hemiptera prey was not affected by climatic conditions and was particularly high in smaller spider species in agricultural habitats. Araneae prey was most important for actively hunting, larger spider species in non-agricultural habitats. Collembola prey was most important for small, actively hunting spider species in regions with higher temperature seasonality. Spider species for which Araneae prey was more important also showed a higher importance of Collembola prey and a lower importance of Hemiptera prey. Future increases of temperature seasonality predicted for several regions worldwide may go along with an increasing importance of Collembola prey, which was also related to a higher importance of intraguild prey here. Two global change drivers predicted for many regions of the world (increasing climatic seasonality and ongoing conversion of non-agricultural to agricultural land) both hold the potential to increase the importance of Collembola prey in spider diets. The importance of Hemiptera and Araneae prey may however show contrasting responses to these two drivers. These complex potential effects of global change components and their impact on functional traits in spider communities highlight the importance of simultaneously considering multiple drivers of global change to better understand future predator-prey interactions.
Methods
This study is based on a global database about the diet composition of hunting and web-building spider species in natural ecosystems used in Birkhofer and Wolters (2012), with the addition of data from agricultural ecosystems and updates from Diehl et al. (2013b), Michalko & Pekár (2015a), Arvidsson et al. (2020) and Mezőfi et al. (2020). All data in the original publications are derived by direct visual records of prey or prey remains in spider species in field studies, not including data from molecular or experimental studies. Note that only subsets of the data from the original database from non-agricultural habitats (82 cursorial and web-building spider species in natural habitats: Birkhofer & Wolters 2012) or only for web-building spiders (63 spider species in agricultural, natural and forest habitats: Birkhofer et al. 2018) were previously published. The database includes 118 unique publications that reported 310 datasets about diet compositions in individual spider species worldwide. All datasets that were based on fewer than 20 recorded prey items per spider species in individual studies or that addressed spider species in forest habitats were excluded from this study. The selection of a minimum of 20 records was based on the fact that spiders in each study could theoretically reach the maximum diet breadth, including prey from all 20 prey orders that were originally recorded across datasets (see Birkhofer & Wolters 2012).
The selection of non-forest habitats was based on the aim of comparing habitats that share a major structural characteristic by not being dominated by dense, natural tree cover. Forests further have a very different invertebrate community compared to grasslands and arable fields (e.g. Birkhofer et al. 2015), which would limit a comparison of diets between major habitat types. The remaining 78 publications provided 149 datasets on the diet composition of 96 spider species worldwide (Figure 1). This database was used to calculate the relative contribution of each prey order to the overall diet in each dataset as a percentage value. The percentages of Hemiptera, Collembola and Araneae prey were then extracted to reflect the relative contribution of these prey orders to the diet of individual spider species.
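The percentage calculation described above can be illustrated with a small pandas sketch; the table layout and counts are hypothetical, not the published database.

```python
# Illustrative computation of each prey order's relative contribution (%)
# to a spider species' diet within each dataset.
import pandas as pd

records = pd.DataFrame({
    "dataset_id": [1, 1, 1, 2, 2],
    "prey_order": ["Hemiptera", "Araneae", "Collembola", "Hemiptera", "Collembola"],
    "n_items":    [30, 10, 10, 18, 22],
})

totals = records.groupby("dataset_id")["n_items"].transform("sum")
records["percent_of_diet"] = 100 * records["n_items"] / totals
print(records)
```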
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Effects of diet composition on the estimated diet metabolizable coefficients of phosphorus (MC-P) and calcium (MC-Ca) in lactating dairy cows.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Machine learning (ML) methods can train a model to predict material properties by exploiting patterns in materials databases that arise from structure–property relationships. However, the importance of ML-based feature analysis and selection is often neglected when creating such models. Such analysis and selection are especially important when dealing with multifidelity data because such data afford a complex feature space. This work shows how a gradient-boosted statistical feature-selection workflow can be used to train predictive models that classify materials by their metallicity and predict their band gap against experimental measurements, as well as computational data that are derived from electronic-structure calculations. These models are fine-tuned via Bayesian optimization, using solely the features that are derived from chemical compositions of the materials data. We test these models against experimental, computational, and a combination of experimental and computational data. We find that the multifidelity modeling option can reduce the number of features required to train a model. The performance of our workflow is benchmarked against state-of-the-art algorithms, the results of which demonstrate that our approach is either comparable or superior to them. The classification model realized an accuracy score of 0.943, a macro-averaged F1-score of 0.940, an area under the receiver operating characteristic curve of 0.985, and an average precision of 0.977, while the regression model achieved a mean absolute error of 0.246, a root-mean-square error of 0.402, and an R2 of 0.937. This illustrates the efficacy of our modeling approach and highlights the importance of thorough feature analysis and judicious selection over a "black-box" approach to feature engineering in ML-based modeling.
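To make the described workflow concrete, the sketch below pairs gradient-boosted feature selection with a metallicity classifier on composition-derived features; the dataset, column names and thresholds are placeholders rather than the authors' actual pipeline, which also includes Bayesian hyperparameter optimization and a band-gap regressor.

```python
# Hedged sketch: gradient-boosted feature selection + metal/non-metal classification.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("composition_features.csv")   # hypothetical featurized compositions
X, y = df.drop(columns=["is_metal"]), df["is_metal"]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Rank features by gradient-boosting importance and keep the more informative half.
selector = SelectFromModel(GradientBoostingClassifier(random_state=0), threshold="median")
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

clf = GradientBoostingClassifier(random_state=0).fit(X_train_sel, y_train)
proba = clf.predict_proba(X_test_sel)[:, 1]
print("F1:", f1_score(y_test, clf.predict(X_test_sel)))
print("ROC AUC:", roc_auc_score(y_test, proba))
```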