Data to improve the visibility of demand and capacity across health services at all levels, from system to regional to national.
Intermediate wheatgrass [Thinopyrum intermedium (Host) Barkw. & D.R. Dewey subsp. intermedium] is a high-yielding cool-season grass with adaptable uses for grazing, haying, and soil restoration. Despite its adaptability, adoption of intermediate wheatgrass has been limited due to inadequate stand longevity under grazing stress. A study was conducted near Mandan, ND, USA to investigate whether stand longevity of intermediate wheatgrass was affected by changes in soil properties due to grazing. Soil data from this study included measurements of soil bulk density, soil pH, soil organic carbon, and total soil nitrogen on a Wilton silt loam soil (USDA: Fine-silty, mixed, superactive frigid Pachic Haplustoll). Measurements were made in May 1997 (baseline) and again in May 2004 following four years of grazing. Data may be used to understand soil property responses to grazed perennial forages. Data are generally applicable to rainfed conditions under a semiarid Continental climate for the following associated soil types: Temvik, Grassna, Linton, Mandan, and Williams.
Resources in this dataset:
Resource title: Intermediate Wheatgrass Grazing Study Data Dictionary. File name: IWGS_Data Dictionary.xlsx. Resource description: Data dictionary for associated dataset.
Resource title: Intermediate Wheatgrass Grazing Study_Soil Data for Aggregated Depths. File name: IWGS_Soil Data_Aggregated Depths.xlsx. Resource description: File includes data for 0-30 cm depth.
Resource title: Intermediate Wheatgrass Grazing Study_Soil Data for Separated Depths. File name: IWGS_Soil Data_Separated Depths.xlsx. Resource description: Soil data for 0-5, 5-10, 10-20, and 20-30 cm depths.
Resource title: Intermediate Wheatgrass Grazing Study_Soil Data_Aggregated Depths. File name: IWGS_Soil Data_Aggregated Depths.csv. Resource description: Data for aggregated depths in csv format.
Resource title: Intermediate Wheatgrass Grazing Study_Metadata_Aggregated Depths. File name: IWGS_Soil Data_Aggregated Depths_Metadata.csv. Resource description: Metadata for aggregated depths.
Resource title: Intermediate Wheatgrass Grazing Study_Soils Data_Separated Depths. File name: IWGS_Soil Data_Separated Depths.csv. Resource description: Soil data for 0-5, 5-10, 10-20, and 20-30 cm depths.
Resource title: Intermediate Wheatgrass Grazing Study_Metadata_Separated Depths. File name: IWGS_Soil Data_Separated Depths_Metadata.csv. Resource description: Metadata for soils data separated by depth increment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Serbia Imports: Intermediate Goods: Parts & Accessories: Transportation Means data was reported at 1.248 USD bn in 2016. This records a decrease from the previous number of 1.368 USD bn for 2015. Serbia Imports: Intermediate Goods: Parts & Accessories: Transportation Means data is updated yearly, averaging 328.000 USD mn from Dec 2001 (Median) to 2016, with 16 observations. The data reached an all-time high of 2.046 USD bn in 2013 and a record low of 81.000 USD mn in 2001. Serbia Imports: Intermediate Goods: Parts & Accessories: Transportation Means data remains active status in CEIC and is reported by Statistical Office of the Republic of Serbia. The data is categorized under Global Database’s Serbia – Table RS.JA014: Imports: by Economic Destination: Annual.
Sewer Network Junction Collection contains layers that function as fittings, allowing devices or lines to connect to another line at an intermediate vertex. See the Sewer Data Dictionary for complete descriptions and definitions of each Layer, Asset Group, and Asset Type. The dictionary also provides details of the Utility Network along with attribute field definitions, relationship definitions, tables, and attribute domain values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Serbia Exports: Intermediate Goods: Parts & Accessories: Transport Means data was reported at 1.827 USD bn in 2017. This records an increase from the previous number of 1.379 USD bn for 2016. Serbia Exports: Intermediate Goods: Parts & Accessories: Transport Means data is updated yearly, averaging 480.000 USD mn from Dec 2001 (Median) to 2017, with 17 observations. The data reached an all-time high of 1.827 USD bn in 2017 and a record low of 115.000 USD mn in 2001. Serbia Exports: Intermediate Goods: Parts & Accessories: Transport Means data remains active status in CEIC and is reported by Statistical Office of the Republic of Serbia. The data is categorized under Global Database’s Serbia – Table RS.JA006: Exports: by Economic Destination: Annual.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The release of the LCA Commons Unit Process Data: field crop production Version 1.1 includes the following updates:
Added metadata to reflect USDA LCA Digital Commons data submission guidance, including descriptions of the process (reference to which the size of the inputs and outputs in the process relate, description of the process and technical scope and any aggregation; definition of the technology being used, its operating conditions); temporal representativeness; geographic representativeness; allocation methods; process type (U: unit process, S: system process); treatment of missing intermediate flow data; treatment of missing flow data to or from the environment; intermediate flow data sources; mass balance; data treatment (description of the methods and assumptions used to transform primary and secondary data into flow quantities through recalculating, reformatting, aggregation, or proxy data and a description of data quality according to LCADC convention); sampling procedures; and review details. Also, dataset documentation and related archival publications are cited in the APA format.
Changed intermediate flow categories and subcategories to reflect the International Standard Industrial Classification (ISIC).
Added “US-” to the US state abbreviations for intermediate flow locations.
Corrected the ISIC code for “CUTOFF domestic barge transport; average fuel” (changed to ISIC 5022: Inland freight water transport).
Corrected flow names as follows: “Propachlor” renamed “Atrazine”; “Bromoxynil octanoate” renamed “Bromoxynil heptanoate”; “water; plant uptake; biogenic” renamed “water; from plant uptake; biogenic”; half the instances of “Benzene, pentachloronitro-” replaced with “Etridiazole” and half with “Quintozene”; “CUTOFF phosphatic fertilizer, superphos. grades 22% & under; at point-of-sale” replaced with “CUTOFF phosphatic fertilizer, superphos. grades 22% and under; at point-of-sale”.
Corrected flow values for “water; from plant uptake; biogenic” and “dry matter except CNPK; from plant uptake; biogenic” in some datasets.
Presented data in the International Reference Life Cycle Data System (ILCD) format, allowing the parameterization of raw data and mathematical relations to be presented within the datasets and the inclusion of parameter uncertainty data. Note that ILCD formatted data can be converted to the ecospold v1 format using the OpenLCA software.
Data quality rankings have been updated to reflect the inclusion of uncertainty data in the ILCD formatted data.
Changed all parameter names to “pxxxx” to accommodate mathematical relation character limitations in OpenLCA. Also adjusted select mathematical relations to recognize zero entries. The revised list of parameter names is provided in the documentation attached.
Resources in this dataset:
Resource Title: Cooper-crop-production-data-parameterization-version-1.1. File Name: Cooper-crop-production-data-parameterization-version-1.1.xlsx. Resource Description: Description of parameters that define the Cooper Unit process data for field crop production version 1.1.
Resource Title: Cooper_Crop_Data_v1.1_ILCD. File Name: Cooper_Crop_Data_v1.1_ILCD.zip. Resource Description: .zip archive of ILCD xml files that comprise crop production unit process models. Resource Software Recommended: openLCA, url: http://www.openlca.org/
Resource Title: Summary of Revisions of the LCA Digital Commons Unit Process Data: field crop production for version 1.1 (August 2013). File Name: Summary of Revisions of the LCA Digital Commons Unit Process Data- field crop production, Version 1.1 (August 2013).pdf. Resource Description: Documentation of revisions to version 1 data that constitute version 1.1.
NOTE: The manuscript associated with this data package is currently in review. The data may be revised based on reviewer feedback. Upon manuscript acceptance, this data package will be updated with the final dataset and additional metadata.
This data package is associated with the manuscript “Artificial intelligence-guided iterations between observations and modeling significantly improve environmental predictions” (Malhotra et al., in prep). This effort was designed following ICON (integrated, coordinated, open, and networked) principles to facilitate a model-experiment (ModEx) iteration approach, leveraging crowdsourced sampling across the contiguous United States (CONUS). New machine learning models were created every month to guide sampling locations. Data from the resulting samples were used to test and rebuild the machine learning models for the next round of sampling guidance. Associated sediment and water geochemistry and in situ sensor data can be found at https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1923689, https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1729719, and https://data.ess-dive.lbl.gov/datasets/doi:10.15485/1603775. This data package is associated with two GitHub repositories found at https://github.com/parallelworks/dynamic-learning-rivers and https://github.com/WHONDRS-Hub/ICON-ModEx_Open_Manuscript. In addition to this readme, this data package also includes two file-level metadata (FLMD) files that describe each file and two data dictionaries (DD) that describe all column/row headers and variable definitions.
This data package consists of two main folders, (1) dynamic-learning-rivers and (2) ICON-ModEx_Open_Manuscript, which contain snapshots of the associated GitHub repositories.
The input data, output data, and machine learning models used to guide sampling locations are within dynamic-learning-rivers. The folder is organized into five top-level directories: (1) “input_data” holds the training data for the ML models; (2) “ml_models” holds machine learning (ML) models trained on the data in “input_data”; (3) “examples” contains files for direct experimentation with the machine learning model, including scripts for setting up a “hindcast” run; (4) “scripts” contains data preprocessing and postprocessing scripts and intermediate results specific to this data set that bookend the ML workflow; and (5) “output_data” holds the overall results of the ML model on that branch. Each trained ML model resides on its own branch in the repository; this means that inputs and outputs can be different branch-to-branch. There is also one hidden directory, “.github/workflows”. This hidden directory contains information for how to run the ML workflow as an end-to-end automated GitHub Action, but it is not needed for reusing the ML models archived here. Please see the top-level README.md in the GitHub repository for more details on the automation.
The scripts and data used to create figures in the manuscript are within ICON-ModEx_Open_Manuscript. The folder is organized into four folders which contain the scripts, data, and pdf for each figure. Within the “fig-model-score-evolution” folder, there is a folder called “intermediate_branch_data” which contains some intermediate files pulled from dynamic-learning-rivers and reorganized to easily integrate into the workflows. NOTE: THIS FOLDER INCLUDES THE FILES AT THE POINT OF PAPER SUBMISSION. IT WILL BE UPDATED ONCE THE PAPER IS ACCEPTED WITH ANY REVISIONS AND WILL INCLUDE A DD/FLMD AT THAT POINT.
The code includes several R markdown files and an associated file containing R functions. The main body of code, coastal_TSI_ts.Rmd, loads the function file located in the functions folder. Raw data are included as .csv files in the raw directory, and various data sets created as intermediate products are in the data folder. Variable definitions are included in a data dictionary.
The code has been used to generate the analysis reported in
Hagy, JD, B. Kreakie, M. Pelletier, F. Nojavan, J. Kiddon, and A. Oczkowski. Quantifying coastal ecosystem condition and a trophic state index with a Bayesian analytical framework
This file contains a zip of a github repository, https://github.com/USEPA/-cTSI
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hyper-secretion and/or hyper-concentration of mucus is a defining feature of multiple obstructive lung diseases, including chronic obstructive pulmonary disease (COPD). Mucus itself is composed of a mixture of water, ions, salt and proteins, of which the gel-forming mucins, MUC5AC and MUC5B, are the most abundant. Recent studies have linked the concentrations of these proteins in sputum to COPD phenotypes, including chronic bronchitis (CB) and acute exacerbations (AE). We sought to determine whether common genetic variants influence sputum mucin concentrations and whether these variants are also associated with COPD phenotypes, specifically CB and AE. We performed a GWAS to identify quantitative trait loci for sputum mucin protein concentration (pQTL) in the Sub-Populations and InteRmediate Outcome Measures in COPD Study (SPIROMICS, n = 708 for total mucin, n = 215 for MUC5AC, MUC5B). Subsequently, we tested for associations of mucin pQTL with CB and AE using regression modeling (n = 822–1300). Replication analysis was conducted using data from COPDGene (n = 5740) and by examining results from the UK Biobank. We identified one genome-wide significant pQTL for MUC5AC (rs75401036) and two for MUC5B (rs140324259, rs10001928). The strongest association for MUC5B, with rs140324259 on chromosome 11, explained 14% of variation in sputum MUC5B. Despite being associated with lower MUC5B, the C allele of rs140324259 conferred increased risk of CB (odds ratio (OR) = 1.42; 95% confidence interval (CI): 1.10–1.80) as well as AE ascertained over three years of follow up (OR = 1.41; 95% CI: 1.02–1.94). Associations between rs140324259 and CB or AE did not replicate in COPDGene. However, in the UK Biobank, rs140324259 was associated with phenotypes that define CB, namely chronic mucus production and cough, again with the C allele conferring increased risk. 
We conclude that sputum MUC5AC and MUC5B concentrations are associated with common genetic variants, and the top locus for MUC5B may influence COPD phenotypes, in particular CB.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
DBpedia is a project aiming to extract structured content from the information created in the Wikipedia project. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets. Internally, the data is stored in so-called Resource Description Framework (RDF) triples of the form (subject, predicate, object).
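As a minimal illustration of the (subject, predicate, object) form, the sketch below stores a few triples as plain Python tuples and matches them against a pattern in which None acts as a wildcard. The example triples, the shortened prefix notation, and the `match` helper are hypothetical; they are not taken from the DBpedia dump and do not use any RDF or HDT library.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example triples in a shortened prefix notation.
TRIPLES = [
    ("dbr:Ghent", "rdf:type", "dbo:City"),
    ("dbr:Ghent", "dbo:country", "dbr:Belgium"),
    ("dbr:Brussels", "dbo:country", "dbr:Belgium"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None matches anything."""
    pattern = (subject, predicate, obj)
    for triple in triples:
        if all(p is None or p == value for p, value in zip(pattern, triple)):
            yield triple


# Subjects whose dbo:country is dbr:Belgium.
in_belgium = [s for s, _, _ in match(TRIPLES, predicate="dbo:country", obj="dbr:Belgium")]
print(in_belgium)
```

A real query against the dataset would go through an HDT-aware library rather than a Python list; the pattern-with-wildcards idea is the same.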
Currently RDF data is stored and sent in very verbose textual serialisation formats that waste a lot of bandwidth and are expensive to parse and index. HDT (Header, Dictionary, Triples) is a compact data structure and binary serialization format for RDF that keeps big datasets compressed to save space while maintaining search and browse operations without prior decompression. This makes it an ideal format for storing and sharing RDF datasets on the Web. (source: what-is-hdt)
In 2008, Tim Berners-Lee described DBpedia as one of the most famous parts of the decentralized Linked Data effort.
This dataset contains the HDT representation of the DBpedia 2016-10 dump. Both the HDT and index file are provided to speed up search operations. A basic introduction on how to use and search within this dataset is provided in the following notebook.
This dataset would not be available without the ongoing research performed by the HDT team and the efforts made by the DBpedia community. Also kudos to the IDLab research institute at Ghent University-imec, and especially the Knowledge Management team of prof. dr. Femke Ongenae.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents molecular properties critical for battery electrolyte design, specifically solvation energies, ionization potentials, and electron affinities. The dataset is intended for use in machine learning model testing and algorithm validation. The properties calculated include solvation energies using the COSMO-RS method [1] and ionization potentials and electron affinities using various high-accuracy computational methods as implemented in MOLPRO [2]. Computational details can be found in Ref. [3], with scripts used to generate the data mostly uploaded to our github repository [4].
Molecular Datasets Considered:
QM9 Dataset: Contains small organic molecules broadly relevant for quantum chemistry [5]
Electrolyte Genome Project (EGP): Focuses on materials relevant to electrolytes.[6]
GDB17 and ZINC databases: Offer a broad chemical diversity with potential application in battery technologies. [7, 8]
How to Load the Data:
All files can be loaded with
import json
with open("file.json", "r") as f: data_dict = json.load(f)
and the file structure can be explored with
data_dict.keys()
We have also added an example script in python that shows how to extract all data from the JSON files following this link
Note that the file structure of the AMONS JSON files is slightly different, as explained below!
The data is stored in two types of JSON archives: files for full molecules of GDB17 and ZINC, and files for amons of GDB17 and ZINC. They are structured differently because amon entries are sorted by the number of heavy atoms in the amon (e.g., all amons with 3 heavy atoms are stored in ni3). Because of the large number of amons with 6 or 7 heavy atoms, they are further split into ni6_1, ni6_2, and so on. A sub-dictionary of an amon dictionary or a full molecule dictionary contains the following keys:
ECFP - ECFP4 representation vector
SMILES - SMILES string
SYMBOLS - atomic symbols
COORDS - atomic positions in Angstrom
ATOMIZATION - atomization energy in [kcal/mol]
DIPOLE - dipole moment in Debye
ENERGY - energy in Hartree
SOLVATION - solvation energy in [kcal/mol] for different solvents at 300 K.
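The grouping of amon entries by heavy-atom count can be navigated by parsing the group names. The sketch below relies only on what is stated above, namely that group keys look like ni3 or ni6_1; the helper names and the example key list are hypothetical, and the layout inside each group should be checked with data_dict.keys() on the real files.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches group names such as "ni3" or "ni6_2" (an assumption based on the
# naming described in the text, not on an inspection of the files).
_GROUP_KEY = re.compile(r"^ni(\d+)(?:_(\d+))?$")


def heavy_atom_count(group_key: str) -> int:
    """Return the number of heavy atoms encoded in an amon group key."""
    found = _GROUP_KEY.match(group_key)
    if found is None:
        raise ValueError(f"unexpected amon group key: {group_key!r}")
    return int(found.group(1))


def groups_by_heavy_atoms(keys: Iterable[str]) -> Dict[int, List[str]]:
    """Collect split groups (ni6_1, ni6_2, ...) under their heavy-atom count."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        grouped[heavy_atom_count(key)].append(key)
    return dict(grouped)


# Hypothetical key list; with real data this would be data_dict.keys().
example_keys = ["ni3", "ni6_1", "ni6_2", "ni7_1"]
grouped = groups_by_heavy_atoms(example_keys)
print(grouped)
```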
Files:
GDB17.json.zip (unpack first with unzip GDB17.json.zip) - subset of GDB17 random molecules
AMONS_ZINC.json - all amons of ZINC up to 7 heavy atoms
EGP.json - EGP molecules
AMONS_GDB17.json - all amons of GDB17 up to 7 heavy atoms
| File Name | Description | Molecules |
| --- | --- | --- |
| AMONS_GDB17.json | GDB17 amons | 37860 |
| AMONS_ZINC.json | ZINC amons | 88771 |
| GDB17.json | Subset of GDB17 | 309468 |
| EGP.json | EGP molecules | 18362 |
Atomic energies $E_{at}$ at BP and def2-TZVPD level in Hartree [Ha]
| Element | H | C | N | O | F | Br | Cl | S | P | B | Si |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| $E_{at}$ [Ha] | -0.5 | -37.85 | -54.60 | -75.09 | -99.77 | -2574.40 | -460.20 | -398.16 | -341.30 | -24.65 | -289.40 |
We follow the convention of negative atomization energies for stability compared to the isolated atoms:
$E_{atomization} = E_{mol} - \sum_{i} E_{at,i}$
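As a worked sketch of this convention, the snippet below evaluates $E_{atomization}$ for a molecule from its element symbols, using the atomic energies tabulated above. The molecular energy is a made-up placeholder rather than a value from the dataset, the function name is hypothetical, and the factor 627.509 kcal/mol per Hartree is the standard unit conversion, included only to show how a Hartree result relates to the kcal/mol unit of the ATOMIZATION key.

```python
from typing import Iterable

# Atomic energies in Hartree, copied from the table above (BP / def2-TZVPD).
ATOMIC_ENERGY_HA = {
    "H": -0.5, "C": -37.85, "N": -54.60, "O": -75.09, "F": -99.77,
    "Br": -2574.40, "Cl": -460.20, "S": -398.16, "P": -341.30,
    "B": -24.65, "Si": -289.40,
}

HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion factor


def atomization_energy_ha(e_mol_ha: float, symbols: Iterable[str]) -> float:
    """E_atomization = E_mol - sum_i E_at,i, all in Hartree."""
    return e_mol_ha - sum(ATOMIC_ENERGY_HA[symbol] for symbol in symbols)


# A CH4-like composition with a placeholder molecular energy (hypothetical).
symbols = ["C", "H", "H", "H", "H"]
e_mol = -40.5

e_atomization = atomization_energy_ha(e_mol, symbols)
e_atomization_kcal = e_atomization * HARTREE_TO_KCAL_PER_MOL

# A negative result means the molecule is more stable than its isolated atoms.
print(f"{e_atomization:.4f} Ha = {e_atomization_kcal:.2f} kcal/mol")
```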
Free energy of solvation at 300 K in [kcal/mol]:
The upload contains two JSON files, QM9IPEA.json and QM9IPEA_atom_ens.json. QM9IPEA.json summarizes MOLPRO calculation data grouping it along the following dictionary keys:
COORDS - atom coordinates in Angstroms.
SYMBOLS - atom element symbols.
ENERGY - total energies for each charge (0, -1, 1) and method considered.
CPU_TIME - CPU times (in seconds) spent at each step of each part of the calculation.
DISK_USAGE - highest total disk usage in GB.
ATOMIZATION_ENERGY - atomization energy at charge 0.
QM9_ID - ID of the molecule in the QM9 dataset.
All energies are given in Hartrees with NaN indicating the calculation failed to converge. Ionization potentials and electron affinities can be recovered as energy differences between neutral and charged (+1 for ionization potentials, -1 for electron affinities) species.
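The energy differences described above can be written out explicitly. The sketch below assumes the usual sign conventions, IP = E(+1) - E(0) and EA = E(0) - E(-1), which the text does not spell out; the function names and the numerical energies are hypothetical, and how the ENERGY entries are indexed by charge and method should be checked against the JSON file itself.

```python
import math


def ionization_potential(e_neutral: float, e_cation: float) -> float:
    """IP in Hartree: energy of the +1 species minus energy of the neutral."""
    return e_cation - e_neutral


def electron_affinity(e_neutral: float, e_anion: float) -> float:
    """EA in Hartree: energy of the neutral minus energy of the -1 species."""
    return e_neutral - e_anion


# Placeholder total energies in Hartree (hypothetical, not from the dataset).
e_neutral, e_cation, e_anion = -76.40, -75.95, -76.38

ip = ionization_potential(e_neutral, e_cation)
ea = electron_affinity(e_neutral, e_anion)

# A NaN energy (failed convergence) propagates into the derived quantity,
# so results can be screened with math.isnan before further use.
ip_failed = ionization_potential(e_neutral, float("nan"))

print(f"IP = {ip:.3f} Ha, EA = {ea:.3f} Ha, failed IP is NaN: {math.isnan(ip_failed)}")
```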
"CPU_time" entries contain steps corresponding to individual method calculations, as well as steps corresponding to program operation: "INT" (calculating integrals over basis functions relevant for the calculation), "FILE" (dumping intermediate data to restart file), and "RESTART" (importing restart data). The latter two steps appeared since we reused relevant integrals calculated for neutral species in charged species' calculations; we also used restart functionality to use HF density matrix obtained for the neutral species as the initial density matrix guess for the SCF-HF calculation for charged species. NaN CPU time value means the step was not present or that the calculation is invalid. Note that the CPU times were measured while parallelizing on 12 cores and were not adjusted to single-core.
QM9IPEA_atom_ens.json contains the atomic energies used to calculate atomization energies in QM9IPEA.json. The dictionary keys are:
SPINS - the spin assigned to elements during calculations of atomic energies.
ENERGY - energies of atoms using different methods.
(Note that H has only one electron and thus does not require a level of theory beyond Hartree-Fock.)
NOTE: Additional calculations were performed between publication of arXiv:2308.11196 and creation of this upload. For the version of the dataset used in the manuscript, please refer to DOI:10.5281/zenodo.8252498.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 957189 (BIG-MAP) and No. 957213 (BATTERY 2030+). O.A.v.L. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772834). O.A.v.L. has received support as the Ed Clark Chair of Advanced Materials and as a Canada CIFAR AI Chair. O.A.v.L. acknowledges that this research is part of the University of Toronto’s Acceleration Consortium, which receives funding from the Canada First Research Excellence Fund (CFREF). Obtaining the presented computational results has been facilitated using the queueing system implemented at https://leruli.com. The project has been supported by the Swedish Research Council (Vetenskapsrådet), and the Swedish National Strategic e-Science program eSSENCE as well as by computing resources from the Swedish National Infrastructure for Computing (SNIC/NAISS).
[1] Klamt, A.; Eckert, F. COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids. Fluid Phase Equilibria 2000, 172, 43–72
[2] Werner, H.-J.; Knowles, P. J.; Knizia, G.; Manby, F. R.; Schütz, M. Molpro: a general-purpose quantum chemistry program package. WIREs Comput. Mol. Sci. 2012, 2, 242–253
[3] arxiv link of draft
[4] https://github.com/chemspacelab/ViennaUppDa
[5] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 2014, 1, 140022
[6] Qu, X.; Jain, A.; Rajput, N. N.; Cheng, L.; Zhang, Y.; Ong, S. P.; Brafman, M.; Maginn, E.; Curtiss, L. A.; Persson, K. A. The Electrolyte Genome Project: A big data approach in battery materials discovery. Comput. Mater. Sci. 2015, 103, 56–67
[7] Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17. Journal of Chemical Information and Modeling 2012, 52, 2864–2875
[8] Irwin, J. J.; Shoichet, B. K. ZINC - A Free Database of Commercially Available Compounds for Virtual Screening. Journal of Chemical Information and Modeling 2005, 45, 177–182
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Total Business Enterprise R&D Personnel: Compound Annual Growth Rate data was reported at 11.961 % in 2022. This records an increase from the previous number of 9.937 % for 2021. China Total Business Enterprise R&D Personnel: Compound Annual Growth Rate data is updated yearly, averaging 10.675 % from Dec 1992 (Median) to 2022, with 29 observations. The data reached an all-time high of 26.734 % in 2005 and a record low of -14.226 % in 1998. China Total Business Enterprise R&D Personnel: Compound Annual Growth Rate data remains active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.
The national breakdown by source of funds does not fully match with the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.
The government and higher education sectors cover all fields of NSE and SSH, while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector; hence no R&D survey has been carried out in this sector and the data are not available.
From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.
In 2009, the survey coverage in the business and the government sectors was expanded.
Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000 however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.
Due to the reform of the S&T system some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.
This data package is associated with the publication “Prediction of Distributed River Sediment Respiration Rates using Community-Generated Data and Machine Learning” submitted to the Journal of Geophysical Research: Machine Learning and Computation (Scheibe et al. 2024). River sediment respiration observations are expensive and labor intensive to obtain, and there is no physical model for predicting this quantity. The Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) observational data set (Goldman et al., 2020) is used to train machine learning (ML) models to predict respiration rates at unsampled sites. This repository archives training data, ML models, predictions, and model evaluation results for the purposes of reproducibility of the results in the associated manuscript and community reuse of the ML models trained in this project.
One of the key challenges in this work was to find an optimum configuration for machine learning models to work with this feature-rich (i.e. 100+ possible input variables) data set. Here, we used a two-tiered approach to managing the analysis of this complex data set: 1) a stacked ensemble of ML models that can automatically optimize hyperparameters to accelerate the process of model selection and tuning and 2) feature permutation importance to iteratively select the most important features (i.e. inputs) to the ML models. The major elements of this ML workflow are modular, portable, open, and cloud-based, thus making this implementation a potential template for other applications.
This data package is associated with the GitHub repository found at https://github.com/parallelworks/sl-archive-whondrs. A static copy of the GitHub repository is included in this data package as an archived version at the time of publishing this data package (March 2023). However, we recommend accessing these files via GitHub for full functionality.
Please see the file level metadata (flmd; “sl-archive-whondrs_flmd.csv”) for a list of all files contained in this data package and descriptions for each. Please see the data dictionary (dd; “sl-archive-whondrs_dd.csv”) for a list of all column headers contained within comma separated value (csv) files in this data package and descriptions for each.
The GitHub repository is organized into five top-level directories: (1) “input_data” holds the training data for the ML models; (2) “ml_models” holds machine learning models trained on the data in “input_data”; (3) “scripts” contains data preprocessing and postprocessing scripts and intermediate results specific to this data set that bookend the ML workflow; (4) “examples” contains the visualization of the results in this repository, including plotting scripts for the manuscript (e.g., model evaluation, FPI results) and scripts for running predictions with the ML models (i.e., reusing the trained ML models); (5) “output_data” holds the overall results of the ML model on that branch. Each trained ML model resides on its own branch in the repository; this means that inputs and outputs can be different branch-to-branch. Furthermore, depending on the number of features used to train the ML models, the preprocessing and postprocessing scripts, and their intermediate results, can also be different branch-to-branch. The “main-*” branches are meant to be starting points (i.e. trunks) for each model branch (i.e. sprouts). Please see the Branch Navigation section in the top-level README.md in the GitHub repository for more details. There is also one hidden directory, “.github/workflows”. This hidden directory contains information for how to run the ML workflow as an end-to-end automated GitHub Action, but it is not needed for reusing the ML models archived here. Please see the top-level README.md in the GitHub repository for more details on the automation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Business Enterprise Researchers: Compound Annual Growth Rate data was reported at 10.596 % in 2022. This records an increase from the previous number of 4.318 % for 2021. China Business Enterprise Researchers: Compound Annual Growth Rate data is updated yearly, averaging 9.287 % from Dec 1992 (Median) to 2022, with 29 observations. The data reached an all-time high of 31.562 % in 2005 and a record low of -33.895 % in 1998. China Business Enterprise Researchers: Compound Annual Growth Rate data remains active status in CEIC and is reported by Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.
The national breakdown by source of funds does not fully match the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved, but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.
The government and higher education sectors cover all fields of NSE and SSH while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector, hence no R&D survey has been carried out in this sector and the data are not available.
From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.
In 2009, the survey coverage in the business and the government sectors was expanded.
Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000, however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.
Due to the reform of the S&T system, some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China Number of Researchers: Total data was reported at 2,069,650.000 Person in 2012. This records an increase from the previous number of 1,905,899.000 Person for 2011. China Number of Researchers: Total data is updated yearly, with a median of 1,905,899.000 Person from Dec 2010 to 2012, across 3 observations. The data reached an all-time high of 2,069,650.000 Person in 2012 and a record low of 1,747,589.000 Person in 2010. China Number of Researchers: Total data remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Number of Researchers and Personnel on Research and Development: Non OECD Member: Annual.
The national breakdown by source of funds does not fully match the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved, but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.
The government and higher education sectors cover all fields of NSE and SSH while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector, hence no R&D survey has been carried out in this sector and the data are not available.
From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.
In 2009, the survey coverage in the business and the government sectors was expanded.
Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000, however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.
Due to the reform of the S&T system, some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.
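For readers unfamiliar with the term used in the series names above, the sketch below applies the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1, to the researcher totals quoted in this entry for 2010 and 2012. The function name is hypothetical, and whether the OECD or CEIC series is computed in exactly this way is an assumption; the code only illustrates the arithmetic.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the compound annual growth rate as a fraction.

    Standard formula: (end / start) ** (1 / years) - 1.
    Hypothetical helper for illustration only.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures quoted in the entry above: 1,747,589 persons in 2010
    # and 2,069,650 persons in 2012, i.e. two annual steps.
    rate = compound_annual_growth_rate(1_747_589, 2_069_650, 2)
    print(f"Average annual growth 2010-2012: {rate:.3%}")
```

The result is the constant yearly rate that would take the 2010 total to the 2012 total, which smooths over the individual year-to-year changes.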
The Forest Service Basemap service is created, maintained, and produced by the U.S. Forest Service. The Forest Service Basemap is a scalable digital map product and can be used as background (or basemap) in web applications and GIS software. The Forest Service Basemap is compiled from authoritative data sources from the US Forest Service, the US Geological Survey (USGS), the Bureau of Land Management (BLM), the National Park Service (NPS), the US Fish and Wildlife Service (USFWS), the Census Bureau (US Census), the Federal Aviation Administration (FAA), North American Rail Network (NARN), and the Homeland Infrastructure Foundation Level Data (HIFLD- HERE) from the Department of Homeland Security (DHS).
Latest Update: A series of updates was implemented to improve data accuracy and map clarity. The Trails Plus GTAC definition query was revised, and Wilderness designation nomenclature was standardized to 'Wild.' across USFS and PADUS layers. Boundary corrections were made for Mount St. Helens NVM and Mt. Adams RD. Labels were added for State and USFWS lands, and land ownership data can now be viewed at maximum zoom level. A trail adjustment was completed at Simpson Peak in Questa RD. Additional updates include the Denali–Mount McKinley name correction, location updates for Juno and Pittsburg, and display boundary corrections for urban areas. Intermediate hydro layers were removed to streamline data, and queries were applied to NHD datasets to generalize hydrographic features. In Alaska, native lands were symbolized to distinguish tribal lands from reservation lands. A region-specific road query was applied to filter roads data relevant to Alaska, and wetlands data for the state was updated to show only Palustrine Emergent Wetlands.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
China GERD Performed: Higher Education Sector data was reported at 7.837 % in 2022. This records an increase from the previous number of 7.800 % for 2021. China GERD Performed: Higher Education Sector data is updated yearly, with a median of 8.525 % from Dec 1991 to 2022, across 32 observations. The data reached an all-time high of 12.633 % in 1994 and a record low of 6.840 % in 2016. China GERD Performed: Higher Education Sector data remains in active status in CEIC and is reported by the Organisation for Economic Co-operation and Development. The data is categorized under Global Database’s China – Table CN.OECD.MSTI: Gross Domestic Expenditure on Research and Development: Non OECD Member: Annual.
The national breakdown by source of funds does not fully match the classification defined in the Frascati Manual. The R&D financed by the government, business enterprises, and by the rest of the world can be retrieved, but part of the expenditure has no specific source of financing, i.e. self-raised funding (in particular for independent research institutions), the funds from the higher education sector and left-over government grants from previous years.
The government and higher education sectors cover all fields of NSE and SSH while the business enterprise sector only covers the fields of NSE. There are only a few organisations in the private non-profit sector, hence no R&D survey has been carried out in this sector and the data are not available.
From 2009, researcher data are collected according to the Frascati Manual definition of researcher. Beforehand, this was only the case for independent research institutions, while for the other sectors data were collected according to the UNESCO concept of “scientist and engineer”.
In 2009, the survey coverage in the business and the government sectors was expanded.
Before 2000, all of the personnel data and 95% of the expenditure data in the business enterprise sector are for large and medium-sized enterprises only. Since 2000, however, the survey covers almost all industries and all enterprises above a certain threshold. In 2000 and 2004, a census of all enterprises was held, while in the intermediate years data for small enterprises are estimated.
Due to the reform of the S&T system, some government institutions have become enterprises, and their R&D data have been reflected in the Business Enterprise sector since 2000.
Data to improve the visibility of demand and capacity across health services at all levels from system to regional to national.