The UMETRICS 2016Q3a Dataset comprises two collections. The first collection consists of core files containing university financial and personnel administrative data on sponsored project expenditures at IRIS member universities during a given year. UMETRICS core files are based on administrative data drawn directly from the sponsored projects, procurement, and human resources data systems on each IRIS member university’s campus. IRIS de-identifies, cleans, and aggregates the individual campus files to produce these core files. The core files include university data on sponsored project awards, direct cost wage payments from awards to employees, purchases of goods and services from vendors, and subaward transactions to subcontractors. Additional files provide supporting information that characterizes IRIS member institutions, identifies the sub-university units responsible for particular grants, and gives further detail on the object codes included by some data providers.
In addition to core files, we are releasing crosswalk files linking UMETRICS data to external datasets at the individual and award level. In the 2016Q3a release we include match tables that: (i) link individual UMETRICS research employees to dissertation data (with a focus on dissertation topics) provided by ProQuest, and (ii) link federal awards from the National Institutes of Health (NIH), National Science Foundation (NSF) and U.S. Department of Agriculture (USDA) to detailed information about the content of grants. This documentation includes details about the data as well as the matching process. The data release includes code and original data files to allow replication and improvement of matching procedures by research users.
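As a rough illustration of how a match table of this kind might be used, the sketch below joins employee records to dissertation records through a crosswalk. All field names (`employee_id`, `dissertation_id`, `topic`, `occupation`) and all records are hypothetical placeholders, not the actual UMETRICS or ProQuest layouts, and the sketch assumes at most one dissertation per employee.

```python
# Hypothetical toy records; the real UMETRICS files use their own
# layouts and column names, which are not reproduced here.
employees = [
    {"employee_id": "E1", "occupation": "graduate student"},
    {"employee_id": "E2", "occupation": "postdoc"},
]
crosswalk = [
    {"employee_id": "E1", "dissertation_id": "D10"},
]
dissertations = [
    {"dissertation_id": "D10", "topic": "soil microbiology"},
]


def link_employees_to_dissertations(employees, crosswalk, dissertations):
    """Left-join employees to dissertations through the crosswalk.

    Employees without a crosswalk entry are kept, with None in the
    dissertation fields, so unmatched records stay visible.
    """
    dissertation_by_id = {d["dissertation_id"]: d for d in dissertations}
    # Assumes one crosswalk row per employee (an assumption of this sketch).
    dissertation_id_by_employee = {
        row["employee_id"]: row["dissertation_id"] for row in crosswalk
    }
    linked = []
    for emp in employees:
        diss_id = dissertation_id_by_employee.get(emp["employee_id"])
        diss = dissertation_by_id.get(diss_id, {})
        linked.append(
            {**emp, "dissertation_id": diss_id, "topic": diss.get("topic")}
        )
    return linked


linked = link_employees_to_dissertations(employees, crosswalk, dissertations)
for row in linked:
    print(row)
```

Keeping unmatched employees in the output, rather than dropping them, makes it easier to see how much of the population the crosswalk actually covers.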
This Jupyter notebook explores the linked Survey of Earned Doctorates (SED) and Universities: Measuring the Impacts of Research on Innovation, Competitiveness, and Science (UMETRICS) data to show how the two data sources might be used together. It also encourages participants to think critically about what exactly is being measured and how missingness in the data should be interpreted. This notebook was developed for the Fall 2021 Applied Data Analytics training facilitated by the National Center for Science and Engineering Statistics (NCSES) and the Coleridge Initiative.
This Jupyter notebook introduces unsupervised machine learning through the lens of clustering. Using the linked Survey of Earned Doctorates (SED) and Universities: Measuring the Impacts of Research on Innovation, Competitiveness, and Science (UMETRICS) data, it demonstrates how k-means clustering can group PhD students into types based on their funding history. This supplemental notebook was developed for the Fall 2021 Applied Data Analytics training facilitated by the National Center for Science and Engineering Statistics (NCSES) and the Coleridge Initiative.
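To make the clustering idea concrete, here is a minimal sketch of k-means written in plain Python. It is not the notebook's code (which may well use a library implementation), and the two features, terms funded on federal awards and terms funded on non-federal awards, along with every data point, are invented for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples.

    Returns (labels, centroids). Initial centroids are k distinct
    positions sampled from the data with a seeded generator.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(pt, centroids[c]))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical funding histories per student:
# (terms funded on federal awards, terms funded on non-federal awards)
funding_history = [
    (1, 0), (2, 1), (1, 1),    # little award-funded support
    (8, 1), (9, 0), (10, 1),   # mostly federally funded
]

labels, centroids = kmeans(funding_history, k=2)
for student, label in zip(funding_history, labels):
    print(student, "-> cluster", label)
```

With real data, features measured on different scales would normally be standardized before clustering, and the number of clusters would be chosen by inspecting the results rather than fixed in advance.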