The Survey of Doctorate Recipients (SDR) provides demographic, education, and career history information from individuals with a U.S. research doctoral degree in a science, engineering, or health (SEH) field. The SDR is sponsored by the National Center for Science and Engineering Statistics and by the National Institutes of Health. Conducted since 1973, the SDR is a unique source of information about the educational and occupational achievements and career movement of U.S.-trained doctoral scientists and engineers in the United States and abroad. This dataset includes SDR assets for 2019.
The Survey of Earned Doctorates (SED) is an annual census conducted since 1957 of all individuals receiving a research doctorate from an accredited U.S. institution in a given academic year. The SED is sponsored by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF) and by three other federal agencies: the National Institutes of Health, Department of Education, and National Endowment for the Humanities. The SED collects information on the doctoral recipient's educational history, demographic characteristics, and postgraduation plans. Results are used to assess characteristics of the doctoral population and trends in doctoral education and degrees. This dataset includes SED assets for 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
OVERVIEW
This data file, compiled from multiple online sources, presents 2013–2017 publication counts—articles, articles in high-impact journals, books, and books from high-impact publishers—for 2,132 professors and associate professors in 426 U.S. departments of sociology. It also includes information on institutional characteristics (e.g., institution type, highest sociology degree offered, department size) and individual characteristics (e.g., academic rank, gender, PhD year, PhD institution).
The data may be useful for investigations of scholarly productivity, the correlates of scholarly productivity, and the contributions of particular individuals and institutions. Complete population data are presented for the top 26 doctoral programs, doctoral institutions other than R1 universities, the top liberal arts colleges, and other bachelor's institutions. Sample data are presented for Carnegie R1 universities (other than the top 26) and master's institutions.
USER NOTES
Please see our paper in Scholarly Assessment Reports, freely available at https://doi.org/10.29024/sar.36, for full information about the data set and the methods used in its compilation. The section numbers used here refer to the Appendix of that paper. See the References, below, for other papers that have made use of these data.
The data file is a single Excel file with five worksheets: Sampling, Articles, Books, Individuals, and Departments. Each worksheet has a simple rectangular format, and the cells include just text and values—no formulas or links. A few general notes apply to all five worksheets.
• The yellow column headings represent institutional (departmental) data. The blue column headings represent data for individual faculty.
• iType is institution type, as described in section A.2—TopR (top research universities), R1 (other R1 universities), OD (other doctoral universities), M (master's institutions), TopLA (top liberal arts colleges), or B (other bachelor's institutions). nType provides the same information, but as a single-digit code that is more useful for sorting the rows; 1=TopR, 2=R1, 3=OD, 4=M, 5=TopLA, and 6=B.
• Inst is a four-digit institution code. The first digit corresponds to nType, and the last three digits allow for alphabetical sorting by institution name. Indiv is a one- or two-digit code that can be used to sort the individuals by name within each department. The Inst, nType, and Indiv codes are consistent across the five worksheets.
• For binary variables such as Full professor and Female, 1 indicates yes (full professor or female) and 0 indicates no (associate professor or male).
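The coding scheme in the notes above can be illustrated with a short Python sketch. It assumes only what the notes state: the first digit of Inst matches nType, and the last three digits give the alphabetical position of the institution. The function name decode_inst and the example (Inst, Indiv) pairs are hypothetical and are not taken from the data file.

```python
# Mapping between nType codes and iType labels, as listed in the notes above.
NTYPE_TO_ITYPE = {1: "TopR", 2: "R1", 3: "OD", 4: "M", 5: "TopLA", 6: "B"}


def decode_inst(inst):
    """Split a four-digit Inst code into (nType, iType, alphabetical position)."""
    code = int(inst)
    if not 1000 <= code <= 6999:
        raise ValueError(f"Inst code out of range: {inst!r}")
    # First digit is nType; the remaining three digits order institutions by name.
    n_type, alpha_position = divmod(code, 1000)
    return n_type, NTYPE_TO_ITYPE[n_type], alpha_position


# Hypothetical (Inst, Indiv) pairs, invented for illustration only.
rows = [(3012, 2), (1005, 11), (1005, 3), (6120, 1)]

# Sorting on (Inst, Indiv) orders rows by institution type, then institution
# name, then individual name within each department.
sorted_rows = sorted(rows)
```

Because the codes are consistent across the five worksheets, the same (Inst, Indiv) pair should identify the same faculty member in each one, which makes it a natural key for joining worksheets.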
The five worksheets represent five distinct stages in the data compilation process. First, the Sampling worksheet lists the 1,530 base-population institutions (see section A.3) and presents the characteristics of the faculty included in the data file. Each row with an entry in the Individual column represents a faculty member at one of the 426 institutions included in the data set. Each row without an entry in the Individual column represents an institution that either (a) did not meet the criteria for inclusion (section A.1) or (b) was not needed to attain the desired sample size for the R1 or M groups (section A.3).
The Articles worksheet includes the data compiled from SocINDEX, as described in section A.6. Each row with an entry in the Journal column represents an article written by one of the 2,132 faculty included in the data. Each row without an entry in the Journal column represents a faculty member without any article listings in SocINDEX for the 2013–2017 period. Note that SocINDEX items other than peer-reviewed articles (editorials, letters, etc.) may be listed in the Journal column; such items are assigned a value of 1 in the Excluded column and a value of 0 in the Article credit and HI article credit columns. We assigned no credit for items such as editorials and letters, but other researchers may wish to include them. The N and i columns represent, for each article, the number of authors (N) and the faculty member's place in the byline (i), as described in section A.8. The CiteScore and Highest percentile columns were used to identify high-impact journals, as indicated in the HI journal column. The Article credit and HI article credit columns are article counts, adjusted for co-authorship.
The Books worksheet includes data compiled from Amazon and other sources, as described in section A.7. Each row with an entry in the Book column represents a book written by one of the 2,132 faculty. Each row without an entry in the Book column represents a faculty member without any book listings in Amazon during the 2013–2017 period. The publication counts in the Books worksheet—Book credit and HI book credit—follow the same format as those in the Articles worksheet.
The Individuals worksheet consolidates information from the Articles and Books worksheets so that each of the 2,132 individuals is represented by a single row. The worksheet also includes several categorical variables calculated or otherwise derived from the raw data—Years since PhD, for instance, and the three corresponding binary variables. We suspect that many data users will be most interested in the Individuals worksheet.
The Departments worksheet collapses the individual data so that each of the 426 institutions (departments) is represented by a single row. Individual characteristics such as Female and Years since PhD are presented as percentages or averages—% Female and Avg years since PhD, for instance. Each of the four productivity measures is represented by a departmental total, an average (the total divided by the number of full and associate professors), a departmental standard deviation, and a departmental median.
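The kind of collapse described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to build the worksheet: the sample rows are invented, the function name summarize is hypothetical, and the use of the sample standard deviation is an assumption, since the notes do not say whether the sample or population form was used.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical individual-level rows: (Inst, Female, Article credit).
# These values are invented and do not come from the data file.
individuals = [
    (1005, 1, 2.5),
    (1005, 0, 0.5),
    (1005, 0, 3.0),
    (3012, 1, 1.0),
    (3012, 1, 0.0),
]


def summarize(rows):
    """Collapse individual rows into one summary record per department."""
    by_dept = defaultdict(list)
    for inst, female, credit in rows:
        by_dept[inst].append((female, credit))

    summary = {}
    for inst, members in by_dept.items():
        credits = [credit for _, credit in members]
        summary[inst] = {
            "n": len(members),
            # Mean of a 0/1 variable, expressed as a percentage.
            "pct_female": 100 * mean(female for female, _ in members),
            "total": sum(credits),
            # Total divided by the number of full and associate professors.
            "average": mean(credits),
            # Assumption: sample SD; undefined for one person, so 0.0 is used.
            "sd": stdev(credits) if len(credits) > 1 else 0.0,
            "median": median(credits),
        }
    return summary


departments = summarize(individuals)
```

The same pattern would apply to each of the four productivity measures; only the credit column changes.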