This represents Harvard's responses to the Common Data Set Initiative. The Common Data Set (CDS) initiative is a collaborative effort among data providers in the higher education community and publishers as represented by the College Board, Peterson's, and U.S. News & World Report. The combined goal of this collaboration is to improve the quality and accuracy of information provided to all involved in a student's transition into higher education, as well as to reduce the reporting burden on data providers. This goal is attained by the development of clear, standard data items and definitions in order to determine a specific cohort relevant to each item. Data items and definitions used by the U.S. Department of Education in its higher education surveys often serve as a guide in the continued development of the CDS. Common Data Set items undergo broad review by the CDS Advisory Board as well as by data providers representing secondary schools and two- and four-year colleges. Feedback from those who utilize the CDS also is considered throughout the annual review process.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the final release of the 2020 CES Common Content Dataset. The data includes a nationally representative sample of 61,000 American adults. This release includes the data from the survey, a full guide to the data, and the questionnaires. The dataset includes vote validation performed by Catalist. Please consult the guide and the study website (https://cces.gov.harvard.edu/frequently-asked-questions) if you have questions about the study. Special thanks to Marissa Shih and Rebecca Phillips for their work in preparing this data for release.
The Harvard Forest is a collection of five properties, totaling about 1500 hectares, in Petersham, Massachusetts. Petersham is a rural town in Worcester County, Massachusetts, about 60 miles west of Boston. It is largely in the Swift River Watershed, and lies near the center of a twenty-mile-wide band of hilly uplands that form the eastern edge of the Connecticut Valley. The north part of the town is rolling and the south more distinctly hilly; the lowest basins are about 200 m above sea level, the flats around 400 m. The climate is cool temperate. Petersham, like many of the adjacent towns, was settled in the early 18th century, extensively cleared and farmed in the next hundred years, and then progressively abandoned after about 1830. Reforestation proceeded quickly, and by the time of the first Harvard Forest maps in 1909 HF was almost entirely wooded. The common forest types are dominated, variously, by red oak, red maple, white pine, or hemlock. Most are of low or average fertility and under 100 years old. Hemlock is now locally dominant in many stands that have been continuously forested; oaks, red maples and pines are the common dominants in stands that developed in old fields.
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
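For readers who want to see the basic idea, here is a minimal sketch of the prior-correction step for event-oversampled (choice-based) samples; it is only an illustration, not the authors' full method, which also includes a rare-event bias correction and is implemented in their released software. The function name is hypothetical, and `tau` must be supplied by the researcher; requires scikit-learn 1.2+ for `penalty=None`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prior_corrected_logit(X, y, tau):
    """Fit a logit on an event-oversampled sample, then correct the intercept.

    tau is the known population fraction of events (e.g., wars among all
    dyads); y.mean() is the event fraction in the sample.
    """
    model = LogisticRegression(penalty=None).fit(X, y)  # plain MLE logit
    ybar = y.mean()
    # Prior correction for choice-based sampling: shift the intercept by
    # -ln[((1 - tau)/tau) * (ybar/(1 - ybar))].
    model.intercept_ -= np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
    return model
```

Because only the intercept changes, relative risks from the sample logit are preserved while predicted event probabilities are pulled back down toward the true population rate.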
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The source dataset and its full description may be accessed through the Harvard Dataverse, and should be cited as
Tschandl, Philipp, 2018, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V4, UNF:6:KCZFcBLiFE5ObWcTc2ZBOA== [fileUNF]
Note that the dataset uploaded here does not contain all of the source material: it omits the file ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.tab (data from a study involving human-computer collaboration) and the folder HAM10000_segmentations_lesion_tschandl (binary segmentation masks of the training images). Still, in contrast to most of the HAM10000 datasets published on Kaggle, the current one includes the test dataset that was curated for the ISIC 2018 challenge (Task 3).
The uploaded dataset comprises 3 folders and 2 files, described in the table below.
Content | Type | Description |
---|---|---|
HAM10000_images_part_1 | folder | Part 1 of a set of training pictures |
HAM10000_images_part_2 | folder | Part 2 of a set of training pictures |
ISIC2018_Task3_Test_Images | folder | Set of test pictures |
HAM10000_metadata.csv | file | Metadata associated with the training data |
ISIC2018_Task3_Test_GroundTruth.csv | file | Metadata associated with the test data |
The training dataset (HAM10000_images_part_1 and HAM10000_images_part_2) is called "HAM10000", meaning "Human Against Machine with 10000 training images" (actually 10015 images); it is a large collection of multi-source dermatoscopic RGB images (JPG) of common pigmented skin lesions. The test dataset (ISIC2018_Task3_Test_Images) contains 511 images. The files HAM10000_metadata.csv and ISIC2018_Task3_Test_GroundTruth.csv contain the respective metadata (data about the data), which further include other features and the labels.
The structure of the metadata files follows the template presented in the table below.
Column | Type | Description |
---|---|---|
lesion_id | String | ID of the lesion case |
image_id | String | ID of an image (also the name of the respective JPG file) associated with that case |
dx | String | Label of that case |
dx_type | String | Method used for diagnosing that case |
age | Float | Age of the person associated with that case |
sex | String | Sex of the person associated with that case |
localization | String | Location of the lesion on the person's body |
dataset | String | Reference from which the data was taken |
The dx column (the classes): the values that the column dx may take are tabulated below.
Value | Description |
---|---|
akiec | Actinic keratoses and intraepithelial carcinoma (also called "Bowen's disease") - an early form of skin cancer |
bcc | Basal cell carcinoma - the most common type of skin cancer |
bkl | Benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses) - common and benign |
df | Dermatofibroma - common and benign |
mel | Melanoma - a type of skin cancer arising from melanocytes (the melanin-producing cells) |
nv | Melanocytic nevus - the medical term for a mole (benign) |
vasc | Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage) (benign) |
The dx_type column (the diagnosis methods): the table below presents the values of the column dx_type.
Value | Description |
---|---|
histo | Histopathology |
follow_up | Follow-up examination |
consensus | Expert consensus |
confocal | In-vivo confocal microscopy |
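As a quick orientation to the layout described above, the snippet below sketches loading the training metadata and resolving image paths across the two training folders. The `.jpg` extension and relative paths are assumptions about how the download was unpacked.

```python
import os
import pandas as pd

# Load the training metadata (columns per the table above).
meta = pd.read_csv("HAM10000_metadata.csv")
print(meta["dx"].value_counts())  # class balance across the seven dx labels

def image_path(image_id):
    """Return the JPG path for an image_id, checking both training folders."""
    for folder in ("HAM10000_images_part_1", "HAM10000_images_part_2"):
        path = os.path.join(folder, image_id + ".jpg")
        if os.path.exists(path):
            return path
    return None  # not a training image (test images live in ISIC2018_Task3_Test_Images)

meta["path"] = meta["image_id"].apply(image_path)
```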
This replication archive contains all data and code to replicate the results in "A Common-Space Scaling of the American Judiciary and Legal Profession" by Maya Sen and Adam Bonica. Abstract: We extend the scaling methodology previously used in Bonica (2014) to jointly scale the American federal judiciary and legal profession in a common-space with other political actors. The end result is the first data set of consistently measured ideological scores across all tiers of the federal judiciary and the legal profession, including 840 federal judges and 380,307 attorneys. To illustrate these measures, we present two examples involving the U.S. Supreme Court. These data open up significant areas of scholarly inquiry.
We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives while recovering the words most associated with each senator's location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
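The abstract describes the method only at a high level; the toy sketch below is not the authors' estimator, merely a minimal SVD-based illustration (in the spirit of JIVE-style decompositions) of splitting two data matrices on the same units into a shared factor plus dataset-specific factors. All names and the standardization assumption are mine.

```python
import numpy as np

def shared_and_specific(X1, X2, k_shared=1, k_spec=1):
    """Toy decomposition of two (n x p) matrices observed on the same n units.

    Assumes columns of X1 and X2 are standardized.
    """
    # Shared subspace: leading left singular vectors of the column-stacked data.
    U, _, _ = np.linalg.svd(np.hstack([X1, X2]), full_matrices=False)
    shared = U[:, :k_shared]                 # n x k_shared shared scores
    # Idiosyncratic subspaces: leading factors of each residual matrix.
    proj = shared @ shared.T                 # projector onto the shared span
    specifics = []
    for X in (X1, X2):
        Ur, _, _ = np.linalg.svd(X - proj @ X, full_matrices=False)
        specifics.append(Ur[:, :k_spec])
    return shared, specifics[0], specifics[1]
```

Unlike this sketch, the paper's method also lets the scaled locations depend on covariates and supports inference through resampling.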
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/HBQQVE
More frequent data collection, especially when coupled with shorter recall periods, may produce more inclusive reporting, improved capture of intra-seasonal variability, and earlier signals of events that may merit policy or other forms of development intervention. Although there have been survey efforts that collected a small amount of data from rural households at moderately high frequency, to date there have been no significant efforts to collect a broad range of data from rural households at high frequency. The data in this study were collected through a smartphone-based data collection technique that allowed participants to submit data at various frequencies and with various recall periods, thereby permitting analysis of the relative merits of more frequent data streams. This study captured approximately one year of continuous data from 480 farmers in northwestern Bangladesh on key measures of household and community well-being that could be particularly useful for the design and evaluation of development interventions and policies. While the data discussed here provide a snapshot of what is possible, we also highlight their strength in providing opportunities for interdisciplinary research on household agricultural production and practices, seasonal hunger, and related topics in a low-income agrarian society.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.7910/DVN/RPATZA
The Pan Africa Bean Research Alliance (PABRA) is a network of national agricultural research centers (NARS) and private and public sector institutions that work to deliver better beans, with consumer- and market-preferred traits, to farmers. The datasets presented here draw from 17 Sub-Saharan countries that are members of PABRA. The dataset on released bean varieties is a collection of 513 bean varieties released by NARS and their characteristics. The dataset on bean varieties and their relationship to constraints classifies the 513 bean varieties on the basis of resistance to constraints such as fungal, bacterial, and viral diseases, and tolerance to abiotic stresses. There is also a dataset of bean varieties that have been released in more than one country, useful for moving seed from one country to another and facilitating regional trade. The dataset on niche market traits provides the market-defined classifications for bean trade in Sub-Saharan Africa, as well as the varieties that fall into these classifications. The datasets are an update to the 2011 discussion of PABRA's achievements in breeding and delivery of bean varieties in Buruchara et al. 2011, pages 236 and 237, here: http://www.ajol.info/index.php/acsj/article/view/74168 . They also update a follow-up to this discussion in Muthoni, R. A., Andrade, R. 2015 on the performance of bean improvement programmes in sub-Saharan Africa from the perspectives of varietal output and adoption, chapter 8, here: http://dx.doi.org/10.1079/9781780644011.0148. The data are extracted from the PABRA M&E database available here (http://database.pabra-africa.org/?location=breeding).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction: In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers who have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. Note, however, that the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

The process: We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here, as they can be downloaded from EDGAR.) We wrote code to parse the filings and extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of each file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30, 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags (see the illustrative sketch after the file list below). Prior to that date, the filings are remarkable for their heterogeneity in formatting; several examples are linked below. Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempt to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this exhaustively extracts all holding information; we can provide examples of filings formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep the amended level of holdings if the amended report a) occurred within 90 days of the reporting date and b) the initial filing fails the logic check described above. The resulting dataset has around 48M reported holdings (CIK-CUSIP) across all 76 quarters, with between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method produced more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

Included files:
- Perl parsing code (find_holdings_snp.pl). For reference; only needed if you wish to re-parse the original filings.
- Investor holdings for 1999-2017, lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records.
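The authors' actual parser is the Perl script above; as a rough illustration of the post-2013 XML branch, here is a Python sketch. The tag names follow the SEC's 13F information-table schema as commonly observed in filings, but treat them (and the namespace-stripping shortcut) as assumptions to verify against real documents.

```python
import re
import xml.etree.ElementTree as ET

def parse_13f_info_table(xml_text):
    """Extract (cusip, value, shares) rows from a post-2013 13(f) info table."""
    # Strip XML namespaces so tags can be addressed by local name.
    xml_text = re.sub(r'\sxmlns(:\w+)?="[^"]*"', "", xml_text)
    xml_text = re.sub(r"<(/?)\w+:", r"<\1", xml_text)
    root = ET.fromstring(xml_text)
    holdings = []
    for entry in root.iter("infoTable"):
        if entry.findtext("putCall"):  # mirror the derivative filter (PUT/CALL)
            continue
        holdings.append({
            "cusip": entry.findtext("cusip"),
            "value": int(entry.findtext("value") or 0),
            "shares": int(entry.findtext("shrsOrPrnAmt/sshPrnamt") or 0),
        })
    return holdings
```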
The fields are:
- CIK: the Central Index Key assigned by the SEC for this investor. A mapping to names is available below.
- CUSIP: the identity of the holding. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.
- shares: the number of shares reportedly held. Merging in CRSP data on shares outstanding at the CUSIP-month level allows one to construct the ownership share β. We make no distinction among the sole/shared/none voting discretion fields; if a researcher is interested, we did collect those starting in mid-2013, when filings are in XML format.
- rdate: reporting date (end of quarter), 8 digits, YYYYMMDD.
- fdate: filing date, 8 digits, YYYYMMDD.
- ftype: the form name.

Notes: We did not consolidate separate BlackRock entities (or any other possibly related entities); if one wants to do so, use the CIK-CIKNAME mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean that the investor's filing lists a holding for the firm but our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers who find a zero know to go back to the source filing to manually gather the...
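The two drop rules described in the notes translate directly into a small filter. A sketch under assumed column names (`shrout` standing in for CRSP shares outstanding merged at the CUSIP-month level):

```python
import pandas as pd

def apply_cleaning_rules(df):
    """df columns: cik, cusip, rdate, shares, shrout (CRSP shares outstanding)."""
    df = df.copy()
    df["frac"] = df["shares"] / df["shrout"]      # per-investor ownership share
    grp = df.groupby(["cusip", "rdate"])["frac"]
    bad_single = grp.transform("max") > 0.50      # any one investor reports > 50%
    bad_total = grp.transform("sum") > 1.20       # 13(f) investors report > 120%
    return df.loc[~(bad_single | bad_total)].drop(columns="frac")
```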
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These are two pedigree-based data sets that were used to write a collaborative paper titled "Is Craniofacial Morphology and Body Composition Related by Common Genes: Comparative Analysis of Two Ethnically Diverse Populations".
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Foreign policy belief systems have received much attention. Yet nearly all work examines attitudes in Western democracies, chiefly the United States. The current security environment, however, requires that we ask whether the foreign policy views of individuals in other nations, particularly regional powers such as the BRICs, are similar in structure to those found in the U.S. case. This article does so for the Indian case. Drawing on studies of U.S. opinion, we develop a set of claims and test them on an original dataset on Indian elites. We make four contributions. First, we show that Wittkopf's MICI framework applies to the Indian case. Second, we demonstrate how this framework can be made more generally applicable by revising its emphases on different types of internationalism and by rethinking the meaning of isolationist preferences. Third, we place the Indian case in comparative perspective. Lastly, we model the dimensions of Indian attitudes as a function of domestic ideology. Results of our analyses provide insights into the structure of foreign policy belief systems outside the Global North.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This version of the Shor-McCarty state legislative aggregate ideology data is being released as an update to the data underlying Shor and McCarty 2011. These are based on individual-level ideal point estimates described fully in that article. Estimates are all in Shor-McCarty NPAT common ideological space to facilitate explicit comparisons across time and between states. The data spans 1993 through 2018, with 2,268 chamber-years of data (compared with 2,025 in the previous release).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common approach when studying the quality of representation involves comparing the latent preferences of voters and legislators, typically obtained by fitting an item-response theory (IRT) model to a common set of stimuli. Despite being exposed to the same stimuli, voters and legislators may not share a common understanding of how these stimuli map onto their latent preferences, leading to differential item functioning (DIF) and incomparability of estimates. We explore the presence of DIF and incomparability of latent preferences obtained through IRT models by re-analyzing an influential survey data set, where survey respondents expressed their preferences on roll call votes that U.S. legislators had previously voted on. To do so, we propose defining a Dirichlet Process prior over item-response functions in standard IRT models. In contrast to typical multi-step approaches to detecting DIF, our strategy allows researchers to fit a single model, automatically identifying incomparable sub-groups with different mappings from latent traits onto observed responses. We find that although there is a group of voters whose estimated positions can be safely compared to those of legislators, a sizeable share of surveyed voters understand stimuli in fundamentally different ways. Ignoring these issues can lead to incorrect conclusions about the quality of representation.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
LocoCraft Iron Armor is a popular video game that allows players to explore and build in a virtual world. One of the key features of the game is the ability to craft and wear armor to protect against enemy attacks. In this article, we will focus on the Iron Armor set in LocoCraft. Iron Armor is one of the most durable and protective armor sets in LocoCraft. It is made from Iron Ingots, which can be obtained by smelting Iron Ore in a furnace. To craft a full set of Iron Armor, you will need 24 Iron Ingots in total. The Iron Armor set consists of four pieces: the Iron Helmet, Iron Chestplate, Iron Leggings, and Iron Boots. Each piece provides a different level of protection against enemy attacks; the Iron Helmet provides the least protection, while the Iron Chestplate provides the most. In addition to providing protection, the Iron Armor set also grants the player various bonuses. When wearing a full set of Iron Armor, the player will receive a 15% reduction in damage taken from enemy attacks. This makes the Iron Armor set ideal for players who want to explore dangerous areas or engage in combat with hostile mobs. To craft the Iron Armor set, you will need to arrange the Iron Ingots in a specific pattern on a crafting table. The pattern for each piece of armor is as follows:
- Iron Helmet: place one Iron Ingot in each of the top three slots and one in the center slot.
- Iron Chestplate: place two Iron Ingots in each of the top two rows and three in the bottom row.
- Iron Leggings: place two Iron Ingots in each of the top two columns and one in the center column.
- Iron Boots: place one Iron Ingot in each of the top two slots and one in the center slot of the bottom row.
Once you have crafted all four pieces of Iron Armor, you can equip them by opening your inventory and placing them in the appropriate slots. You can also repair Iron Armor using additional Iron Ingots in an anvil. In conclusion, the Iron Armor set is a valuable asset for any LocoCraft player who wants to explore dangerous areas or engage in combat with hostile mobs. With its high durability and protective capabilities, the Iron Armor set is a must-have for any serious adventurer.
Common bean climate niche of Southeastern and Southern Africa: geospatial dataset of the climate niche for common bean in Southeastern Africa. Temperature and precipitation parameters were collected from Beebe et al. (2011). Data sources: NASA MODIS Land Surface Temperature (MOD11A2) (NASA LP DAAC 2015; Wan et al. 2015) and CHIRPS precipitation (Funk et al. 2015). Growing season months: November-April; temporal range: 2001-2017; precipitation range: 200-710 mm; temperature range: 13.6-25.6°C. Categories: 0 - Non-agriculture; 1 - Pessimal; 2 - Unsuitable; 3 - Marginal; 4 - Suitable; 5 - Optimal.
References:
- NASA LP DAAC, 2015. MODIS Land Surface Temperature (MOD11A2) Version 005. NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota. [MODIS LST data]
- Wan, Z., Hook, S. and Hulley, G., 2015. MOD11A2 MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid V006 [Data set]. NASA EOSDIS Land Processes DAAC. Accessed 2020-02-26 from https://doi.org/10.5067/MODIS/MOD11A2.006 [MODIS LST data]
- Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A. and Michaelsen, J., 2015. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific Data, 2, p.150066. [CHIRPS precipitation data]
- Beebe, S., Ramirez, J., Jarvis, A., Rao, I.M., Mosquera, G., Bueno, J.M. and Blair, M.W., 2011. Genetic improvement of common beans and the challenges of climate change. Crop Adaptation to Climate Change, 26, pp.356-369. [Temperature and precipitation parameters]
- Peter, B.G., Mungai, L.M., Messina, J.P. and Snapp, S.S., 2017. Nature-based agricultural solutions: Scaling perennial grains across Africa. Environmental Research, 159, pp.283-290. [Classification methodology]
This content is made possible by the support of the American People provided to the Feed the Future Innovation Lab for Sustainable Intensification through the United States Agency for International Development (USAID). The contents are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Program activities are funded by USAID under Cooperative Agreement No. AID-OAA-L-14-00006.
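The entry above gives the climate envelopes (200-710 mm precipitation, 13.6-25.6°C temperature) but not the interior breakpoints separating the 0-5 categories; the actual methodology is Peter et al. (2017). The sketch below is therefore only a hypothetical illustration of a per-cell classification, with made-up interior thresholds.

```python
def classify_cell(precip_mm, temp_c, is_agriculture=True):
    """Map one grid cell to the 0-5 suitability categories listed above.

    Envelopes from the dataset description; interior breakpoints hypothetical.
    """
    if not is_agriculture:
        return 0                                  # Non-agriculture
    in_precip = 200 <= precip_mm <= 710           # growing-season envelope
    in_temp = 13.6 <= temp_c <= 25.6
    if not in_precip and not in_temp:
        return 1                                  # Pessimal: outside both envelopes
    if not (in_precip and in_temp):
        return 2                                  # Unsuitable: outside one envelope
    # Hypothetical gradation inside the joint envelope, by distance from edge:
    p = min(precip_mm - 200, 710 - precip_mm) / (710 - 200)
    t = min(temp_c - 13.6, 25.6 - temp_c) / (25.6 - 13.6)
    score = min(p, t)                             # 0 at an edge, 0.5 at the center
    return 3 if score < 0.10 else (4 if score < 0.25 else 5)  # Marginal/Suitable/Optimal
```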
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These files replicate all the results in Kosuke Imai, Gary King, and Carlos Velasco Rivera, "Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments." To replicate all the analyses reported in the main manuscript and supplementary appendix, simply follow these steps:
0. Create a folder on your local computer (e.g., programmatic).
1. Download all the files to the directory created in step 0.
2. Untar all the .tar files.
3. Set the working directory to replicate.
4. On the command line, run:
4.1 Rscript required-packages.R
4.2 Rscript replicate-sps.R
4.3 Rscript replicate-progresa.R
4.4 Rscript replicate-additional-tests.R
Together, these scripts dump all the paper's figures and tables into the figures and tables directories. For convenience, the figures directory has two sub-directories for the figures in the paper and in the supplementary appendix (main-figures and online-appendix). The names of all tables and figures follow the order in the paper.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study concerns four Fado venues in Lisbon (three Fado houses and one theatre with a Fado show). We collected and analyzed 2,653 TripAdvisor reviews (corresponding to 234,059 words), gathering all available reviews for each establishment at the time of data collection. The choice of Fado venues was determined by four criteria: i) location in the most touristic quarters of Lisbon (Alfama, Chiado and Bairro Alto); ii) prestige of the Fado show; iii) scope of the Fado experience (in Fado houses and theatre); and iv) the classification on TripAdvisor, the platform from which customer reviews were collected.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Concern about antisemitism in the U.S. has grown following recent rises in deadly assaults, vandalism, and harassment. Public accounts of antisemitism have focused on both the ideological right and left, suggesting a “horseshoe theory” in which the far left and the far right hold a common set of anti-Jewish prejudicial attitudes that distinguish them from the ideological center. However, there is little quantitative research evaluating left-wing versus right-wing antisemitism. We conduct several experiments on an original survey of 3,500 U.S. adults, including an oversample of young adults. We oversampled young adults because unlike other forms of prejudice that are more common among older people, antisemitism is theorized to be more common among younger people. Contrary to the expectation of horseshoe theory, the data show the epicenter of antisemitic attitudes is young adults on the far right.