19 datasets found
  1. Harvard Common Data Set

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Office of Institutional Research (2023). Harvard Common Data Set [Dataset]. http://doi.org/10.7910/DVN/AOD2ZV
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Office of Institutional Research
    Description

    This represents Harvard's responses to the Common Data Initiative. The Common Data Set (CDS) initiative is a collaborative effort among data providers in the higher education community and publishers as represented by the College Board, Peterson's, and U.S. News & World Report. The combined goal of this collaboration is to improve the quality and accuracy of information provided to all involved in a student's transition into higher education, as well as to reduce the reporting burden on data providers. This goal is attained by the development of clear, standard data items and definitions in order to determine a specific cohort relevant to each item. Data items and definitions used by the U.S. Department of Education in its higher education surveys often serve as a guide in the continued development of the CDS. Common Data Set items undergo broad review by the CDS Advisory Board as well as by data providers representing secondary schools and two- and four-year colleges. Feedback from those who utilize the CDS also is considered throughout the annual review process.

  2. Cooperative Election Study Common Content, 2020

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 14, 2022
    Cite
    Brian Schaffner; Stephen Ansolabehere; Sam Luks (2022). Cooperative Election Study Common Content, 2020 [Dataset]. http://doi.org/10.7910/DVN/E9N6PH
    Dataset updated
    Feb 14, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Brian Schaffner; Stephen Ansolabehere; Sam Luks
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This is the final release of the 2020 CES Common Content Dataset. The data includes a nationally representative sample of 61,000 American adults. This release includes the data from the survey, a full guide to the data, and the questionnaires. The dataset includes vote validation performed by Catalist. Please consult the guide and the study website (https://cces.gov.harvard.edu/frequently-asked-questions) if you have questions about the study. Special thanks to Marissa Shih and Rebecca Phillips for their work in preparing this data for release.

  3. Harvard Forest - United States of America - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Aug 10, 2016
    Cite
    (2016). Harvard Forest - United States of America - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2eafdd4a-1d25-548c-9586-9d99f55ef6e9
    Dataset updated
    Aug 10, 2016
    Area covered
    United States
    Description

    The Harvard Forest is a collection of five properties, totaling about 1500 hectares, in Petersham, Massachusetts. Petersham is a rural town in Worcester County, Massachusetts, about 60 miles west of Boston. It is largely in the Swift River Watershed, and lies near the center of a twenty-mile-wide band of hilly uplands that form the eastern edge of the Connecticut Valley. The north part of the town is rolling and the south more distinctly hilly; the lowest basins are about 200 m above sea level, the flats around 400 m. The climate is cool temperate. Petersham, like many of the adjacent towns, was settled in the early 18th century, extensively cleared and farmed in the next hundred years, and then progressively abandoned after about 1830. Reforestation proceeded quickly, and by the time of the first Harvard Forest maps in 1909 HF was almost entirely wooded. The common forest types are dominated, variously, by red oak, red maple, white pine, or hemlock. Most are of low or average fertility and under 100 years old. Hemlock is now locally dominant in many stands that have been continuously forested; oaks, red maples and pines are the common dominants in stands that developed in old fields.

  4. Replication data for: Logistic Regression in Rare Events Data

    • search.datacite.org
    Updated 2010
    Cite
    Gary King (2010). Replication data for: Logistic Regression in Rare Events Data [Dataset]. http://doi.org/10.7910/dvn/spafjk
    Dataset updated
    2010
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Gary King
    Description

    We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
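
    As a rough illustration of the sampling idea described above (keep all events, keep only a small fraction of nonevents), the sketch below applies the intercept adjustment commonly known as prior correction for event-enriched samples. The description does not spell out its formulas, so this is an assumption and not necessarily the authors' exact procedure; the function names and the numbers (tau, ybar) are hypothetical.

```python
import math

def prior_corrected_intercept(b0_hat, tau, ybar):
    """Adjust a logit intercept estimated on an event-enriched sample.

    b0_hat: intercept estimated on the sample
    tau:    assumed fraction of events in the population
    ybar:   fraction of events in the sample
    """
    if not (0 < tau < 1 and 0 < ybar < 1):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical numbers: events are 1% of the population but 50% of the sample.
tau, ybar = 0.01, 0.50
b0_hat = math.log(ybar / (1 - ybar))  # intercept-only logit fitted on the sample
b0 = prior_corrected_intercept(b0_hat, tau, ybar)
p_event = logistic(b0)                # recovers tau in the intercept-only case
```

    In this form of the correction only the intercept changes; slope coefficients estimated on the enriched sample are left as they are.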

  5. Skin Cancer - The HAM10000 dataset

    • kaggle.com
    Updated Jul 1, 2024
    Cite
    Élio Cordeiro Pereira (2024). Skin Cancer - The HAM10000 dataset [Dataset]. https://www.kaggle.com/datasets/eliocordeiropereira/skin-cancer-the-ham10000-dataset/code
    Dataset updated
    Jul 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Élio Cordeiro Pereira
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    The Original Dataset

    The source dataset and its full description may be accessed through the Harvard Dataverse, and should be cited as

    Tschandl, Philipp, 2018, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V4, UNF:6:KCZFcBLiFE5ObWcTc2ZBOA== [fileUNF]

    The Current Dataset

    Note that the dataset uploaded here does not contain all of the source material: it omits the file ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.tab, which contains data on a study involving human-computer collaboration, and the folder HAM10000_segmentations_lesion_tschandl, which contains binary segmentation masks of the training images. Still, in contrast to most of the HAM10000 datasets published on Kaggle, the current one includes the test dataset that was curated for the ISIC 2018 challenge (Task 3).

    Description

    Files and folders

    The uploaded dataset comprises 3 folders and 2 files, described in the table below.

    Content                              Type    Description
    HAM10000_images_part_1               folder  Part 1 of a set of training pictures
    HAM10000_images_part_2               folder  Part 2 of a set of training pictures
    ISIC2018_Task3_Test_Images           folder  Set of test pictures
    HAM10000_metadata.csv                file    Metadata associated with the training data
    ISIC2018_Task3_Test_GroundTruth.csv  file    Metadata associated with the test data



    The training dataset (HAM10000_images_part_1 and HAM10000_images_part_2) is called "HAM10000", meaning "Human Against Machine with 10000 training images" (actually 10015 images), and it corresponds to a large collection of multi-source dermatoscopic RGB images (JPG) of common pigmented skin lesions. The test dataset (ISIC2018_Task3_Test_Images) corresponds to 511 images. The files HAM10000_metadata.csv and ISIC2018_Task3_Test_GroundTruth.csv contain the respective metadata (data about the data), which further include other features and the labels.

    Columns of the metadata files

    The structure of the metadata files follows the template in the table below.

    Column        Type    Description
    lesion_id     String  ID of the lesion case
    image_id      String  ID of an image (also the name of the respective JPG file) associated with that case
    dx            String  Label of that case
    dx_type       String  Method used for diagnosing that case
    age           Float   Age of the person associated with that case
    sex           String  Sex of the person associated with that case
    localization  String  Location of the lesion on the person's body
    dataset       String  Reference from which the data was taken



    Values of the metadata dx column (the classes)

    The values that the column dx may take are tabulated below.

    Value  Description
    akiec  Actinic keratoses and intraepithelial carcinoma (also called "Bowen's disease") - an early form of skin cancer
    bcc    Basal cell carcinoma - the most common type of skin cancer
    bkl    Benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses) - common and benign
    df     Dermatofibroma - common and benign
    mel    Melanoma - a type of skin cancer involving the melanin cells
    nv     Melanocytic nevus - the medical term for a mole (benign)
    vasc   Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage) (benign)



    Values of the metadata dx_type column (the diagnosis methods)

    The table below presents the values of the column dx_type.

    Value      Description
    histo      Histopathology
    follow_up  Follow-up examination
    consensus  Expert consensus
    confocal   In-vivo confocal microscopy
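
    The metadata layout described above can be read with a few lines of standard-library Python. The following is only a sketch: it assumes the CSV header uses the column names listed in the table, in that order, and the three sample rows are synthetic placeholders rather than records from the real file. The helper name count_classes is hypothetical.

```python
import csv
import io
from collections import Counter

# Class codes of the dx column, as listed in the description above.
DX_LABELS = {
    "akiec": "Actinic keratoses and intraepithelial carcinoma",
    "bcc": "Basal cell carcinoma",
    "bkl": "Benign keratosis-like lesions",
    "df": "Dermatofibroma",
    "mel": "Melanoma",
    "nv": "Melanocytic nevus",
    "vasc": "Vascular lesions",
}

def count_classes(csv_text):
    """Count rows per dx code and reject codes missing from DX_LABELS."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["dx"] not in DX_LABELS:
            raise ValueError(f"unknown dx code: {row['dx']!r}")
        counts[row["dx"]] += 1
    return counts

# Synthetic placeholder rows (not taken from the real metadata file).
SAMPLE = """lesion_id,image_id,dx,dx_type,age,sex,localization,dataset
LESION_A,IMAGE_1,nv,histo,40.0,male,back,example_source
LESION_A,IMAGE_2,nv,histo,40.0,male,back,example_source
LESION_B,IMAGE_3,mel,consensus,65.0,female,face,example_source
"""

counts = count_classes(SAMPLE)
```

    Because one lesion_id can appear with several image_id values, a split into training and validation sets would probably need to group rows by lesion_id rather than by image.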
  6. Replication Data for: A Common-Space Scaling of the American Judiciary and...

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Bonica, Adam (2023). Replication Data for: A Common-Space Scaling of the American Judiciary and Legal Profession [Dataset]. http://doi.org/10.7910/DVN/RPZLMY
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Bonica, Adam
    Description

    This replication archive contains all data and code to replicate the results in "A Common-Space Scaling of the American Judiciary and Legal Profession" by Maya Sen and Adam Bonica. Abstract: We extend the scaling methodology previously used in Bonica (2014) to jointly scale the American federal judiciary and legal profession in a common-space with other political actors. The end result is the first data set of consistently measured ideological scores across all tiers of the federal judiciary and the legal profession, including 840 federal judges and 380,307 attorneys. To illustrate these measures, we present two examples involving the U.S. Supreme Court. These data open up significant areas of scholarly inquiry.

  7. Replication Data for: Scaling Data from Multiple Sources

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc (2023). Replication Data for: Scaling Data from Multiple Sources [Dataset]. http://doi.org/10.7910/DVN/FOUVEL
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc
    Description

    We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives while recovering the words most associated with each senator's location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.

  8. Data from: Social Dynamics of Short-Term Variability in Key Measures of...

    • dataverse.harvard.edu
    Updated Nov 27, 2018
    Cite
    Harvard Dataverse (2018). Social Dynamics of Short-Term Variability in Key Measures of Household and Community Wellbeing in Rural Bangladesh [Dataset]. http://doi.org/10.7910/DVN/HBQQVE
    Dataset updated
    Nov 27, 2018
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/HBQQVE

    Time period covered
    2016
    Area covered
    Bangladesh
    Dataset funded by
    Cereal Systems Initiative for South Asia (CSISA) of the Consultative Group on International Agricultural Research (CGIAR)
    United States Agency for International Development (USAID)
    Bill and Melinda Gates Foundation (BMGF)
    Description

    More frequent data collection, especially when coupled with shorter recall periods, may produce more inclusive reporting, improved capture of intra-seasonal variability, and earlier signals of events that may merit policy or other forms of development intervention. Although some survey efforts have collected a small amount of data from rural households at moderately high frequency, to date there have been no significant efforts to collect a broad range of data from rural households at high frequency. The data in this study were collected through a smartphone-based data collection technique that allowed participants to submit data at various frequencies and with various recall periods, thereby permitting analysis of the relative merits of more frequent data streams. The study captured approximately one year of continuous data from 480 farmers in northwestern Bangladesh on key measures of household and community well-being that could be particularly useful for the design and evaluation of development interventions and policies. While the data discussed here provide a snapshot of what is possible, we also highlight their strength in providing opportunities for interdisciplinary research on household agricultural production, practices, seasonal hunger, etc., in a low-income agrarian society.

  9. Data from: Common Bean variety releases in Africa

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 7, 2019
    Cite
    Muthoni R Andriatsitohaina; Resty Nagadya; D Okii; Innocent Obilil; Clare Mugisha Mukankusi; Rowland Chirwa; Rodah Morezio Zulu; Mercy Lungaho; C Ruranduma; M Ugen; T Kidane; D Karanja; Elisa Mazuma; Augustine Musoni; Lesole Sefume; Tsibingul Meshac; Manuel Amane; Deidre Fourie; A Dlamini; H Andriamazaoro; Micheal Kilango; O S Kweka; Bruce Mutari; Kennedy Muimui; James Asibuo; Martin Ngueguim (2019). Common Bean variety releases in Africa [Dataset]. http://doi.org/10.7910/DVN/RPATZA
    Dataset updated
    Jun 7, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Muthoni R Andriatsitohaina; Resty Nagadya; D Okii; Innocent Obilil; Clare Mugisha Mukankusi; Rowland Chirwa; Rodah Morezio Zulu; Mercy Lungaho; C Ruranduma; M Ugen; T Kidane; D Karanja; Elisa Mazuma; Augustine Musoni; Lesole Sefume; Tsibingul Meshac; Manuel Amane; Deidre Fourie; A Dlamini; H Andriamazaoro; Micheal Kilango; O S Kweka; Bruce Mutari; Kennedy Muimui; James Asibuo; Martin Ngueguim
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.7910/DVN/RPATZA

    Time period covered
    Jan 2003 - Dec 2016
    Area covered
    Burundi, Swaziland, Zimbabwe, South Africa, Tanzania, United Republic of, Uganda, Malawi, Cameroon, Madagascar, Ghana
    Dataset funded by
    Swiss Development Corporation (SDC)
    Global Affairs Canada (GAC)
    Description

    The Pan Africa Bean Research Alliance (PABRA) is a network of national agricultural research centers (NARS), and private and public sector institutions that work to deliver better beans with consumer and market preferred traits to farmers. The datasets presented here draw from 17 Sub Saharan countries that are members of PABRA. The dataset on released bean varieties is a collection of 513 bean varieties released by NARS and their characteristics. The dataset on bean varieties and the relationship to constraints presents the 513 bean varieties on the basis of resistance to constraints such as fungal, bacterial, and viral diseases and tolerance to abiotic stresses. There is also a dataset of bean varieties that have been released in more than one country, useful for moving seed from one country to another and facilitating regional trade. The dataset on Niche market traits provides the market defined classifications for bean trade in Sub Saharan Africa as well as varieties that fall into these classifications. The datasets are an update to the 2011 discussion of PABRA's achievement in breeding and delivery of bean varieties in Buruchara et al. 2011 on pages 236 and 237 here: http://www.ajol.info/index.php/acsj/article/view/74168 . They are also an update to a follow-up to this discussion in Muthoni, R. A., Andrade, R. 2015 on the performance of bean improvement programmes in sub-Saharan Africa from the perspectives of varietal output and adoption, in chapter 8, here: http://dx.doi.org/10.1079/9781780644011.0148. The data is extracted from the PABRA M&E database available here (http://database.pabra-africa.org/?location=breeding).

  10. Common Ownership Data: Scraped SEC form 13F filings for 1999-2017

    • dataverse.harvard.edu
    Updated Aug 17, 2020
    Cite
    Matthew Backus; Christopher T Conlon; Michael Sinkinson (2020). Common Ownership Data: Scraped SEC form 13F filings for 1999-2017 [Dataset]. http://doi.org/10.7910/DVN/ZRH3EU
    Dataset updated
    Aug 17, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Matthew Backus; Christopher T Conlon; Michael Sinkinson
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2017
    Description

    Introduction

    In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers that have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. However, the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

    The process: We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here as they can be downloaded from EDGAR). We wrote code to parse the filings to extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of the file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags. Prior to that date, the filings are remarkable for the heterogeneity in formatting. Several examples are linked to below.

    Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempting to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this will exhaustively extract all holding information. We can provide examples of filings that are formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep an amended level of holdings if the amended report a) occurred within 90 days of the reporting date and b) the initial filing fails our logic check described above.

    The resulting dataset has around 48M reported holdings (CIK-CUSIP) for all 76 quarters and between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method seemed to produce more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

    Included Files

    Perl Parsing Code (find_holdings_snp.pl). For reference, only needed if you wish to re-parse original filings.

    Investor holdings for 1999-2017: lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records. The fields are

    CIK: the central index key assigned by the SEC for this investor. Mapping to names is available below.
    CUSIP: the identity of the holdings. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.
    shares: the number of shares reportedly held. Merging in CRSP data on shares outstanding at the CUSIP-Month level allows one to construct \beta. We make no distinction for the sole/shared/none voting discretion fields. If a researcher is interested, we did collect that starting in mid-2013, when filings are in XML format.
    rdate: reporting date (end of quarter). 8 digit, YYYYMMDD.
    fdate: filing date. 8 digit, YYYYMMDD.
    ftype: the form name.

    Notes: we did not consolidate separate BlackRock entities (or any other possibly related entities). If one wants to do so, use the CIK-CIKname mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean the investor filing lists a holding for the firm but that our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers that find a zero know to go back to that source filing to manually gather the...
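
    The two drop rules in the notes above (a single investor above 50% of shares outstanding, or 13(f) investors together above 120%) could be sketched as follows. This is an illustrative reimplementation in Python, not the authors' code, which is written in Perl; the function name, the record layout (dicts keyed by cik, cusip, rdate, shares), and the placeholder identifiers are all assumptions.

```python
from collections import defaultdict

def filter_holdings(holdings, shares_outstanding, max_single=0.50, max_total=1.20):
    """Drop every CUSIP-rdate group that violates either ownership threshold.

    holdings:           list of dicts with keys cik, cusip, rdate, shares
    shares_outstanding: dict mapping (cusip, rdate) to shares outstanding
    """
    groups = defaultdict(list)
    for h in holdings:
        groups[(h["cusip"], h["rdate"])].append(h)

    kept = []
    for key, rows in groups.items():
        outstanding = shares_outstanding[key]
        largest = max(r["shares"] for r in rows) / outstanding
        total = sum(r["shares"] for r in rows) / outstanding
        if largest > max_single or total > max_total:
            continue  # the whole CUSIP-rdate observation is dropped
        kept.extend(rows)
    return kept

# Placeholder records; the identifiers are not real CIKs or CUSIPs.
holdings = [
    {"cik": "CIK_1", "cusip": "CUSIP_A", "rdate": "20131231", "shares": 100},
    {"cik": "CIK_2", "cusip": "CUSIP_A", "rdate": "20131231", "shares": 200},
    {"cik": "CIK_1", "cusip": "CUSIP_B", "rdate": "20131231", "shares": 600},
    {"cik": "CIK_1", "cusip": "CUSIP_C", "rdate": "20131231", "shares": 450},
    {"cik": "CIK_2", "cusip": "CUSIP_C", "rdate": "20131231", "shares": 450},
    {"cik": "CIK_3", "cusip": "CUSIP_C", "rdate": "20131231", "shares": 400},
]
outstanding = {
    ("CUSIP_A", "20131231"): 1000,
    ("CUSIP_B", "20131231"): 1000,
    ("CUSIP_C", "20131231"): 1000,
}

kept = filter_holdings(holdings, outstanding)
```

    In this sketch a missing shares-outstanding entry raises a KeyError; a real pipeline would need to decide how to treat CUSIP-rdate pairs that fail to merge.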

  11. Replication Data for "Is Craniofacial Morphology and Body Composition...

    • dataverse.harvard.edu
    Updated Jun 29, 2021
    Cite
    Sudipta Ghosh (2021). Replication Data for "Is Craniofacial Morphology and Body Composition Related by Common Genes: Comparative Analysis of Two Ethnically Diverse Populations" [Dataset]. http://doi.org/10.7910/DVN/CNZHS9
    Dataset updated
    Jun 29, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Sudipta Ghosh
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    These are two pedigree-based data sets that were used to write a collaborative paper titled "Is Craniofacial Morphology and Body Composition Related by Common Genes: Comparative Analysis of Two Ethnically Diverse Populations".

  12. Replication Data for: The Foreign Policy Attitudes of Indian Elites:...

    • dataverse.harvard.edu
    Updated Feb 26, 2016
    Cite
    Sumit Ganguly; Timothy Hellwig; William R. Thompson (2016). Replication Data for: The Foreign Policy Attitudes of Indian Elites: Variance, Structure, and Common Denominators [Dataset]. http://doi.org/10.7910/DVN/BYZDYE
    Dataset updated
    Feb 26, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Sumit Ganguly; Timothy Hellwig; William R. Thompson
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Foreign policy belief systems have received much attention. Yet nearly all work examines attitudes in western democracies, chiefly the United States. The current security environment, however, requires that we ask whether the foreign policy views of individuals in other nations, particularly regional powers such as the BRICs, are similar in structure to those found in the U.S. case. This article does so for the Indian case. Drawing on studies of U.S. opinion, we develop a set of claims and test them on an original dataset on Indian elites. We make four contributions. First, we show that Wittkopf’s MICI framework applies to the Indian case. Second, we demonstrate how this framework can be made more generally applicable by revising its emphases on different types of internationalism and by rethinking the meaning of isolationist preferences. Third, we place the Indian case in comparative perspective. And lastly, we model the dimensions of Indian attitudes as a function of domestic ideology. Results of our analyses provide insights into the structure of foreign policy belief systems outside the Global North.

  13. Aggregate State Legislator Shor-McCarty Ideology Data, July 2020 update

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 3, 2020
    Cite
    Boris Shor (2020). Aggregate State Legislator Shor-McCarty Ideology Data, July 2020 update [Dataset]. http://doi.org/10.7910/DVN/AP54NE
    Dataset updated
    Jul 3, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Boris Shor
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This version of the Shor-McCarty state legislative aggregate ideology data is being released as an update to the data underlying Shor and McCarty 2011. These are based on individual-level ideal point estimates described fully in that article. Estimates are all in Shor-McCarty NPAT common ideological space to facilitate explicit comparisons across time and between states. The data spans 1993 through 2018, with 2,268 chamber-years of data (compared with 2,025 in the previous release).

  14. Replication Data for: A Non-parametric Bayesian Model for Detecting...

    • dataverse.harvard.edu
    Updated Dec 27, 2022
    Cite
    Yuki Shiraito; James Lo; Santiago Olivella (2022). Replication Data for: A Non-parametric Bayesian Model for Detecting Differential Item Functioning: An Application to Political Representation in the US [Dataset]. http://doi.org/10.7910/DVN/BCDALU
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Yuki Shiraito; James Lo; Santiago Olivella
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    United States
    Description

    A common approach when studying the quality of representation involves comparing the latent preferences of voters and legislators, commonly obtained by fitting an item-response theory (IRT) model to a common set of stimuli. Despite being exposed to the same stimuli, voters and legislators may not share a common understanding of how these stimuli map onto their latent preferences, leading to differential item-functioning (DIF) and incomparability of estimates. We explore the presence of DIF and incomparability of latent preferences obtained through IRT models by re-analyzing an influential survey data set, where survey respondents expressed their preferences on roll call votes that U.S. legislators had previously voted on. To do so, we propose defining a Dirichlet Process prior over item-response functions in standard IRT models. In contrast to typical multi-step approaches to detecting DIF, our strategy allows researchers to fit a single model, automatically identifying incomparable sub-groups with different mappings from latent traits onto observed responses. We find that although there is a group of voters whose estimated positions can be safely compared to those of legislators, a sizeable share of surveyed voters understand stimuli in fundamentally different ways. Ignoring these issues can lead to incorrect conclusions about the quality of representation.

  15. H

    A Popular Video Game For Education

    • dataverse.harvard.edu
    Updated May 27, 2023
    Cite
    Levy Vidy; Levy Vidy (2023). A Popular Video Game For Education [Dataset]. http://doi.org/10.7910/DVN/YYNMTI
    Dataset updated
    May 27, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Levy Vidy; Levy Vidy
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    LocoCraft Iron Armor is a popular video game that allows players to explore and build in a virtual world. One of the key features of the game is the ability to craft and wear armor to protect against enemy attacks. In this article, we will focus on the Iron Armor set in LocoCraft. Iron Armor is one of the most durable and protective armor sets in LocoCraft. It is made from Iron Ingots, which can be obtained by smelting Iron Ore in a furnace. To craft a full set of Iron Armor, you will need 24 Iron Ingots in total. The Iron Armor set consists of four pieces: the Iron Helmet, Iron Chestplate, Iron Leggings, and Iron Boots. Each piece provides varying levels of protection against enemy attacks. The Iron Helmet provides the least protection, while the Iron Chestplate provides the most. In addition to providing protection, the Iron Armor set also grants the player various bonuses. When wearing a full set of Iron Armor, the player will receive a 15% reduction in damage taken from enemy attacks. This makes the Iron Armor set ideal for players who want to explore dangerous areas or engage in combat with hostile mobs. To craft the Iron Armor set, you will need to arrange the Iron Ingots in a specific pattern on a crafting table. The pattern for each piece of armor is as follows: Iron Helmet: Place one Iron Ingot in each of the top three slots and one in the center slot. Iron Chestplate: Place two Iron Ingots in each of the top two rows and three in the bottom row. Iron Leggings: Place two Iron Ingots in each of the top two columns and one in the center column. Iron Boots: Place one Iron Ingot in each of the top two slots and one in the center slot of the bottom row. Once you have crafted all four pieces of Iron Armor, you can equip them by opening your inventory and placing them in the appropriate slots. You can also repair Iron Armor using additional Iron Ingots in an anvil.

    In conclusion, the Iron Armor set is a valuable asset for any LocoCraft player who wants to explore dangerous areas or engage in combat with hostile mobs. With its high durability and protective capabilities, the Iron Armor set is a must-have for any serious adventurer.

  16. H

    Common bean climate niche of Southeastern and Southern Africa

    • dataverse.harvard.edu
    • search.dataone.org
    tiff, txt
    Updated Mar 4, 2020
    Cite
    Harvard Dataverse (2020). Common bean climate niche of Southeastern and Southern Africa [Dataset]. http://doi.org/10.7910/DVN/FZYU2S
    Available download formats: txt(1576), tiff(6625)
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Harvard Dataverse
    Area covered
    Africa, Southern Africa
    Dataset funded by
    United States Agency for International Development (http://usaid.gov/)
    Description

    Common bean climate niche of Southeastern and Southern Africa

    Geospatial dataset of the climate niche for common bean in Southeastern Africa. Temperature and precipitation parameters collected from Beebe et al. (2011). Data sources: NASA MODIS Land Surface Temperature (MOD11A2) (NASA LP DAAC 2015; Wan et al. 2015) and CHIRPS Precipitation (Funk et al. 2015). Growing season months: November–April; temporal range: 2001–2017; precipitation range: 200–710 mm; temperature range: 13.6–25.6°C.

    Categories
    0 - Non-agriculture
    1 - Pessimal
    2 - Unsuitable
    3 - Marginal
    4 - Suitable
    5 - Optimal

    NASA MODIS Land Surface Temperature (LST) data
    NASA LP DAAC, 2015. MODIS Land Surface Temperature (MOD11A2) Version 005. NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota.
    Wan, Z., Hook, S., Hulley, G. (2015). MOD11A2 MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid V006 [Data set]. NASA EOSDIS Land Processes DAAC. Accessed 2020-02-26 from https://doi.org/10.5067/MODIS/MOD11A2.006

    CHIRPS precipitation data
    Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A. and Michaelsen, J., 2015. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific Data, 2, p.150066.

    Common bean temperature and precipitation parameters
    Beebe, S., Ramirez, J., Jarvis, A., Rao, I.M., Mosquera, G., Bueno, J.M. and Blair, M.W., 2011. Genetic improvement of common beans and the challenges of climate change. Crop Adaptation to Climate Change, 26, pp.356-369.

    Classification methodology
    Peter, B.G., Mungai, L.M., Messina, J.P. and Snapp, S.S., 2017. Nature-based agricultural solutions: Scaling perennial grains across Africa. Environmental Research, 159, pp.283-290.

    This content is made possible by the support of the American People provided to the Feed the Future Innovation Lab for Sustainable Intensification through the United States Agency for International Development (USAID). The contents are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Program activities are funded by USAID under Cooperative Agreement No. AID-OAA-L-14-00006.

  17. H

    Replication Data for: "Do Nonpartisan Programmatic Policies Generate...

    • dataverse.harvard.edu
    rtf, tar
    Updated Jan 7, 2019
    Cite
    Harvard Dataverse (2019). Replication Data for: "Do Nonpartisan Programmatic Policies Generate Partisan Electoral Effects? Evidence from Two Large Scale Experiments" [Dataset]. http://doi.org/10.7910/DVN/70SNIS
    Available download formats: tar(1135463936), tar(8704), tar(1347706880), rtf(1410), tar(11264), tar(28160)
    Dataset updated
    Jan 7, 2019
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    These files replicate all the results in Kosuke Imai, Gary King, and Carlos Velasco Rivera "Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments." To replicate all the analyses reported in the main manuscript and supplementary appendix, simply follow the next steps:

    0. Create a folder in your local computer (e.g., programmatic)
    1. Download all the files to the directory created in step 0
    2. Untar all the .tar files
    3. Set the working directory to replicate
    4. In the command line run
       4.1 Rscript required-packages.R
       4.2 Rscript replicate-sps.R
       4.3 Rscript replicate-progresa.R
       4.4 Rscript replicate-additional-tests.R

    Together, these scripts dump all the paper figures and tables in the figures and tables directories. For convenience, the figures directory has two sub-directories for the figures in the paper and in the supplementary appendix (main-figures and online-appendix). The names of all tables and figures follow the order in the paper.

  18. H

    Data from: Fado, Urban Popular Song and Intangible Heritage: Perceptions of...

    • dataverse.harvard.edu
    Updated Jul 13, 2023
    Cite
    Anabela Monteiro (2023). Fado, Urban Popular Song and Intangible Heritage: Perceptions of Authenticity and Emotions in TripAdvisor Reviews [Dataset]. http://doi.org/10.7910/DVN/UFNZBM
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Anabela Monteiro
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This study concerns four Fado venues in Lisbon (three Fado houses and one theatre with a fado show). A total of 2653 TripAdvisor reviews (corresponding to 234,059 words) were collected and analyzed. We gathered all available reviews for each establishment at the time of data collection. The choice of Fado venues was determined by four criteria: i) location in the most touristic quarters of Lisbon (Alfama, Chiado and Bairro Alto), ii) prestige of the fado show, iii) scope of the fado experience (in fado houses and theatre) and iv) the classification on TripAdvisor, the platform where customer reviews were collected.

  19. H

    Replication Data for: Antisemitic Attitudes Across the Ideological Spectrum

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 15, 2022
    Cite
    Eitan D Hersh; Laura Royden (2022). Replication Data for: Antisemitic Attitudes Across the Ideological Spectrum [Dataset]. http://doi.org/10.7910/DVN/CJPTXK
    Dataset updated
    Jun 15, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Eitan D Hersh; Laura Royden
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Concern about antisemitism in the U.S. has grown following recent rises in deadly assaults, vandalism, and harassment. Public accounts of antisemitism have focused on both the ideological right and left, suggesting a “horseshoe theory” in which the far left and the far right hold a common set of anti-Jewish prejudicial attitudes that distinguish them from the ideological center. However, there is little quantitative research evaluating left-wing versus right-wing antisemitism. We conduct several experiments on an original survey of 3,500 U.S. adults, including an oversample of young adults. We oversampled young adults because unlike other forms of prejudice that are more common among older people, antisemitism is theorized to be more common among younger people. Contrary to the expectation of horseshoe theory, the data show the epicenter of antisemitic attitudes is young adults on the far right.

