19 datasets found

Harvard Common Data Set
search.dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Office of Institutional Research (2023). Harvard Common Data Set [Dataset]. http://doi.org/10.7910/DVN/AOD2ZV
Unique identifier
https://doi.org/10.7910/DVN/AOD2ZV
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Office of Institutional Research
Description
This represents Harvard's responses to the Common Data Initiative. The Common Data Set (CDS) initiative is a collaborative effort among data providers in the higher education community and publishers as represented by the College Board, Peterson's, and U.S. News & World Report. The combined goal of this collaboration is to improve the quality and accuracy of information provided to all involved in a student's transition into higher education, as well as to reduce the reporting burden on data providers. This goal is attained by the development of clear, standard data items and definitions in order to determine a specific cohort relevant to each item. Data items and definitions used by the U.S. Department of Education in its higher education surveys often serve as a guide in the continued development of the CDS. Common Data Set items undergo broad review by the CDS Advisory Board as well as by data providers representing secondary schools and two- and four-year colleges. Feedback from those who utilize the CDS also is considered throughout the annual review process.
Cooperative Election Study Common Content, 2020
dataverse.harvard.edu
search.dataone.org
Updated Feb 14, 2022
Cite
Brian Schaffner; Stephen Ansolabehere; Sam Luks (2022). Cooperative Election Study Common Content, 2020 [Dataset]. http://doi.org/10.7910/DVN/E9N6PH
Unique identifier
https://doi.org/10.7910/DVN/E9N6PH
Dataset updated
Feb 14, 2022
Dataset provided by
Harvard Dataverse
Authors
Brian Schaffner; Stephen Ansolabehere; Sam Luks
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is the final release of the 2020 CES Common Content Dataset. The data includes a nationally representative sample of 61,000 American adults. This release includes the data from the survey, a full guide to the data, and the questionnaires. The dataset includes vote validation performed by Catalist. Please consult the guide and the study website (https://cces.gov.harvard.edu/frequently-asked-questions) if you have questions about the study. Special thanks to Marissa Shih and Rebecca Phillips for their work in preparing this data for release.
Harvard Forest - United States of America - Dataset - B2FIND
b2find.eudat.eu
Updated Aug 10, 2016
Cite
(2016). Harvard Forest - United States of America - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2eafdd4a-1d25-548c-9586-9d99f55ef6e9
Dataset updated
Aug 10, 2016
Area covered
United States
Description
The Harvard Forest is a collection of five properties, totaling about 1500 hectares, in Petersham, Massachusetts. Petersham is a rural town in Worcester County, Massachusetts, about 60 miles west of Boston. It is largely in the Swift River Watershed, and lies near the center of a twenty-mile-wide band of hilly uplands that form the eastern edge of the Connecticut Valley. The north part of the town is rolling and the south more distinctly hilly; the lowest basins are about 200 m above sea level, the flats around 400 m. The climate is cool temperate. Petersham, like many of the adjacent towns, was settled in the early 18th century, extensively cleared and farmed in the next hundred years, and then progressively abandoned after about 1830. Reforestation proceeded quickly, and by the time of the first Harvard Forest maps in 1909 HF was almost entirely wooded. The common forest types are dominated, variously, by red oak, red maple, white pine, or hemlock. Most are of low or average fertility and under 100 years old. Hemlock is now locally dominant in many stands that have been continuously forested; oaks, red maples and pines are the common dominants in stands that developed in old fields.
Replication data for: Logistic Regression in Rare Events Data
search.datacite.org
Updated 2010
Cite
Gary King (2010). Replication data for: Logistic Regression in Rare Events Data [Dataset]. http://doi.org/10.7910/dvn/spafjk
Unique identifier
https://doi.org/10.7910/dvn/spafjk
Dataset updated
2010
Dataset provided by
DataCite: https://www.datacite.org/
Harvard Dataverse
Authors
Gary King
Description
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
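The description above does not spell out the corrections it recommends. As a hedged illustration only, the sketch below shows one standard adjustment for a sample that keeps all events and only a fraction of nonevents: a prior correction of the logistic intercept using an assumed known population event rate. The function names and the parameters `ybar` (event share in the sample) and `tau` (event share in the population) are hypothetical and are not taken from the dataset or its accompanying software.

```python
import math


def prior_corrected_intercept(b0_hat: float, ybar: float, tau: float) -> float:
    """Adjust a logit intercept estimated on an event-enriched sample.

    b0_hat: intercept estimated on the sample (assumption: ordinary logit fit)
    ybar:   fraction of events in the sample
    tau:    fraction of events in the population (assumed known)
    """
    if not (0.0 < ybar < 1.0 and 0.0 < tau < 1.0):
        raise ValueError("ybar and tau must lie strictly between 0 and 1")
    # Subtract the log of the ratio between sample odds and population odds.
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def predicted_probability(intercept: float, slopes, x) -> float:
    """Logistic probability for one observation with covariates x."""
    z = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers: half the sample are events, but only 1% of the population.
raw = predicted_probability(0.0, [0.5], [1.0])
corrected = predicted_probability(
    prior_corrected_intercept(0.0, ybar=0.5, tau=0.01), [0.5], [1.0]
)
```

When the sample event share equals the population share, the adjustment is zero; when events are oversampled, the corrected intercept (and hence the predicted probability) is lower than the uncorrected one.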

Skin Cancer - The HAM10000 dataset

kaggle.com

Updated Jul 1, 2024

Cite

Élio Cordeiro Pereira (2024). Skin Cancer - The HAM10000 dataset [Dataset]. https://www.kaggle.com/datasets/eliocordeiropereira/skin-cancer-the-ham10000-dataset/code

Dataset updated

Jul 1, 2024

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Élio Cordeiro Pereira

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Description

The Original Dataset

The source dataset and its full description may be accessed through the Harvard Dataverse, and should be cited as

Tschandl, Philipp, 2018, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V4, UNF:6:KCZFcBLiFE5ObWcTc2ZBOA== [fileUNF]

The Current Dataset

Note that the dataset uploaded here does not contain all of the source material. It omits the file ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.tab, which contains data on a study involving human-computer collaboration, and the folder HAM10000_segmentations_lesion_tschandl, which contains binary segmentation masks of the training images. Still, in contrast to most of the HAM10000 datasets published on Kaggle, the current one includes the test dataset that was curated for the ISIC 2018 challenge (Task 3).

Description

Files and folders

The uploaded dataset comprises 3 folders and 2 files, described in the table below.

Content	Type	Description
`HAM10000_images_part_1`	folder	Part 1 of a set of training pictures
`HAM10000_images_part_2`	folder	Part 2 of a set of training pictures
`ISIC2018_Task3_Test_Images`	folder	Set of test pictures
`HAM10000_metadata.csv`	file	Metadata associated with the training data
`ISIC2018_Task3_Test_GroundTruth.csv`	file	Metadata associated with the test data

The training dataset (HAM10000_images_part_1 and HAM10000_images_part_2) is called "HAM10000", meaning "Human Against Machine with 10000 training images" (actually 10015 images), and it corresponds to a large collection of multi-source dermatoscopic RGB images (JPG) of common pigmented skin lesions. The test dataset (ISIC2018_Task3_Test_Images) corresponds to 511 images. The files HAM10000_metadata.csv and ISIC2018_Task3_Test_GroundTruth.csv contain the respective metadata (data about the data), which also includes other features and the labels.

Columns of the metadata files

The structure of the metadata files follows the template presented in the table below.

Column	Type	Description
`lesion_id`	String	ID of the lesion case
`image_id`	String	ID of an image (also the name of the respective JPG file) associated with that case
`dx`	String	Label of that case
`dx_type`	String	Method used for diagnosing that case
`age`	Float	Age of the person associated with that case
`sex`	String	Sex of the person associated with that case
`localization`	String	Location of the lesion on the person's body
`dataset`	String	Reference from which the data was taken

Values of the metadata `dx` column (the classes)

The values that the column dx may take are tabulated below.

Value	Description
`akiec`	Actinic keratoses and intraepithelial carcinoma (also called "Bowen's disease") - an early form of skin cancer
`bcc`	Basal cell carcinoma - the most common type of skin cancer
`bkl`	Benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses) - common and benign
`df`	Dermatofibroma - common and benign
`mel`	Melanoma - a type of skin cancer involving the melanin cells
`nv`	Melanocytic nevus - the medical term for a mole (benign)
`vasc`	Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage) (benign)

Values of the metadata `dx_type` column (the diagnosis methods)

The table below presents the values of the column dx_type.

Value	Description
`histo`	Histopathology
`follow_up`	Follow-up examination
`consensus`	Expert consensus
`confocal`	In-vivo confocal microscopy
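
As a minimal sketch of how the metadata files described above might be read, the example below parses rows with the documented columns and counts the `dx` classes. The three sample rows are invented placeholders, not real records, and the handling of an empty `age` value is an assumption rather than something the description states. To use a real file, one would pass an opened `HAM10000_metadata.csv` handle instead of the in-memory sample.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows that follow the documented column template.
SAMPLE = """lesion_id,image_id,dx,dx_type,age,sex,localization,dataset
HAM_X000001,ISIC_X000001,nv,histo,45.0,male,back,example_source
HAM_X000001,ISIC_X000002,nv,histo,45.0,male,back,example_source
HAM_X000002,ISIC_X000003,mel,consensus,,female,face,example_source
"""

# Short labels for the dx classes, taken from the table of dx values.
DX_LABELS = {
    "akiec": "Actinic keratoses and intraepithelial carcinoma",
    "bcc": "Basal cell carcinoma",
    "bkl": "Benign keratosis-like lesions",
    "df": "Dermatofibroma",
    "mel": "Melanoma",
    "nv": "Melanocytic nevus",
    "vasc": "Vascular lesions",
}


def load_metadata(handle):
    """Read metadata rows; convert age to float, or None when empty (assumption)."""
    rows = []
    for row in csv.DictReader(handle):
        row["age"] = float(row["age"]) if row["age"] else None
        rows.append(row)
    return rows


def class_counts(rows):
    """Count how many images carry each dx label."""
    return Counter(row["dx"] for row in rows)


rows = load_metadata(io.StringIO(SAMPLE))
counts = class_counts(rows)
for dx, n in counts.most_common():
    print(f"{dx} ({DX_LABELS.get(dx, 'unknown')}): {n}")
```

Because several images can share one `lesion_id`, a train/validation split that groups by `lesion_id` rather than `image_id` may be worth considering, so that images of the same lesion do not land on both sides.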

Replication Data for: A Common-Space Scaling of the American Judiciary and...
dataone.org
dataverse.harvard.edu
Updated Nov 21, 2023
Cite
Bonica, Adam (2023). Replication Data for: A Common-Space Scaling of the American Judiciary and Legal Profession [Dataset]. http://doi.org/10.7910/DVN/RPZLMY
Unique identifier
https://doi.org/10.7910/DVN/RPZLMY
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Bonica, Adam
Description
This replication archive contains all data and code to replicate the results in "A Common-Space Scaling of the American Judiciary and Legal Profession" by Maya Sen and Adam Bonica. Abstract: We extend the scaling methodology previously used in Bonica (2014) to jointly scale the American federal judiciary and legal profession in a common-space with other political actors. The end result is the first data set of consistently measured ideological scores across all tiers of the federal judiciary and the legal profession, including 840 federal judges and 380,307 attorneys. To illustrate these measures, we present two examples involving the U.S. Supreme Court. These data open up significant areas of scholarly inquiry.
Replication Data for: Scaling Data from Multiple Sources
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc (2023). Replication Data for: Scaling Data from Multiple Sources [Dataset]. http://doi.org/10.7910/DVN/FOUVEL
Unique identifier
https://doi.org/10.7910/DVN/FOUVEL
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc
Description
We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives while recovering the words most associated with each senator's location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.
Data from: Social Dynamics of Short-Term Variability in Key Measures of...
dataverse.harvard.edu
Updated Nov 27, 2018
Cite
Harvard Dataverse (2018). Social Dynamics of Short-Term Variability in Key Measures of Household and Community Wellbeing in Rural Bangladesh [Dataset]. http://doi.org/10.7910/DVN/HBQQVE
Unique identifier
https://doi.org/10.7910/DVN/HBQQVE
Dataset updated
Nov 27, 2018
Dataset provided by
Harvard Dataverse
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/HBQQVE
Time period covered
2016
Area covered
Bangladesh
Dataset funded by
Cereal Systems Initiative for South Asia (CSISA) of the Consultative Group on International Agricultural Research (CGIAR)
United States Agency for International Development (USAID)
Bill and Melinda Gates Foundation (BMGF)
Description
More frequent data collection, especially when coupled with shorter recall periods, may produce more inclusive reporting, improved capture of intra-seasonal variability, and earlier signals of events that may merit policy or other forms of development intervention. Although some survey efforts have collected a small amount of data from rural households at moderately high frequency, to date there have been no significant efforts to collect a broad range of data from rural households with high frequency. The data included in this study were collected through a smartphone-based data collection technique that allowed participants to submit data at various frequencies and with various recall periods, thereby permitting the analysis of the relative merits of more frequent data streams. This study captured approximately one year of continuous data from 480 farmers in northwestern Bangladesh on key measures of household and community well-being that could be particularly useful for the design and evaluation of development interventions and policies. While the data discussed here provide a snapshot of what is possible, we also highlight their strength in providing opportunities for interdisciplinary research on household agricultural production, practices, seasonal hunger, etc., in a low-income agrarian society.
Data from: Common Bean variety releases in Africa
dataverse.harvard.edu
search.dataone.org
Updated Jun 7, 2019
Cite
Muthoni R Andriatsitohaina; Resty Nagadya; D Okii; Innocent Obilil; Clare Mugisha Mukankusi; Rowland Chirwa; Rodah Morezio Zulu; Mercy Lungaho; C Ruranduma; M Ugen; T Kidane; D Karanja; Elisa Mazuma; Augustine Musoni; Lesole Sefume; Tsibingul Meshac; Manuel Amane; Deidre Fourie; A Dlamini; H Andriamazaoro; Micheal Kilango; O S Kweka; Bruce Mutari; Kennedy Muimui; James Asibuo; Martin Ngueguim (2019). Common Bean variety releases in Africa [Dataset]. http://doi.org/10.7910/DVN/RPATZA
Unique identifier
https://doi.org/10.7910/DVN/RPATZA
Dataset updated
Jun 7, 2019
Dataset provided by
Harvard Dataverse
Authors
Muthoni R Andriatsitohaina; Resty Nagadya; D Okii; Innocent Obilil; Clare Mugisha Mukankusi; Rowland Chirwa; Rodah Morezio Zulu; Mercy Lungaho; C Ruranduma; M Ugen; T Kidane; D Karanja; Elisa Mazuma; Augustine Musoni; Lesole Sefume; Tsibingul Meshac; Manuel Amane; Deidre Fourie; A Dlamini; H Andriamazaoro; Micheal Kilango; O S Kweka; Bruce Mutari; Kennedy Muimui; James Asibuo; Martin Ngueguim
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.2/customlicense?persistentId=doi:10.7910/DVN/RPATZA
Time period covered
Jan 2003 - Dec 2016
Area covered
Burundi, Swaziland, Zimbabwe, South Africa, Tanzania, United Republic of, Uganda, Malawi, Cameroon, Madagascar, Ghana
Dataset funded by
Swiss Development Corporation (SDC)
Global Affairs Canada (GAC)
Description
The Pan Africa Bean Research Alliance (PABRA) is a network of national agricultural research centers (NARS) and private and public sector institutions that work to deliver better beans with consumer- and market-preferred traits to farmers. The datasets presented here draw from 17 sub-Saharan countries that are members of PABRA. The dataset on released bean varieties is a collection of 513 bean varieties released by NARS and their characteristics. The dataset on bean varieties and the relationship to constraints lists the 513 bean varieties by their resistance to constraints such as fungal, bacterial, and viral diseases and their tolerance to abiotic stresses. There is also a dataset of bean varieties that have been released in more than one country, useful for moving seed from one country to another and facilitating regional trade. The dataset on niche market traits provides the market-defined classifications for bean trade in sub-Saharan Africa as well as the varieties that fall into these classifications. The datasets are an update to the 2011 discussion of PABRA's achievements in breeding and delivery of bean varieties in Buruchara et al. 2011, pages 236 and 237, here: http://www.ajol.info/index.php/acsj/article/view/74168 . They are also an update to a follow-up to this discussion in Muthoni, R. A., Andrade, R. 2015 on the performance of bean improvement programmes in sub-Saharan Africa from the perspectives of varietal output and adoption, in chapter 8, here: http://dx.doi.org/10.1079/9781780644011.0148. The data is extracted from the PABRA M&E database available here (http://database.pabra-africa.org/?location=breeding).
Common Ownership Data: Scraped SEC form 13F filings for 1999-2017
dataverse.harvard.edu
Updated Aug 17, 2020
Cite
Matthew Backus; Christopher T Conlon; Michael Sinkinson (2020). Common Ownership Data: Scraped SEC form 13F filings for 1999-2017 [Dataset]. http://doi.org/10.7910/DVN/ZRH3EU
Unique identifier
https://doi.org/10.7910/DVN/ZRH3EU
Dataset updated
Aug 17, 2020
Dataset provided by
Harvard Dataverse
Authors
Matthew Backus; Christopher T Conlon; Michael Sinkinson
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 1999 - Dec 31, 2017
Description
Introduction

In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers that have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. However, the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

The process

We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here as they can be downloaded from EDGAR). We wrote code to parse the filings to extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of the file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags. Prior to that date, the filings are remarkable for the heterogeneity in formatting. Several examples are linked to below.

Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempt to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this will exhaustively extract all holding information. We can provide examples of filings that are formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep an amended level of holdings if the amended report a) occurred within 90 days of the reporting date and b) the initial filing fails our logic check described above. The resulting dataset has around 48M reported holdings (CIK-CUSIP) for all 76 quarters and between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method seemed to produce more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

Included Files

Perl Parsing Code (find_holdings_snp.pl). For reference, only needed if you wish to re-parse original filings.

Investor holdings for 1999-2017: lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records. The fields are:

CIK: the central index key assigned by the SEC for this investor. Mapping to names is available below.
CUSIP: the identity of the holdings. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.
shares: the number of shares reportedly held. Merging in CRSP data on shares outstanding at the CUSIP-Month level allows one to construct \beta. We make no distinction for the sole/shared/none voting discretion fields. If a researcher is interested, we did collect that starting in mid-2013, when filings are in XML format.
rdate: reporting date (end of quarter). 8 digit, YYYYMMDD.
fdate: filing date. 8 digit, YYYYMMDD.
ftype: the form name.

Notes

We did not consolidate separate BlackRock entities (or any other possibly related entities). If one wants to do so, use the CIK-CIKname mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean the investor filing lists a holding for the firm but that our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers that find a zero know to go back to that source filing to manually gather the...
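The cleaning rules in this description can be illustrated with a short sketch. It is not the authors' Perl code: the function names, the 5% tolerance on the logic check, and the assumption that `value` is expressed in the same units as `price * shares` are all hypothetical choices made for the example. Only the YYYYMMDD date format, the (price * shares) = value check, and the 50% and 120% thresholds come from the text.

```python
from datetime import date, datetime


def parse_yyyymmdd(text: str) -> date:
    """Parse the 8-digit rdate/fdate fields (YYYYMMDD)."""
    return datetime.strptime(text, "%Y%m%d").date()


def passes_logic_check(price: float, shares: float, value: float,
                       rel_tol: float = 0.05) -> bool:
    """Check that price * shares is close to the reported value.

    rel_tol is an assumed tolerance; the description gives no threshold.
    """
    if shares <= 0 or value <= 0:
        return False
    return abs(price * shares - value) <= rel_tol * value


def keep_cusip_rdate(holdings, shares_outstanding: float,
                     max_single: float = 0.50, max_total: float = 1.20) -> bool:
    """Apply the two drop rules to one CUSIP-rdate observation.

    holdings: shares reported by each investor for this CUSIP and quarter.
    """
    if shares_outstanding <= 0:
        return False
    # Drop if any single investor reports more than 50% of shares outstanding.
    if any(h / shares_outstanding > max_single for h in holdings):
        return False
    # Drop if reported holdings sum to more than 120% of shares outstanding.
    return sum(holdings) / shares_outstanding <= max_total


# Hypothetical values for illustration only.
reporting_date = parse_yyyymmdd("20130930")
ok = passes_logic_check(price=10.0, shares=100, value=1000.0)
kept = keep_cusip_rdate([100, 200], shares_outstanding=1000)
```

Zero-share records would fail `passes_logic_check` as written; since the description says such rows are deliberately left in the data, a caller would likely want to handle them separately rather than discard them.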
Replication Data for "Is Craniofacial Morphology and Body Composition...
dataverse.harvard.edu
Updated Jun 29, 2021
Cite
Sudipta Ghosh (2021). Replication Data for "Is Craniofacial Morphology and Body Composition Related by Common Genes: Comparative Analysis of Two Ethnically Diverse Populations" [Dataset]. http://doi.org/10.7910/DVN/CNZHS9
Unique identifier
https://doi.org/10.7910/DVN/CNZHS9
Dataset updated
Jun 29, 2021
Dataset provided by
Harvard Dataverse
Authors
Sudipta Ghosh
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These are two pedigree-based data sets that were used to write a collaborative paper titled "Is Craniofacial Morphology and Body Composition Related by Common Genes: Comparative Analysis of Two Ethnically Diverse Populations".
Replication Data for: The Foreign Policy Attitudes of Indian Elites:...
dataverse.harvard.edu
Updated Feb 26, 2016
Cite
Sumit Ganguly; Timothy Hellwig; William R. Thompson (2016). Replication Data for: The Foreign Policy Attitudes of Indian Elites: Variance, Structure, and Common Denominators [Dataset]. http://doi.org/10.7910/DVN/BYZDYE
Unique identifier
https://doi.org/10.7910/DVN/BYZDYE
Dataset updated
Feb 26, 2016
Dataset provided by
Harvard Dataverse
Authors
Sumit Ganguly; Timothy Hellwig; William R. Thompson
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Foreign policy belief systems have received much attention. Yet nearly all work examines attitudes in western democracies, chiefly the United States. The current security environment, however, requires that we ask whether the foreign policy views of individuals in other nations, particularly regional powers such as the BRICs, are similar in structure to those found in the U.S. case. This article does so for the Indian case. Drawing on studies of U.S. opinion, we develop a set of claims and test them on an original dataset on Indian elites. We make four contributions. First, we show that Wittkopf’s MICI framework applies to the Indian case. Second, we demonstrate how this framework can be made more generally applicable by revising its emphases on different types of internationalism and by rethinking the meaning of isolationist preferences. Third, we place the Indian case in comparative perspective. And lastly, we model the dimensions of Indian attitudes as a function of domestic ideology. Results of our analyses provide insights into the structure of foreign policy belief systems outside the Global North.
Aggregate State Legislator Shor-McCarty Ideology Data, July 2020 update
dataverse.harvard.edu
search.dataone.org
Updated Jul 3, 2020
Cite
Boris Shor (2020). Aggregate State Legislator Shor-McCarty Ideology Data, July 2020 update [Dataset]. http://doi.org/10.7910/DVN/AP54NE
Unique identifier
https://doi.org/10.7910/DVN/AP54NE
Dataset updated
Jul 3, 2020
Dataset provided by
Harvard Dataverse
Authors
Boris Shor
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This version of the Shor-McCarty state legislative aggregate ideology data is being released as an update to the data underlying Shor and McCarty 2011. These are based on individual-level ideal point estimates described fully in that article. Estimates are all in Shor-McCarty NPAT common ideological space to facilitate explicit comparisons across time and between states. The data spans 1993 through 2018, with 2,268 chamber-years of data (compared with 2,025 in the previous release).
Replication Data for: A Non-parametric Bayesian Model for Detecting...
dataverse.harvard.edu
Updated Dec 27, 2022
Cite
Yuki Shiraito; James Lo; Santiago Olivella (2022). Replication Data for: A Non-parametric Bayesian Model for Detecting Differential Item Functioning: An Application to Political Representation in the US [Dataset]. http://doi.org/10.7910/DVN/BCDALU
Unique identifier
https://doi.org/10.7910/DVN/BCDALU
Dataset updated
Dec 27, 2022
Dataset provided by
Harvard Dataverse
Authors
Yuki Shiraito; James Lo; Santiago Olivella
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
A common approach when studying the quality of representation involves comparing the latent preferences of voters and legislators, commonly obtained by fitting an item-response theory (IRT) model to a common set of stimuli. Despite being exposed to the same stimuli, voters and legislators may not share a common understanding of how these stimuli map onto their latent preferences, leading to differential item-functioning (DIF) and incomparability of estimates. We explore the presence of DIF and incomparability of latent preferences obtained through IRT models by re-analyzing an influential survey data set, where survey respondents expressed their preferences on roll call votes that U.S. legislators had previously voted on. To do so, we propose defining a Dirichlet Process prior over item-response functions in standard IRT models. In contrast to typical multi-step approaches to detecting DIF, our strategy allows researchers to fit a single model, automatically identifying incomparable sub-groups with different mappings from latent traits onto observed responses. We find that although there is a group of voters whose estimated positions can be safely compared to those of legislators, a sizeable share of surveyed voters understand stimuli in fundamentally different ways. Ignoring these issues can lead to incorrect conclusions about the quality of representation.
A Popular Video Game For Education
dataverse.harvard.edu
Updated May 27, 2023
Cite
Levy Vidy; Levy Vidy (2023). A Popular Video Game For Education [Dataset]. http://doi.org/10.7910/DVN/YYNMTI
Unique identifier
https://doi.org/10.7910/DVN/YYNMTI
Dataset updated
May 27, 2023
Dataset provided by
Harvard Dataverse
Authors
Levy Vidy; Levy Vidy
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
LocoCraft Iron Armor is a popular video game that allows players to explore and build in a virtual world. One of the key features of the game is the ability to craft and wear armor to protect against enemy attacks. In this article, we will focus on the Iron Armor set in LocoCraft.
Iron Armor is one of the most durable and protective armor sets in LocoCraft. It is made from Iron Ingots, which can be obtained by smelting Iron Ore in a furnace. To craft a full set of Iron Armor, you will need 24 Iron Ingots in total.
The Iron Armor set consists of four pieces: the Iron Helmet, Iron Chestplate, Iron Leggings, and Iron Boots. Each piece provides varying levels of protection against enemy attacks. The Iron Helmet provides the least protection, while the Iron Chestplate provides the most.
In addition to providing protection, the Iron Armor set also grants the player various bonuses. When wearing a full set of Iron Armor, the player will receive a 15% reduction in damage taken from enemy attacks. This makes the Iron Armor set ideal for players who want to explore dangerous areas or engage in combat with hostile mobs.
To craft the Iron Armor set, you will need to arrange the Iron Ingots in a specific pattern on a crafting table. The pattern for each piece of armor is as follows:
Iron Helmet: Place one Iron Ingot in each of the top three slots and one in the center slot.
Iron Chestplate: Place two Iron Ingots in each of the top two rows and three in the bottom row.
Iron Leggings: Place two Iron Ingots in each of the top two columns and one in the center column.
Iron Boots: Place one Iron Ingot in each of the top two slots and one in the center slot of the bottom row.
Once you have crafted all four pieces of Iron Armor, you can equip them by opening your inventory and placing them in the appropriate slots. You can also repair Iron Armor using additional Iron Ingots in an anvil.
In conclusion, the Iron Armor set is a valuable asset for any LocoCraft player who wants to explore dangerous areas or engage in combat with hostile mobs. With its high durability and protective capabilities, the Iron Armor set is a must-have for any serious adventurer.
Common bean climate niche of Southeastern and Southern Africa
dataverse.harvard.edu
search.dataone.org
tiff, txt
Updated Mar 4, 2020
Cite
Harvard Dataverse (2020). Common bean climate niche of Southeastern and Southern Africa [Dataset]. http://doi.org/10.7910/DVN/FZYU2S
Available download formats
txt(1576), tiff(6625)
Unique identifier
https://doi.org/10.7910/DVN/FZYU2S
Dataset updated
Mar 4, 2020
Dataset provided by
Harvard Dataverse
Area covered
Africa, Southern Africa
Dataset funded by
United States Agency for International Development (http://usaid.gov/)
Description
Common bean climate niche of Southeastern and Southern Africa
Geospatial dataset of the climate niche for common bean in Southeastern Africa. Temperature and precipitation parameters collected from Beebe et al. (2011). Data sources: NASA MODIS Land Surface Temperature (MOD11A2) (NASA LP DAAC 2015; Wan et al. 2015) and CHIRPS Precipitation (Funk et al. 2015). Growing season months: November–April; temporal range: 2001–2017; precipitation range: 200–710 mm; temperature range: 13.6–25.6°C.
Categories
0 - Non-agriculture
1 - Pessimal
2 - Unsuitable
3 - Marginal
4 - Suitable
5 - Optimal
NASA MODIS Land Surface Temperature (LST) data
NASA LP DAAC, 2015. MODIS Land Surface Temperature (MOD11A2) Version 005. NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota.
Wan, Z., Hook, S., Hulley, G. (2015). MOD11A2 MODIS/Terra Land Surface Temperature/Emissivity 8-Day L3 Global 1km SIN Grid V006 [Data set]. NASA EOSDIS Land Processes DAAC. Accessed 2020-02-26 from https://doi.org/10.5067/MODIS/MOD11A2.006
CHIRPS precipitation data
Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A. and Michaelsen, J., 2015. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific Data, 2, p.150066.
Common bean temperature and precipitation parameters
Beebe, S., Ramirez, J., Jarvis, A., Rao, I.M., Mosquera, G., Bueno, J.M. and Blair, M.W., 2011. Genetic improvement of common beans and the challenges of climate change. Crop Adaptation to Climate Change, 26, pp.356-369.
Classification methodology
Peter, B.G., Mungai, L.M., Messina, J.P. and Snapp, S.S., 2017. Nature-based agricultural solutions: Scaling perennial grains across Africa. Environmental Research, 159, pp.283-290.
This content is made possible by the support of the American People provided to the Feed the Future Innovation Lab for Sustainable Intensification through the United States Agency for International Development (USAID). The contents are the sole responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government. Program activities are funded by USAID under Cooperative Agreement No. AID-OAA-L-14-00006.
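The category codes 0 to 5 listed in the description above can be turned into readable labels with a small lookup. The sketch below is only an illustration: it assumes the raster cells hold those integer codes, and the function names and the sample values are hypothetical, not part of the dataset.

```python
from collections import Counter

# Category codes and labels as listed in the dataset description.
SUITABILITY_LABELS = {
    0: "Non-agriculture",
    1: "Pessimal",
    2: "Unsuitable",
    3: "Marginal",
    4: "Suitable",
    5: "Optimal",
}


def label_for(code):
    """Return the label for one category code (hypothetical helper)."""
    key = int(code)
    if key not in SUITABILITY_LABELS:
        raise ValueError(f"unknown suitability code: {code!r}")
    return SUITABILITY_LABELS[key]


def summarize(codes):
    """Count how many cells fall into each category, in code order."""
    counts = Counter(label_for(c) for c in codes)
    return {label: counts.get(label, 0) for label in SUITABILITY_LABELS.values()}


if __name__ == "__main__":
    # Made-up cell values, for illustration only.
    sample_cells = [0, 5, 5, 3, 4, 4, 4]
    for label, n in summarize(sample_cells).items():
        print(f"{label}: {n}")
```

Reading the actual TIFF would need a raster library; the lookup itself is independent of how the cell values are loaded.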
Replication Data for: "Do Nonpartisan Programmatic Policies Generate...
dataverse.harvard.edu
rtf, tar
Updated Jan 7, 2019
Cite
Harvard Dataverse (2019). Replication Data for: "Do Nonpartisan Programmatic Policies Generate Partisan Electoral Effects? Evidence from Two Large Scale Experiments" [Dataset]. http://doi.org/10.7910/DVN/70SNIS
Available download formats
tar(1135463936), tar(8704), tar(1347706880), rtf(1410), tar(11264), tar(28160)
Unique identifier
https://doi.org/10.7910/DVN/70SNIS
Dataset updated
Jan 7, 2019
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
These files replicate all the results in Kosuke Imai, Gary King, and Carlos Velasco Rivera, "Do Nonpartisan Programmatic Policies Have Partisan Electoral Effects? Evidence from Two Large Scale Experiments." To replicate all the analyses reported in the main manuscript and supplementary appendix, follow these steps:
0. Create a folder on your local computer (e.g., programmatic).
1. Download all the files to the directory created in step 0.
2. Untar all the .tar files.
3. Set the working directory to replicate.
4. In the command line, run:
4.1 Rscript required-packages.R
4.2 Rscript replicate-sps.R
4.3 Rscript replicate-progresa.R
4.4 Rscript replicate-additional-tests.R
Together, these scripts dump all the paper figures and tables in the figures and tables directories. For convenience, the figures directory has two sub-directories for the figures in the paper and in the supplementary appendix (main-figures and online-appendix). The names of all tables and figures follow the order in the paper.
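The replication steps above could be scripted roughly as follows. This is a minimal sketch, not the authors' tooling: it assumes that "replicate" is a directory produced by untarring the archives, that Rscript is on the PATH, and the helper names (build_commands, untar_all, replicate) are made up for illustration. Only the four script names come from the description.

```python
import subprocess
import tarfile
from pathlib import Path

# Script names and their order are taken from the dataset description.
SCRIPTS = [
    "required-packages.R",
    "replicate-sps.R",
    "replicate-progresa.R",
    "replicate-additional-tests.R",
]


def build_commands(scripts=SCRIPTS):
    """Return one Rscript command (as an argument list) per script, in order."""
    return [["Rscript", name] for name in scripts]


def untar_all(download_dir):
    """Extract every .tar file found in download_dir (step 2)."""
    extracted = []
    for archive in sorted(Path(download_dir).glob("*.tar")):
        with tarfile.open(archive) as tar:
            tar.extractall(download_dir)
        extracted.append(archive.name)
    return extracted


def replicate(download_dir, workdir_name="replicate"):
    """Run steps 2 to 4; assumes the files were already downloaded (step 1)."""
    untar_all(download_dir)
    # Assumption: the working directory is a folder named "replicate"
    # inside the download directory.
    workdir = Path(download_dir) / workdir_name
    for cmd in build_commands():
        # check=True stops at the first script that fails.
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Print the commands without running anything.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Calling replicate("programmatic") would then run the whole sequence. Note that the largest archives are over 1 GB, so extraction may take a while.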
Data from: Fado, Urban Popular Song and Intangible Heritage: Perceptions of...
dataverse.harvard.edu
Updated Jul 13, 2023
Cite
Anabela Monteiro (2023). Fado, Urban Popular Song and Intangible Heritage: Perceptions of Authenticity and Emotions in TripAdvisor Reviews [Dataset]. http://doi.org/10.7910/DVN/UFNZBM
Unique identifier
https://doi.org/10.7910/DVN/UFNZBM
Dataset updated
Jul 13, 2023
Dataset provided by
Harvard Dataverse
Authors
Anabela Monteiro
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This study concerns four Fado venues in Lisbon (three Fado houses and one theatre with a Fado show). A total of 2,653 TripAdvisor reviews (corresponding to 234,059 words) were collected and analyzed. We gathered all available reviews for each establishment at the time of data collection. The choice of Fado venues was determined by four criteria: i) location in the most touristic quarters of Lisbon (Alfama, Chiado and Bairro Alto), ii) prestige of the Fado show, iii) scope of the Fado experience (in Fado houses and a theatre) and iv) the classification on TripAdvisor, the platform where customer reviews were collected.
Replication Data for: Antisemitic Attitudes Across the Ideological Spectrum
dataverse.harvard.edu
search.dataone.org
Updated Jun 15, 2022
Cite
Eitan D Hersh; Laura Royden (2022). Replication Data for: Antisemitic Attitudes Across the Ideological Spectrum [Dataset]. http://doi.org/10.7910/DVN/CJPTXK
Unique identifier
https://doi.org/10.7910/DVN/CJPTXK
Dataset updated
Jun 15, 2022
Dataset provided by
Harvard Dataverse
Authors
Eitan D Hersh; Laura Royden
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Concern about antisemitism in the U.S. has grown following recent rises in deadly assaults, vandalism, and harassment. Public accounts of antisemitism have focused on both the ideological right and left, suggesting a “horseshoe theory” in which the far left and the far right hold a common set of anti-Jewish prejudicial attitudes that distinguish them from the ideological center. However, there is little quantitative research evaluating left-wing versus right-wing antisemitism. We conduct several experiments on an original survey of 3,500 U.S. adults, including an oversample of young adults. We oversampled young adults because unlike other forms of prejudice that are more common among older people, antisemitism is theorized to be more common among younger people. Contrary to the expectation of horseshoe theory, the data show the epicenter of antisemitic attitudes is young adults on the far right.