The Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaconess Medical Center (BIDMC), and Boston Children's Hospital (BCH).
https://github.com/bdsp-core/bdsp-license-and-dua
The Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investigators.
In version 1.0 of the database, these ECGs from Massachusetts General Brigham hospital sites were provided without labels or metadata, to enable pre-training of ECG analysis models.
In version 2.0, metadata is included.
In version 3.0, Emory ECGs are included together with metadata, labels from the 12SL ECG analysis program (GE Healthcare), and ICD-9/10 codes.
In version 4.0, typos were corrected in the data description.
HEEDB is published as part of the Human Sleep Project (HSP), funded by a grant (R01HL161253) from the National Heart, Lung, and Blood Institute (NHLBI) of the NIH to Massachusetts General Hospital, Emory University, Stanford University, Kaiser Permanente, Boston Children's Hospital, and Beth Israel Deaconess Medical Center.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Functional Annotation of Variants - Online Resource (FAVOR, https://favor.genohub.org) is a comprehensive whole-genome variant annotation database and variant browser that provides hundreds of functional annotation scores covering many aspects of variant biological function. The FAVOR Essential Database comprises a collection of essential annotation scores for all possible SNVs (8,812,917,339) and observed indels (79,997,898) in Build GRCh38/hg38, including variant info, chromosome, position, reference allele, alternative allele, aPC-Conservation, aPC-Epigenetics, aPC-Epigenetics-Active, aPC-Epigenetics-Repressed, aPC-Epigenetics-Transcription, aPC-Local-Nucleotide-Diversity, aPC-Mappability, aPC-Mutation-Density, aPC-Protein-Function, aPC-Proximity-To-TSSTES, aPC-Transcription-Factor, CAGE promoter, CAGE, MetaSVM, rsID, FATHMM-XF, Gencode Comprehensive Category, Gencode Comprehensive Info, Gencode Comprehensive Exonic Category, Gencode Comprehensive Exonic Info, GeneHancer, LINSIGHT, CADD, rDHS. These annotation scores can be integrated into FAVORannotator (https://github.com/zhouhufeng/FAVORannotator) to create an annotated GDS (aGDS) file that stores the genotype data and their functional annotation data in a single file. The aGDS file can then facilitate a wide range of functionally informed downstream analyses.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The International Authority Database (IAD) records the degree of authority of 34 international organizations from 1919 to 2013. The cross-sectional time-series data comprise 1,694 observations and offer systematic information on the exercise of authority across seven policy functions: agenda setting, rule making, compliance monitoring, norm interpretation and dispute settlement, enforcement, knowledge generation, and institutional evaluation.
Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
License information was derived automatically
The "Harvard University Ratings and Reviews" dataset presents a rich compilation of experiences from one of the most esteemed institutions globally. It uniquely encompasses a broad spectrum of perspectives, including in-depth academic evaluations and impressions from travelers intrigued by Harvard's historical and architectural significance. This dataset serves as a bridge, connecting the academic excellence of Harvard with the experiences of visitors who come to admire its iconic campus.
This dataset has been ethically curated, with careful consideration to exclude any personal identifiers. By focusing purely on the content of the reviews, it respects privacy while still offering valuable insights.
We extend our gratitude to TripAdvisor for providing a platform that captures such a diverse range of experiences and to Harvard University for being the subject of this intriguing dataset. Their contributions enrich our understanding of academic and visitor perceptions alike.
The dataset's thumbnail, featuring an iconic view of Harvard University, has been sourced from AdmissionSight. This image captures the essence of Harvard's sprawling campus, inviting further exploration through the reviews within this dataset.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.7/customlicense?persistentId=doi:10.7910/DVN/1PEEY0
One of the obstacles to applying advanced crop simulation models such as DSSAT on a grid-based platform is the lack of gridded soil input data at various resolutions. Recently, there have been many efforts in the scientific community to develop spatially continuous soil databases across the globe. The most representative example is SoilGrids 1km, released by ISRIC in 2014. In addition, the recent AfSIS project has put considerable effort into developing a more accurate soil database for Africa at high spatial resolution. Taking advantage of these two high-resolution soil databases (SoilGrids 1km and ISRIC-AfSIS at 1km resolution), this project aims to develop a set of DSSAT-compatible soil profiles on a 5 arc-minute grid (HarvestChoice’s standard grid). Six soil properties (bulk density, organic carbon, percentage of clay and silt, soil pH and cation exchange capacity) available from the original SoilGrids 1km or ISRIC-AfSIS were used directly as DSSAT inputs. We applied a pedo-transfer function to derive soil hydraulic properties (saturated hydraulic conductivity and soil water content at field capacity, wilting point and saturation) that are critical for simulating crop growth. For other required variables, HarvestChoice’s HC27 database is used as a reference. Final outputs are provided in *.SOL file format (DSSAT soil database) for each country at 5-min resolution. In addition, uncertainty maps for organic carbon and soil water content at wilting point in the top 15 cm soil layer were generated to give a rough idea of the accuracy of the final products. The generated soil properties were evaluated by visualizing their global maps and by comparing them with the IIASA-IFPRI cropland map and AfSIS-GYGA’s available water content maps.
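To make the regridding step concrete, here is a minimal sketch of how 1km samples could be assigned to cells of a global 5 arc-minute grid and then averaged. The row/column convention (row 0 at the north edge, column 0 at 180°W), the function names, and the use of a plain mean are all assumptions for illustration; the description does not say how the project aggregated values, and this is not HarvestChoice's own cell numbering.

```python
import math
from collections import defaultdict

# 5 arc-minutes = 1/12 degree, so a global grid has 4320 columns and 2160 rows.
CELLS_PER_DEGREE = 12
N_COLS = 360 * CELLS_PER_DEGREE
N_ROWS = 180 * CELLS_PER_DEGREE


def cell_index(lat, lon):
    """Return (row, col) of the 5 arc-minute cell containing a point.

    Assumed convention: row 0 touches the North Pole, col 0 starts at 180 W.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    # Points exactly on the south or east edge are clamped into the last cell.
    row = min(int(math.floor((90.0 - lat) * CELLS_PER_DEGREE)), N_ROWS - 1)
    col = min(int(math.floor((lon + 180.0) * CELLS_PER_DEGREE)), N_COLS - 1)
    return row, col


def aggregate_mean(samples):
    """Average (lat, lon, value) samples per 5 arc-minute cell.

    A plain mean is a placeholder; the source does not state the method used.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in samples:
        acc = sums[cell_index(lat, lon)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}
```

Calling `aggregate_mean` on a list of `(lat, lon, value)` tuples returns one averaged value per occupied cell, keyed by `(row, col)`.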
To provide annual PM2.5 component concentration data for the contiguous U.S. at resolutions of 50m in urban areas and 1km in non-urban areas for public health research to estimate effects on human health, and for other related research.
The Annual Mean PM2.5 Components (EC, NH4, NO3, OC, SO4) 50m Urban and 1km Non-Urban Area Grids for Contiguous U.S., 2000-2019, v1 data set contains annual predictions of the chemical concentrations at a hyper resolution (50m x 50m grid cells) in urban areas and at a high resolution (1km x 1km grid cells) in non-urban areas for the years 2000 to 2019. Particulate matter with an aerodynamic diameter less than 2.5 µm (PM2.5) increases mortality and morbidity. PM2.5 is composed of a mixture of chemical components that vary across space and time. Due to limited hyperlocal data availability, less is known about health risks of PM2.5 components, their U.S.-wide exposure disparities, or which species are driving the biggest intra-urban changes in PM2.5 mass. The national super-learned models were developed across the U.S. for hyperlocal estimation of annual mean elemental carbon, ammonium, nitrate, organic carbon, and sulfate concentrations across 3,535 urban areas at a 50m spatial resolution, and at a 1km resolution for non-urban areas from 2000 to 2019. Using Machine-Learning models (ML), combined with either a Generalized Additive Model (GAM) Ensemble Geographically-Weighted-Averaging (GAM-ENWA) or Super-Learning (SL) and approximately 82 billion predictions across 20 years, hyperlocal super-learned PM2.5 components are now available for further research. The overall R-squared values of 10-fold cross validated models ranged from 0.910 to 0.970 on the training sets for these components, while on the test sets the R-squared values ranged from 0.860 to 0.960. Remarkable spatiotemporal intra-urban and inter-urban variabilities were found in PM2.5 components. The Coordinate Reference System (CRS) for predictions is the World Geodetic System 1984 (WGS84) and the units for the PM2.5 Components are µg/m^3.
The data are provided in RDS tabular format, a file format native to the R programming language, but can also be opened by other languages such as Python.
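For readers who want to reproduce a fit statistic like the R-squared values quoted above on their own hold-out data, the following is a minimal sketch of the coefficient of determination. It assumes the usual definition, 1 - SS_res / SS_tot; the description does not state which definition the authors used, and the function name is illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the standard definition; the dataset authors may have used
    a variant such as the squared Pearson correlation.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observations around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and predicting the observed mean everywhere gives 0.0.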
This represents Harvard's responses to the Common Data Set initiative. The Common Data Set (CDS) initiative is a collaborative effort among data providers in the higher education community and publishers as represented by the College Board, Peterson's, and U.S. News & World Report. The combined goal of this collaboration is to improve the quality and accuracy of information provided to all involved in a student's transition into higher education, as well as to reduce the reporting burden on data providers. This goal is attained by the development of clear, standard data items and definitions in order to determine a specific cohort relevant to each item. Data items and definitions used by the U.S. Department of Education in its higher education surveys often serve as a guide in the continued development of the CDS. Common Data Set items undergo broad review by the CDS Advisory Board as well as by data providers representing secondary schools and two- and four-year colleges. Feedback from those who utilize the CDS is also considered throughout the annual review process.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
All fields noted in this table must be collected and validated for inclusion in a release. *URLs provided as exemplars only; within the database, full paths to exact landing pages from which data was retrieved are included.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Explicit detail on the license for every boundary is provided in the metadata.
This dataset contains data, documentation, and code files associated with studies performed on snapshots of the contents of Harvard Dataverse taken on 28 and 29 October 2019.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset contains ideal point estimates based on voting behavior in the United Nations General Assembly. This is the first version where the ideal point estimates are based on years rather than UNGA sessions. The reasons for this are that researchers in practice virtually always use years as the basis of analysis and that UNGA sessions increasingly spill over into the following year and are held in special emergency sessions on issues such as Gaza and Ukraine.
There are two types of ideal point estimates:
• idealpointfp: ideal point estimates based only on votes on the final passage of resolutions (including failed votes). These are now available from 1946-2024 in IdealPointsJuly2025.tab. These estimates are updated most frequently as the raw data can readily be found online.
• Idealpointall: ideal point estimates based on all votes, including votes on paragraphs, motions, and amendments. These estimates are based on more data and thus should be more precise. One word of caution is that in some years there are very large numbers of votes on a specific issue, such as the war in Gaza.
The correlation between these ideal points is .9846, but there could of course still be some important differences. The data also include idealpointlegacy, which is based on sessions (all votes). Its correlation with idealpointall is .9877.
Aside from the 2024 final passage votes, the raw UN voting data are from the UNGA-DM Database: https://unvotes.unige.ch/
Citation: Fjelstul, Joshua, Simon Hug, and Christopher Kilby. "Decision-making in the United Nations General Assembly: A comprehensive database of resolution-related decisions." The Review of International Organizations (2025): 1-18.
The ideal point estimates are based on the methodology described in:
Citation: Bailey, Michael A., Anton Strezhnev, and Erik Voeten. 2017. Estimating dynamic state preferences from United Nations voting data. Journal of Conflict Resolution 61 (2): 430-56.
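The correlations quoted above can be checked against the data file with a short script. The sketch below computes a Pearson correlation between two columns of a delimited text file. It assumes the `.tab` file is tab-delimited with a header row, and that the header uses exactly the column names passed in; the spelling and capitalization of `idealpointfp` and `idealpointall` in the actual file are not confirmed by the description. Rows with missing or non-numeric values are skipped.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a column is constant; correlation is undefined")
    return sxy / math.sqrt(sxx * syy)


def column_correlation(handle, col_a, col_b, delimiter="\t"):
    """Correlate two named columns read from an open text file handle.

    The tab delimiter is an assumption based on the .tab extension.
    """
    xs, ys = [], []
    for row in csv.DictReader(handle, delimiter=delimiter):
        try:
            x, y = float(row[col_a]), float(row[col_b])
        except (TypeError, ValueError):
            continue  # missing or non-numeric cell (e.g. "NA" or empty)
        if math.isnan(x) or math.isnan(y):
            continue
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


# Tiny in-memory stand-in for the real file, for demonstration only.
demo = io.StringIO("idealpointfp\tidealpointall\n0.5\t0.4\n-1.0\t-0.9\nNA\t0.1\n1.2\t1.3\n")
print(column_correlation(demo, "idealpointfp", "idealpointall"))
```

With the real file, the call would look like `column_correlation(open("IdealPointsJuly2025.tab", newline=""), "idealpointfp", "idealpointall")`, after adjusting the column names to match the file header.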
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The Political Party Database (PPDB) is an online public database that is a central source for key information about political party organization, party resources, leadership selection, and partisan political participation in many representative democracies. The files contain the data in SPSS, STATA, and CSV formats. The dataset also includes a PDF with the text responses for the appropriate variables. The PPDB Round 2 dataset complements the Round 1a_1b Dataset. Round 2 data covers 51 countries, reflecting the state of 288 parties in the years 2017-2020.
The Middle East Mass Movements Database, a part of the larger Mass Movements Project, contains basic characteristics of all mass movements in the region for each year that they mobilize at least 1,000 participants in costly action for at least a month in pursuit of a common political goal. The data are the result of a lengthy coding process in which two researchers independently explore each known mobilization with all available secondary sources and, if they determine that it meets the thresholds, separately code its observable characteristics; any coding disagreements are resolved by moderated debate until the researchers reach consensus. The data cover 16 variables on movement characteristics, including mobilizing identities, organization, and action, for the 19 countries of the Middle East and North Africa from 1900-2012.
https://creativecommons.org/publicdomain/zero/1.0/
Harvard tuition data since 1985, for both the undergraduate College and the graduate and professional schools.
This dataset consists of two files: tuition_graduate.csv and undergraduate_package.csv, which contain the tuition and fees data for the graduate schools and undergraduate College, respectively.
tuition_graduate.csv contains the following fields:
undergraduate_package.csv contains the following fields:
All of the data in this dataset comes from The Harvard Open Data Dataverse. Specific citations are as follows:
for the graduate tuition data:
Harvard Financial Aid Office, 2015, "Harvard graduate school tuition", doi:10.7910/DVN/LV0YSQ, Harvard Dataverse, V1
for the undergraduate tuition and fees data:
Harvard Financial Aid, 2015, "Harvard College Tuition", doi:10.7910/DVN/MSS2BE, Harvard Dataverse, V1 [UNF:6:FyXNny+KBTgLX+DzewzEfg==]
https://choosealicense.com/licenses/cc0-1.0/
Collaborative Open Legal Data (COLD) - Cases
COLD Cases is a dataset of 8.3 million United States legal decisions with text and metadata, formatted as compressed parquet files. If you'd like to view a sample of the dataset formatted as JSON Lines, you can view one here. This dataset exists to support the open legal movement exemplified by projects like Pile of Law and LegalBench. A key input to legal understanding projects is caselaw -- the published, precedential decisions of… See the full description on the dataset page: https://huggingface.co/datasets/harvard-lil/cold-cases.
We present a sEMG signal database corresponding to the Indian population named “ElectroMyography Analysis of Human Activities - DataBase -2 (EMAHA-DB2).” This data set consists of two different weight training activities which involve isotonic and isometric contractions. Weight training activities are effective for improving muscle strength, overall health, and regaining limb functionality for people undergoing rehabilitation post stroke-related episodes. The EMG signals acquired during weight training can be used for muscle recruitment analysis. For example, for a specific movement, the analysis can determine the set of recruited muscles and their order of recruitment. The institutional ethics committee of Indian Institute of Information Technology Sri City (No. IIITS/EC/2022/01) approved the proposed data collection protocol developed in accordance with the Declaration of Helsinki and the “National Ethical Guidelines for Biomedical and Health Research involving human participants" of India. Nine healthy male subjects with no history of upper limb pathology participated in the sEMG data collection process. The average age is 21 years. Before the first session of activities, each participant gave written informed consent, and the data collection process is completely non-invasive. At the beginning of each session, the participant's hands are cleaned with an alcohol-based wet wipe. The total duration of each session is up to one hour per subject depending on adaptability. Each hand muscle activity is recorded with a 2-channel Noraxon Ultium wireless sEMG sensor setup. Two self-adhesive Ag/AgCl dual electrodes were placed at the Biceps Brachii (BB) and Flexor carpi ulnaris (FCU) muscle locations. During an activity, the subject is in a standing position and the weight is placed on a table at a convenient height for pickup. Each activity has three phases: rest followed by action and release. Each activity is repeated nine times.
In order to avoid muscle fatigue, subjects rest for two minutes between different activities.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by UmbraVenus
Released under Database: Open Database, Contents: Database Contents
https://spdx.org/licenses/CC0-1.0
As a part of the broader Harvard Forest Flora project (see data set HF116), we prepared a database of all specimens located in the Harvard Forest herbarium.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Context
The dataset tabulates the Harvard population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Harvard. The dataset can be used to understand the population distribution of Harvard by age. For example, using this dataset, we can identify the largest age group in Harvard.
Key observations
The largest age group in Harvard, NE was 15-19 years, with a population of 106 (9.53%), according to the 2021 American Community Survey. The smallest age group in Harvard, NE was 85+ years, with a population of 10 (0.90%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
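As a quick arithmetic check on the figures quoted above, the sketch below backs out the total population implied by one count and its percentage, then recomputes the other share. The implied total is an inference from the rounded percentages, not a figure stated in the description, and the function names are illustrative.

```python
def share_percent(count, total):
    """Percentage of the total population represented by one age group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def implied_total(count, percent):
    """Total population implied by a group count and its rounded percentage."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return round(count / (percent / 100.0))


# Figures quoted in the key observations: 106 people is 9.53% of the total.
total = implied_total(106, 9.53)

# Recompute both shares from the implied total and round to two decimals.
largest_share = round(share_percent(106, total), 2)
smallest_share = round(share_percent(10, total), 2)
print(total, largest_share, smallest_share)
```

If the recomputed shares match the quoted 9.53% and 0.90%, the two observations are consistent with a single total population.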
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Harvard Population by Age. You can refer to it here