100+ datasets found

H
PrimeKG
dataverse.harvard.edu
dataone.org
Updated May 2, 2022
Cite
Payal Chandak (2022). PrimeKG [Dataset]. http://doi.org/10.7910/DVN/IXA7BM
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/IXA7BM
Dataset updated
May 2, 2022
Dataset provided by
Harvard Dataverse
Authors
Payal Chandak
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Here, we present the Precision Medicine Knowledge Graph (PrimeKG). This resource provides a holistic view of diseases. We have integrated 20 high-quality datasets, biorepositories and ontologies to curate this knowledge graph. PrimeKG systematically captures information about 17,080 diseases with 4,050,249 relationships representing various major biological scales, including diseases, drugs, genes, proteins, exposures, phenotypes, drug side effects, molecular functions, cellular components, biological processes, anatomical regions, and pathways. Disease nodes in our multi-relational knowledge graph are densely connected to every other node type. PrimeKG's rich graph structure is supplemented with textual descriptions of clinical guidelines for drug and disease nodes to enable multi-modal disease exploration. To get started with using PrimeKG, please explore our project website: https://zitniklab.hms.harvard.edu/projects/PrimeKG/
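The DOIs in this listing can be turned into Harvard Dataverse API URLs, following the `persistentId=doi:...` pattern that appears in the custom-license links elsewhere on this page. The sketch below only builds the URL and makes no network request. The function names are hypothetical, and the `/api/datasets/:persistentId/` metadata endpoint is taken from the Dataverse native API rather than from this listing.

```python
from urllib.parse import urlencode

# Assumption: the Harvard Dataverse installation named in this listing.
BASE_URL = "https://dataverse.harvard.edu"


def doi_to_persistent_id(doi_url: str) -> str:
    """Turn 'https://doi.org/10.7910/DVN/IXA7BM' into 'doi:10.7910/DVN/IXA7BM'."""
    for prefix in ("https://doi.org/", "http://doi.org/"):
        if doi_url.startswith(prefix):
            return "doi:" + doi_url[len(prefix):]
    raise ValueError(f"not a doi.org URL: {doi_url!r}")


def dataset_metadata_url(doi_url: str) -> str:
    """Build the dataset metadata URL for a DOI (hypothetical helper)."""
    # Keep ':' and '/' unescaped so the persistent ID stays readable.
    query = urlencode({"persistentId": doi_to_persistent_id(doi_url)}, safe=":/")
    return f"{BASE_URL}/api/datasets/:persistentId/?{query}"


if __name__ == "__main__":
    print(dataset_metadata_url("https://doi.org/10.7910/DVN/IXA7BM"))
```

The same helper works for any other DOI on this page, since they all share the 10.7910/DVN prefix.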
H
Harvard Library Bibliographic Metadata
dataverse.harvard.edu
datasetcatalog.nlm.nih.gov
+1 more
Updated Mar 31, 2023
Cite
Christine Eslao (2023). Harvard Library Bibliographic Metadata [Dataset]. http://doi.org/10.7910/DVN/I8L0ZZ
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/I8L0ZZ
Dataset updated
Mar 31, 2023
Dataset provided by
Harvard Dataverse
Authors
Christine Eslao
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Open bibliographic metadata snapshot from 15 February 2022 provided by Harvard Library. We recommend you review the dataset documentation and best practices for using this data collection: Harvard Library Bibliographic Metadata: Detailed Content Inventory.
H
Introduction and Background Information
dataverse.harvard.edu
Updated Feb 8, 2019
Cite
Dieter Scholz (2019). Introduction and Background Information [Dataset]. http://doi.org/10.7910/DVN/R33RS9
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/R33RS9
Dataset updated
Feb 8, 2019
Dataset provided by
Harvard Dataverse
Authors
Dieter Scholz
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.3/customlicense?persistentId=doi:10.7910/DVN/R33RS9
Description
Harvard Dataverse => Digital Library - Projects & Theses - Prof. Dr. Scholz
Introduction and background information to "Digital Library - Projects & Theses - Prof. Dr. Scholz".
The URL of the dataverse: http://dataverse.harvard.edu/dataverse/LibraryProfScholz
The URL of this (introduction) dataset: http://doi.org/10.7910/DVN/R33RS9
YOU MAY HAVE BEEN DIRECTED HERE BECAUSE THE CALLING PAGE HAS NO OTHER ENTRY POINT (with DOI) INTO THIS DATAVERSE. Click on the title of this page to reach the start page of the dataverse!
Introduction to the Data in this Dataverse
This dataverse is about: Aircraft Design, Flight Mechanics, Aircraft Systems. It contains research data and software produced by students for their projects and theses on the above topics. Get linked to all other resources from their reports using the URN from the German National Library (DNB) as given in each dataset under "Metadata": https://nbn-resolving.org/html/urn:nbn:de:gbv:18302-aeroJJJJ-MM-DD.01x
Alternative sites that store the data given in this dataverse are http://library.ProfScholz.de and https://archive.org/details/@profscholz. Open an "item". Under "DOWNLOAD OPTIONS", select the file (as far as available) called "ZIP" to download DataXxxx.zip. Alternatively, go to "SHOW ALL"; in the new window, next to DataXxxx.zip, click "View Contents" or select the URL next to "Data-list". Download a single file from DataXxxx.zip.
Data Publishing
Data publishing means publishing research data for (re)use by others. It consists of preparing single files, or a dataset containing several files, for access on the WWW. This practice is part of the open science movement. There is consensus about the benefits resulting from Open Data, especially in connection with Open Access publishing. It is important to link the publication (e.g. thesis) with the underlying data and vice versa.
General (not discipline-specific) and free data repositories are: Harvard Dataverse (this one!); figshare (emphasis: multimedia); Zenodo (emphasis: results from EU research, mainly text); Mendeley Data (emphasis: data associated with journal articles). To find data repositories, use http://re3data.org. Read more at https://en.wikipedia.org/wiki/Data_publishing
H
ArchaeoGLOBE Regions
dataverse.harvard.edu
search.dataone.org
Updated Feb 6, 2019
Cite
ArchaeoGLOBE Project (2019). ArchaeoGLOBE Regions [Dataset]. http://doi.org/10.7910/DVN/CQWUBI
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/CQWUBI
Dataset updated
Feb 6, 2019
Dataset provided by
Harvard Dataverse
Authors
ArchaeoGLOBE Project
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains documentation on the 146 global regions used to organize responses to the ArchaeoGLOBE land use questionnaire between May 18 and July 31, 2018. The regions were formed from modern administrative regions (Natural Earth 1:50m Admin1 - states and provinces, https://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-admin-1-states-provinces/). The boundaries of the polygons represent rough geographic areas that serve as analytical units useful in two respects: for the history of land use over the past 10,000 years (a moving target) and for the history of archaeological research. Some consideration was also given to creating regions that were relatively equal in size. The regionalization process went through several rounds of feedback and redrawing before arriving at the 146 regions used in the survey. No bounded regional system could ever truly reflect the complex spatial distribution of archaeological knowledge on past human land use, but operating at a regional scale was necessary to facilitate timely collaboration while achieving global coverage.
Map in Google Earth Format: ArchaeGLOBE_Regions_kml.kmz
Map in ArcGIS Shapefile Format: ArchaeGLOBE_Regions.zip (multiple files in zip file)
The shapefile format is a digital vector file that stores geographic location and associated attribute information. It is actually a collection of several different file types:
.shp: shape format, the feature geometry
.shx: shape index format, a positional index of the feature geometry
.dbf: attribute format, columnar attributes for each shape
.prj: projection format, the coordinate system and projection information
.sbn and .sbx: a spatial index of the features
.shp.xml: geospatial metadata in XML format
.cpg: specifies the code page for identifying character encoding
Attributes:
FID - a unique identifier for every object in a shapefile table (0-145)
Shape - the type of object (polygon)
World_ID - coded value assigned to each feature according to its division into one of seventeen ‘World Regions’ based on the geographic regions used by the Statistics Division of the United Nations (https://unstats.un.org/unsd/methodology/m49/), with small changes to better reflect archaeological scholarly communities. These large regions provide organizational structure, but are not analytical units for the study.
World_RG - text description of each ‘World Region’
Archaeo_ID - unique identifier (1-146) corresponding to the region code used in the ArchaeoGLOBE land use questionnaire and all ArchaeoGLOBE datasets
Archaeo_RG - text description of each region
Total_Area - the total area, in square kilometers, of each region
Land-Area - the total area minus the area of all lakes and reservoirs found within each region (source: https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-lakes/)
PDF of Region Attribute Table: ArchaeoGLOBE Regions Attributes.pdf
Excel file of Region Attribute Table: ArchaeoGLOBE Regions Attributes.xls
Printed Maps in PDF Format: ArchaeoGLOBE Regions.pdf
Documentation of the ArchaeoGLOBE Regional Map: ArchaeoGLOBE Regions README.doc
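Because a shapefile is a bundle of sidecar files, it can help to check that a downloaded zip actually contains the pieces listed above before opening it in GIS software. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and treating .shp, .shx and .dbf as the mandatory trio comes from the Esri shapefile specification, not from this dataset's documentation.

```python
import zipfile

# Per the Esri shapefile specification, these three components are mandatory.
REQUIRED = (".shp", ".shx", ".dbf")
# Other components named in the description above; treated here as optional.
OPTIONAL = (".prj", ".sbn", ".sbx", ".shp.xml", ".cpg")


def component_suffix(name):
    """Return the shapefile component suffix of a file name, or None."""
    lower = name.lower()
    for suffix in REQUIRED + OPTIONAL:
        if lower.endswith(suffix):
            return suffix
    return None


def missing_required(names):
    """List the mandatory components that are absent from a set of file names."""
    present = {component_suffix(n) for n in names}
    return [s for s in REQUIRED if s not in present]


def check_zip(path):
    """Report missing mandatory components in a zipped shapefile."""
    with zipfile.ZipFile(path) as archive:
        return missing_required(archive.namelist())


if __name__ == "__main__":
    # Illustrative file names only; the real archive contents may differ.
    print(missing_required(["regions.shp", "regions.dbf", "regions.prj"]))
```

Calling `check_zip("ArchaeGLOBE_Regions.zip")` on the downloaded archive would return an empty list if all three mandatory files are present.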
H
Child Care Bureau
dataverse.harvard.edu
datasetcatalog.nlm.nih.gov
+1 more
Updated Nov 30, 2010
Cite
Harvard Dataverse (2010). Child Care Bureau [Dataset]. http://doi.org/10.7910/DVN/3YOBMN
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/3YOBMN
Dataset updated
Nov 30, 2010
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Users can get data on child care programs and child care expenditures.
Background
The Child Care Bureau is housed under the Office of Family Assistance within the Administration for Children and Families. The Child Care Bureau's purpose is to promote access to affordable, high-quality child care and after-school programs. Through the administration of the Child Care and Development Fund, the Child Care Bureau provides financial assistance to low-income families and oversees the implementation of state child care policies and programs.
User Functionality
The website provides a variety of information regarding the administration, laws and regulations of the Child Care and Development Fund. All the information is available for download in Word or PDF formats. Users can also view data tables regarding child care program statistics and Care and Development Expenditures. Child care program statistics include information about the number of children and families served, and percentages by age group, race/ethnicity, payment method or type, and place of care. Information is organized by state. All data tables can be downloaded as Excel files or PDF files.
Data Notes
Data tables are available for each year since 1998. The most recent data available is from 2008.
d
Dataset metadata of known Dataverse installations
search.dataone.org
dataverse.harvard.edu
+1 more
Updated Nov 22, 2023
+ more versions
Cite
Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/DCDKZQ
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Gautier, Julian
Description
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.
How the metadata was downloaded
The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.
How the files are organized
├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation).csv
│   ├── basic.csv
│   ├── contributor(citation).csv
│   ├── ...
│   └── topic_classification(citation).csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2022.10.02_17.11.19.zip
│   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
│   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
│   │   │   ├── ...
│   │   ├── metadatablocks_v5.6
│   │   │   ├── astrophysics_v5.6.json
│   │   │   ├── biomedical_v5.6.json
│   │   │   ├── citation_v5.6.json
│   │   │   ├── ...
│   │   │   ├── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
│   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
│   ├── Arca_Dados_2022.10.02_17.44.35.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
└── dataset_pids_from_most_known_dataverse_installations.csv
└── licenses_used_by_dataverse_installations.csv
└── metadatablocks_from_most_known_dataverse_installations.csv
This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.
The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions.
The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files. The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
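The zip file names in the listing above (for example Abacus_2022.10.02_17.11.19.zip) appear to combine an installation name with a timestamp. A minimal sketch of splitting the two is shown below. The YYYY.MM.DD_HH.MM.SS reading of the suffix is an assumption inferred from the file names and the stated download dates, and `parse_zip_name` is a hypothetical helper, not part of the author's script.

```python
import re
from datetime import datetime

# Assumption: names end in "_YYYY.MM.DD_HH.MM.SS.zip", as in the listing above.
ZIP_NAME = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{4}\.\d{2}\.\d{2}_\d{2}\.\d{2}\.\d{2})\.zip$"
)


def parse_zip_name(filename):
    """Split a zip file name into (installation name, timestamp)."""
    match = ZIP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    stamp = datetime.strptime(match.group("stamp"), "%Y.%m.%d_%H.%M.%S")
    return match.group("name"), stamp


if __name__ == "__main__":
    for example in (
        "Abacus_2022.10.02_17.11.19.zip",
        "World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip",
    ):
        name, stamp = parse_zip_name(example)
        print(name, stamp.isoformat())
```

Anchoring the timestamp at the end of the name keeps installation names that themselves contain underscores or hyphens intact.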
d
Analysis Practice Data
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
+ more versions
Cite
Arshad, Abdul Rehman (2023). Analysis Practice Data [Dataset]. http://doi.org/10.7910/DVN/R1VIPU
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/R1VIPU
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Arshad, Abdul Rehman
Description
This data set comes as a supplementary resource for my book on Biostatistics and SPSS. Readers are free to download this file and practice using SPSS as they go along reading the book.
H
Data from: Teaching Entrepreneurship: Impact of Business Training on...
dataverse.harvard.edu
search.dataone.org
Updated Nov 13, 2019
Cite
Dean Karlan; Martin Valdivia (2019). Teaching Entrepreneurship: Impact of Business Training on Microfinance Clients and Institutions [Dataset]. http://doi.org/10.7910/DVN/27985
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/27985
Dataset updated
Nov 13, 2019
Dataset provided by
Harvard Dataverse
Authors
Dean Karlan; Martin Valdivia
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Lima, Ayacucho, Peru
Description
Using a randomized control trial, we measure the marginal impact of adding business training to a Peruvian group lending program for female microentrepreneurs. Treatment groups received thirty- to sixty-minute entrepreneurship training sessions during their normal weekly or monthly banking meeting over a period of one to two years. Control groups remained as they were before, meeting at the same frequency but solely for making loan and savings payments. We find little or no evidence of changes in key outcomes such as business revenue, profits, or employment. We nevertheless observed business knowledge improvements and increased client retention rates for the microfinance institution.
H
AfroGrid V1.0
dataverse.harvard.edu
search.dataone.org
Updated Nov 28, 2023
Cite
Ore Koren; Justin Schon (2023). AfroGrid V1.0 [Dataset]. http://doi.org/10.7910/DVN/LDI5TK
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/LDI5TK
Dataset updated
Nov 28, 2023
Dataset provided by
Harvard Dataverse
Authors
Ore Koren; Justin Schon
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Studies on the impact of environmental stressors on conflict have proliferated in recent years, but a consensus is slow to emerge, at least partly due to empirical limitations. In this study, we present Afro-Grid: an integrated, disaggregated 0.5 degree grid-month dataset on conflict, environmental stress, and socioeconomic features in Africa, intended to propel research on these issues forward. Afro-Grid offers several important extensions for researchers and policymakers, including: (i) standardizing (using established methods) data sources on conflict, environmental stress, and socioeconomic factors across spatial and temporal scales; (ii) combining these data into a single, openly available file, maximizing the accessibility of these data for researchers and policymakers regardless of their software background; and (iii) including NDVI and dual-series harmonized night lights series that have traditionally not been accessible to researchers without advanced computational expertise. Using a series of comparative regressions at the grid-month and grid-year levels, combined with descriptive statistics and visualizations, we illustrate that this temporally and geographically disaggregated dataset provides valuable extensions for research related to the climate-conflict nexus and the role of socioeconomic features in shaping conflict trends, as well as for research and policy on development, politics, and economics broadly.
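To make the "0.5 degree grid-month" unit concrete, the sketch below maps a point observation (latitude, longitude, date) to a cell-and-month key. The row and column scheme used here, counting 0.5 degree steps from the south-west corner of the globe, is purely an assumption for illustration. The cell identifiers actually used in the dataset are not described in this listing and may differ.

```python
import math
from datetime import date

CELL_SIZE = 0.5  # grid resolution in degrees, as stated in the description


def grid_month_key(lat, lon, day):
    """Map a point and date to a (row, col, year, month) key.

    Assumption: rows count 0.5 degree steps north from -90 and columns
    count steps east from -180. This is a hypothetical indexing scheme.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    row = math.floor((lat + 90.0) / CELL_SIZE)
    col = math.floor((lon + 180.0) / CELL_SIZE)
    return (row, col, day.year, day.month)


if __name__ == "__main__":
    # Two events in the same cell and month share a key and would be aggregated.
    print(grid_month_key(9.03, 38.74, date(2015, 6, 17)))
    print(grid_month_key(9.20, 38.60, date(2015, 6, 2)))
```

Grouping event records by such a key is one way to aggregate point data to the grid-month level before merging it with other gridded variables.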
H
MicroMap - CellDesigner xml and supporting files
dataverse.harvard.edu
Updated Aug 22, 2025
Cite
Ines Thiele (2025). MicroMap - CellDesigner xml and supporting files [Dataset]. http://doi.org/10.7910/DVN/FZKMJ8
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/FZKMJ8
Dataset updated
Aug 22, 2025
Dataset provided by
Harvard Dataverse
Authors
Ines Thiele
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
The CellDesigner xml format allows for MicroMap inspection using the CellDesigner software available at https://www.celldesigner.org, as well as computational modelling and visualisation using the COBRA Toolbox (https://opencobra.github.io). The pdf format allows for map inspection. We suggest downloading the file and using 'open with' to view it in a web browser of your choice for relatively fast and responsive exploration.
H
ACCESS DB Version (Aug 29 2017)
dataverse.harvard.edu
Updated Nov 8, 2017
Cite
Michael Fuller (2017). ACCESS DB Version (Aug 29 2017) [Dataset]. http://doi.org/10.7910/DVN/SHMDGU
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/SHMDGU
Dataset updated
Nov 8, 2017
Dataset provided by
Harvard Dataverse
Authors
Michael Fuller
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 417,000 men and women. Users must download two compressed files: a BASE file and a USER file. To use this database, download 20170829CBDBavBase.7z (the BASE file) and 20170829CBDBavUser.7z (the USER file) and uncompress them to the same folder. Uncompressing yields four items: CBDB_InstallationGuide.pdf, the HelpFiles folder, 20170829CBDBavUser.mdb, and 20170829CBDBavBase.mdb. The CBDB_InstallationGuide.pdf gives instructions on installation: see Part 2, "Installing the Database".
H
Social B(eye)as Dataset
dataverse.harvard.edu
Updated Jan 16, 2019
Cite
Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher (2019). Social B(eye)as Dataset [Dataset]. http://doi.org/10.7910/DVN/APZKSS
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/APZKSS
Dataset updated
Jan 16, 2019
Dataset provided by
Harvard Dataverse
Authors
Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Dataset funded by
EU Horizon 2020 Research and Innovation Programme
Description
Image analysis algorithms have become an indispensable tool in our information ecosystem, facilitating new forms of visual communication and information sharing. At the same time, they enable large-scale socio-technical research which would otherwise be difficult to carry out. However, their outputs may exhibit social bias, especially when analyzing people images. Since most algorithms are proprietary and opaque, we propose a method of auditing their outputs for social biases. To be able to compare how algorithms interpret a controlled set of people images, we collected descriptions across six image tagging APIs. In order to compare these results to human behavior, we also collected descriptions on the same images from crowdworkers in two anglophone regions. While the APIs do not output explicitly offensive descriptions, as humans do, future work should consider if and how they reinforce social inequalities in implicit ways. Beyond computer vision auditing, the dataset of human- and machine-produced tags, and the typology of tags, can be used to explore a range of research questions related to both algorithmic and human behaviors.
H
Replication Data for: "Bilateral or Multilateral? International Financial...
dataverse.harvard.edu
Updated Jan 17, 2023
Cite
Valentin Lang; Axel Dreher; B. Peter Rosendorff; James Raymond Vreeland (2023). Replication Data for: "Bilateral or Multilateral? International Financial Flows and the Dirty-Work Hypothesis" [Dataset]. http://doi.org/10.7910/DVN/CGXPF5
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/CGXPF5
Dataset updated
Jan 17, 2023
Dataset provided by
Harvard Dataverse
Authors
Valentin Lang; Axel Dreher; B. Peter Rosendorff; James Raymond Vreeland
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Replication Data for: "Bilateral or Multilateral? International Financial Flows and the Dirty-Work Hypothesis"
H
school_data
dataverse.harvard.edu
Updated Jan 11, 2016
Cite
Jeehee Han (2016). school_data [Dataset]. http://doi.org/10.7910/DVN/MTHO5E
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/MTHO5E
Dataset updated
Jan 11, 2016
Dataset provided by
Harvard Dataverse
Authors
Jeehee Han
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This folder includes a Stata do-file (which merges and cleans all the Excel files in the folder) and files of administrative data downloaded from KESS (http://kess.kedi.re.kr/index). The administrative data files are sorted by year, school type, and type of variable. *Note: please download this entire folder and save it as "school_data" for easier replication. The name will be used in another Stata do-file when merging school data with other Stata data files.
H
Process_patents
dataverse.harvard.edu
Updated Jun 15, 2023
Cite
Florian Seliger; Sebastian Heinrich; Nicolas Banholzer (2023). Process_patents [Dataset]. http://doi.org/10.7910/DVN/CBSK2W
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/CBSK2W
Dataset updated
Jun 15, 2023
Dataset provided by
Harvard Dataverse
Authors
Florian Seliger; Sebastian Heinrich; Nicolas Banholzer
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains patent filings at the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) and their corresponding "process shares", which are calculated with different methods. We provide two files: one with patent filings at the EPO (process_patents_epo.txt), the other with patent filings at the USPTO (process_patents_uspto.zip). The USPTO file is in .zip format due to its large size. The process share indicates the degree to which a patent is a process patent rather than a product patent. The shares have been calculated based on the classification of patent claims as being process claims or not. Patent abstracts have been classified in the same way. The PDF file Codebook provides an overview of all columns in the data. A detailed data description can be found in the study on the EPO's homepage (title of the study: "Knowledge spillovers from product and process inventions and their impact on firm performance"): https://www.epo.org/learning-events/materials/academic-research-programme/research-project-grants.html Please make sure to cite this study if you use the data in your work. Funding by the "European Office Academic Research Programme" is gratefully acknowledged.
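As a rough illustration of what a claim-based "process share" could look like, the sketch below computes the fraction of a patent's claims that are flagged as process claims. This is only one plausible definition under stated assumptions. The description says the shares are calculated with different methods, which are documented in the Codebook and the accompanying study, and `process_share` is a hypothetical helper.

```python
def process_share(claim_is_process):
    """Fraction of claims classified as process claims.

    Assumption: every claim counts equally. The dataset's own methods
    may weight or select claims differently.
    """
    flags = [bool(flag) for flag in claim_is_process]
    if not flags:
        raise ValueError("a patent needs at least one classified claim")
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Four claims, three of them classified as process claims.
    print(process_share([True, False, True, True]))
```

Under this definition, a share of 1.0 would describe a pure process patent and 0.0 a pure product patent.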
H
Replication Data for: 'Inflammatory Political Campaigns and Racial Bias in...
dataverse.harvard.edu
Updated Dec 16, 2022
Cite
Pauline Grosjean; Federico Masera; Hasin Yousaf (2022). Replication Data for: 'Inflammatory Political Campaigns and Racial Bias in Policing' [Dataset]. http://doi.org/10.7910/DVN/A3B9HE
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/A3B9HE
Dataset updated
Dec 16, 2022
Dataset provided by
Harvard Dataverse
Authors
Pauline Grosjean; Federico Masera; Hasin Yousaf
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The programs replicate tables and figures from "Inflammatory Political Campaigns and Racial Bias in Policing", by Grosjean, Masera, and Yousaf. Please see the Readme file for additional details. The data files are too large to host on Dataverse but are available for download here: https://hu.sharepoint.com/:f:/s/HarvardEconomicsDatasets/Eg3OHui76VxIqrlsdE_mjGkBOxsJgCbr0FBogKAHighNeA?e=CfzIgc
H
Worldwide Fulltext Usage of Data Astrophysics Data System in 2011
dataverse.harvard.edu
data.niaid.nih.gov
Updated Oct 22, 2013
+ more versions
Cite
SAO/NASA Astrophysics Data System (2013). Worldwide Fulltext Usage of Data Astrophysics Data System in 2011 [Dataset]. http://doi.org/10.7910/DVN/22951
Explore at:
Croissant
Unique identifier
https://doi.org/10.7910/DVN/22951
Dataset updated
Oct 22, 2013
Dataset provided by
Harvard Dataverse
Authors
SAO/NASA Astrophysics Data System
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/22951
Dataset funded by
NASA (http://nasa.gov/)
Description
The data contained in these files (one in Excel, the other in JSON format) consist of full text download numbers through the ADS during the year 2011. Every row is a journal, indicated by the journal name and the ADS abbreviation ("bibstem", see: http://adsabs.harvard.edu/abs_doc/journals2.html). For each journal, we present the download numbers split up by publication year (with the first data column being the range "pre 1998"). Full text downloads within the ADS service are defined as 'clicks' on any of the links within an ADS record that provide access to full text in one form or another. Specifically, these are the 'E', 'F', 'L', 'G' or 'X' links (see the definitions at http://doc.adsabs.harvard.edu/abs_doc/help_pages/results.html#List_of_Links). The data contained in these files have been released under the CC-BY License (see: http://creativecommons.org/licenses/by/3.0/us/). Please acknowledge the ADS in any publication that makes use of these data with the phrase: "This research has made use of NASA's Astrophysics Data System."
ACCESS DB Version (April 24 2019)
dataverse.harvard.edu
Updated May 26, 2021
+ more versions
Michael Fuller (2021). ACCESS DB Version (April 24 2019) [Dataset]. http://doi.org/10.7910/DVN/2UFYFG
Unique identifier
https://doi.org/10.7910/DVN/2UFYFG
Dataset updated
May 26, 2021
Dataset provided by
Harvard Dataverse
Authors
Michael Fuller
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/2UFYFG
Description
Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 420,000 men and women in Microsoft Access format. Documentation is included. Project Website (2019-04-24)
Replication Data for: Measurement Issues in Conflict Event Data: Addressing...
dataverse.harvard.edu
Updated Jul 4, 2025
Mert Can Yilmaz; Magnus Öberg (2025). Replication Data for: Measurement Issues in Conflict Event Data: Addressing some misconceptions about what drives differences between human-coded event datasets [Dataset]. http://doi.org/10.7910/DVN/WMZW4C
Unique identifier
https://doi.org/10.7910/DVN/WMZW4C
Dataset updated
Jul 4, 2025
Dataset provided by
Harvard Dataverse
Authors
Mert Can Yilmaz; Magnus Öberg
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This replication package accompanies the paper "Measurement Issues in Conflict Event Data: Addressing some misconceptions about what drives differences between human-coded event datasets." It contains all data necessary to reproduce the analyses presented in the study. Please note that due to data restrictions, we are unable to openly share the raw ACLED data. However, the data is available to registered users and can be downloaded directly from the Armed Conflict Location & Event Data Project (ACLED) website: https://acleddata.com. The dataset we used was exported on February 28, 2025. While you can download the data from ACLED, it may have been modified since then and thus may not correspond exactly to the dataset referenced here. As ACLED does not provide a versioning system, reproducing the exact same analyses may not be possible without consulting the copy attached here.
Extracted Data from: Health Center Service Delivery and Look-Alike Sites
dataverse.harvard.edu
Updated May 22, 2025
Health Resources & Services Administration (2025). Extracted Data from: Health Center Service Delivery and Look-Alike Sites [Dataset]. http://doi.org/10.7910/DVN/RT7CIO
Unique identifier
https://doi.org/10.7910/DVN/RT7CIO
Dataset updated
May 22, 2025
Dataset provided by
Harvard Dataverse
Authors
Health Resources & Services Administration
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Time period covered
May 21, 2025
Area covered
United States
Description
This submission includes publicly available data extracted in its original form. If you have questions about the underlying data stored here, please contact the HRSA Contact Center (phone: 877-464-4772; TTY: 877-897-9910). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at climatecafe@bu.edu. "This dataset provides a list of federally-funded health centers that provide health services. For more than 40 years, Health Resources and Services Administration (HRSA)-supported health centers have provided comprehensive, culturally competent, quality primary health care services to medically underserved communities and vulnerable populations. Health centers are community-based and consumer-run organizations that serve populations with limited access to health care. These include low-income populations, the uninsured, those with limited English proficiency, migratory and seasonal agricultural workers, individuals and families experiencing homelessness, and those living in public housing." [Quote from: https://data.hrsa.gov/data/download?data=HSCD#HSCD]

PrimeKG: 229 scholarly articles cite this dataset (per Google Scholar).


