CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Here, we present the Precision Medicine Knowledge Graph (PrimeKG). This resource provides a holistic view of diseases. We have integrated 20 high-quality datasets, biorepositories and ontologies to curate this knowledge graph. PrimeKG systematically captures information about 17,080 diseases with 4,050,249 relationships representing various major biological scales, including diseases, drugs, genes, proteins, exposures, phenotypes, drug side effects, molecular functions, cellular components, biological processes, anatomical regions, and pathways. Disease nodes in our multi-relational knowledge graph are densely connected to every other node type. PrimeKG's rich graph structure is supplemented with textual descriptions of clinical guidelines for drug and disease nodes to enable multi-modal disease exploration. To get started with using PrimeKG, please explore our project website: https://zitniklab.hms.harvard.edu/projects/PrimeKG/
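PrimeKG is described as a multi-relational knowledge graph, meaning every edge carries a relation label and connects two typed nodes. The sketch below shows one minimal way to hold such typed edges in memory and ask which node types a disease is connected to. The edge tuples, node names, node types and relation labels are hypothetical placeholders, not PrimeKG's own schema; consult the project website for the actual file layout and column names.

```python
from collections import defaultdict

# Hypothetical toy edges: (source, source_type, relation, target, target_type).
# Names, types and relation labels are placeholders, not PrimeKG's own.
EDGES = [
    ("disease_A", "disease", "treated_by", "drug_X", "drug"),
    ("disease_A", "disease", "associated_with", "gene_G", "gene/protein"),
    ("disease_A", "disease", "has_phenotype", "phenotype_P", "phenotype"),
    ("drug_X", "drug", "targets", "gene_G", "gene/protein"),
]


def build_graph(edges):
    """Index typed edges in both directions: node -> list of (relation, neighbor)."""
    node_type = {}
    adjacency = defaultdict(list)
    for src, src_type, relation, dst, dst_type in edges:
        node_type[src] = src_type
        node_type[dst] = dst_type
        adjacency[src].append((relation, dst))
        adjacency[dst].append((relation, src))
    return node_type, adjacency


def neighbor_types(node, node_type, adjacency):
    """Return the set of node types directly connected to `node`."""
    return {node_type[nbr] for _, nbr in adjacency.get(node, [])}


def neighbors_by_relation(node, relation, adjacency):
    """List neighbors of `node` reached through edges labelled `relation`."""
    return sorted(nbr for rel, nbr in adjacency.get(node, []) if rel == relation)
```

For example, build_graph(EDGES) followed by neighbor_types("disease_A", ...) reports that the toy disease touches drug, gene/protein and phenotype nodes. Indexing edges in both directions keeps lookups symmetric, which suits questions like "which drugs target this gene" as well as "which genes does this drug target".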
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Open bibliographic metadata snapshot from 15 February 2022 provided by Harvard Library. We recommend you review the dataset documentation and best practices for using this data collection: Harvard Library Bibliographic Metadata: Detailed Content Inventory.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.3/customlicense?persistentId=doi:10.7910/DVN/R33RS9
Harvard Dataverse => Digital Library - Projects & Theses - Prof. Dr. Scholz
Introduction and background information for "Digital Library - Projects & Theses - Prof. Dr. Scholz".
The URL of the dataverse: http://dataverse.harvard.edu/dataverse/LibraryProfScholz
The URL of this (introduction) dataset: http://doi.org/10.7910/DVN/R33RS9
You may have been directed here because the calling page has no other entry point (with DOI) into this dataverse. Click on the title of this page to reach the start page of the dataverse.

Introduction to the Data in this Dataverse
This dataverse is about Aircraft Design, Flight Mechanics, and Aircraft Systems. It contains research data and software produced by students for their projects and theses on these topics. Each dataset lists, under "Metadata", a URN from the German National Library (DNB) that links to all other resources from the corresponding report: https://nbn-resolving.org/html/urn:nbn:de:gbv:18302-aeroJJJJ-MM-DD.01x
Alternative sites that store the data in this dataverse are http://library.ProfScholz.de and https://archive.org/details/@profscholz
To download from the archive.org site, open an "item". Under "DOWNLOAD OPTIONS", select the file called "ZIP" (where available) to download DataXxxx.zip. Alternatively, go to "SHOW ALL"; in the new window, click "View Contents" next to DataXxxx.zip, or select the URL next to "Data-list". This lets you download a single file from DataXxxx.zip.

Data Publishing
Data publishing means publishing research data for (re)use by others. It consists of preparing single files, or a dataset containing several files, for access on the web. This practice is part of the open science movement. There is consensus about the benefits of Open Data, especially in connection with Open Access publishing. It is important to link the publication (e.g. a thesis) with the underlying data and vice versa. General (not discipline-specific) free data repositories include:
Harvard Dataverse (this one!)
figshare (emphasis: multimedia)
Zenodo (emphasis: results from EU research, mainly text)
Mendeley Data (emphasis: data associated with journal articles)
To find data repositories, use http://re3data.org
Read more at https://en.wikipedia.org/wiki/Data_publishing
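Once a DataXxxx.zip archive has been downloaded, its contents can also be listed and a single file read locally, mirroring the "View Contents" option described above. This is a minimal sketch using Python's standard zipfile module; the member names in the usage note are hypothetical, since each archive has its own contents.

```python
import io
import zipfile


def list_members(zip_source):
    """Return the names of all files stored in the archive, in stored order.

    `zip_source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return zf.namelist()


def read_member(zip_source, member):
    """Return the raw bytes of one file inside the archive without extracting the rest."""
    with zipfile.ZipFile(zip_source) as zf:
        return zf.read(member)
```

For example, list_members("DataXxxx.zip") prints the archive listing, and read_member("DataXxxx.zip", name) fetches one entry by its exact name from that listing. When a file-like object is passed, ZipFile leaves it open on exit, so the same buffer can be queried repeatedly.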
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains documentation on the 146 global regions used to organize responses to the ArchaeoGLOBE land use questionnaire between May 18 and July 31, 2018. The regions were formed from modern administrative regions (Natural Earth 1:50m Admin1 - states and provinces, https://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-admin-1-states-provinces/). The boundaries of the polygons represent rough geographic areas that serve as analytical units useful in two respects: for the history of land use over the past 10,000 years (a moving target) and for the history of archaeological research. Some consideration was also given to creating regions of roughly equal size. The regionalization process went through several rounds of feedback and redrawing before arriving at the 146 regions used in the survey. No bounded regional system could ever truly reflect the complex spatial distribution of archaeological knowledge on past human land use, but operating at a regional scale was necessary to facilitate timely collaboration while achieving global coverage.

Map in Google Earth Format: ArchaeGLOBE_Regions_kml.kmz
Map in ArcGIS Shapefile Format: ArchaeGLOBE_Regions.zip (multiple files in zip file)
The shapefile format is a digital vector file format that stores geographic location and associated attribute information.
It is actually a collection of several different file types:
.shp - shape format: the feature geometry
.shx - shape index format: a positional index of the feature geometry
.dbf - attribute format: columnar attributes for each shape
.prj - projection format: the coordinate system and projection information
.sbn and .sbx - a spatial index of the features
.shp.xml - geospatial metadata in XML format
.cpg - specifies the code page for identifying character encoding

Attributes:
FID - a unique identifier for every object in a shapefile table (0-145)
Shape - the type of object (polygon)
World_ID - coded value assigned to each feature according to its division into one of seventeen ‘World Regions’ based on the geographic regions used by the Statistics Division of the United Nations (https://unstats.un.org/unsd/methodology/m49/), with small changes to better reflect archaeological scholarly communities. These large regions provide organizational structure, but are not analytical units for the study.
World_RG - text description of each ‘World Region’
Archaeo_ID - unique identifier (1-146) corresponding to the region code used in the ArchaeoGLOBE land use questionnaire and all ArchaeoGLOBE datasets
Archaeo_RG - text description of each region
Total_Area - the total area, in square kilometers, of each region
Land-Area - the total area minus the area of all lakes and reservoirs found within each region (source: https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-lakes/)

PDF of Region Attribute Table: ArchaeoGLOBE Regions Attributes.pdf
Excel file of Region Attribute Table: ArchaeoGLOBE Regions Attributes.xls
Printed Maps in PDF Format: ArchaeoGLOBE Regions.pdf
Documentation of the ArchaeoGLOBE Regional Map: ArchaeoGLOBE Regions README.doc
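Because a shapefile is a set of sibling files rather than one file, GIS software typically needs at least the .shp, .shx and .dbf components to open it. The sketch below checks a list of file names (for instance the listing of ArchaeGLOBE_Regions.zip) for those three mandatory components. The base names inside the zip are assumed, not taken from the dataset.

```python
from pathlib import PurePath

# The three components a shapefile cannot work without; the others
# (.prj, .sbn/.sbx, .shp.xml, .cpg) are optional extras.
MANDATORY = (".shp", ".shx", ".dbf")


def missing_components(filenames):
    """For each .shp in `filenames`, list which mandatory siblings are absent.

    Matching is case-insensitive, so base names in the result are lowercased.
    Directory prefixes are ignored, which suits a flat archive listing.
    """
    names = {PurePath(f).name.lower() for f in filenames}
    report = {}
    for name in sorted(names):
        if name.endswith(".shp"):
            base = name[: -len(".shp")]
            report[base] = [ext for ext in MANDATORY if base + ext not in names]
    return report
```

A possible usage: open the archive with zipfile.ZipFile("ArchaeGLOBE_Regions.zip") and pass its namelist() to missing_components; an empty list for a base name means the shapefile has everything it needs to load.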
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Users can get data on child care programs and child care expenditures.
Background: The Child Care Bureau is housed under the Office of Family Assistance portion of the Administration for Children and Families. The Child Care Bureau's purpose is to promote access to affordable, high-quality child care and after-school programs. Through the administration of the Child Care and Development Fund, the Child Care Bureau provides financial assistance to low-income families and oversees the implementation of state child care policies and programs.
User Functionality: The website provides a variety of information on the administration, laws and regulations of the Child Care and Development Fund. All of this information is available for download in Word or PDF format. Users can also view data tables on child care program statistics and Child Care and Development Fund expenditures. Child care program statistics include the number of children and families served, and percentages by age group, race/ethnicity, payment method, and type and place of care. Information is organized by state. All data tables can be downloaded as Excel or PDF files.
Data Notes: Data tables are available for each year since 1998. The most recent data available are from 2008.
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset- and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well the Dataverse software, and the repositories using it, help depositors describe data.

How the metadata was downloaded
The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. To get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one named "hostname", listing the URL of each installation in which I was able to create an account, and another named "apikey", listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

How the files are organized
├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation).csv
│   ├── basic.csv
│   ├── contributor(citation).csv
│   ├── ...
│   └── topic_classification(citation).csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2022.10.02_17.11.19.zip
│   │   ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
│   │   ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
│   │   │   └── ...
│   │   └── metadatablocks_v5.6
│   │       ├── astrophysics_v5.6.json
│   │       ├── biomedical_v5.6.json
│   │       ├── citation_v5.6.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
│   ├── ADA_Dataverse_2022.10.02_17.26.57.zip
│   ├── Arca_Dados_2022.10.02_17.44.35.zip
│   ├── ...
│   └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
├── dataset_pids_from_most_known_dataverse_installations.csv
├── licenses_used_by_dataverse_installations.csv
└── metadatablocks_from_most_known_dataverse_installations.csv

This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned versions of all datasets in the 77 installations, with a row for each author name, affiliation, identifier type and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories. The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation, as well as a column indicating whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also indicate which Dataverse collection (within the installation) the dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions.
The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files. The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
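The hostname/apikey CSV described above is a simple lookup table from installation URL to API token. The sketch below shows one way a script could load such a file and attach the token to requests; it is not taken from the script in the GitHub repo. It assumes Dataverse's documented "X-Dataverse-key" request header for API tokens, and the host names and token values in the usage example are hypothetical.

```python
import csv
import io


def load_api_tokens(csv_text):
    """Map installation URL -> API token from a CSV with "hostname" and "apikey" columns.

    Rows with an empty token are skipped, so those installations are queried anonymously.
    """
    tokens = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        host = (row.get("hostname") or "").strip()
        key = (row.get("apikey") or "").strip()
        if host and key:
            tokens[host] = key
    return tokens


def request_headers(hostname, tokens):
    """Build HTTP headers for a Dataverse API call, adding the token if one is known."""
    token = tokens.get(hostname)
    return {"X-Dataverse-key": token} if token else {}
```

To use it with a file on disk, read the file's text (e.g. with open(path, newline="")) and pass it to load_api_tokens. Keeping the token lookup separate from the HTTP calls means installations without an account are simply queried without the header.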
This dataset is a supplementary resource for my book on Biostatistics and SPSS. Readers are free to download this file and practice using SPSS as they read the book.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Using a randomized controlled trial, we measure the marginal impact of adding business training to a Peruvian group lending program for female microentrepreneurs. Treatment groups received thirty- to sixty-minute entrepreneurship training sessions during their normal weekly or monthly banking meeting over a period of one to two years. Control groups remained as they were before, meeting at the same frequency but solely for making loan and savings payments. We find little or no evidence of changes in key outcomes such as business revenue, profits, or employment. We nevertheless observe improvements in business knowledge and increased client retention rates for the microfinance institution.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Studies on the impact of environmental stressors on conflict have proliferated in recent years, but a consensus is slow to emerge, at least partly due to empirical limitations. In this study, we present Afro-Grid: integrated, disaggregated 0.5-degree grid-month data on conflict, environmental stress, and socioeconomic features in Africa, intended to propel research on these issues forward. Afro-Grid offers several important extensions for researchers and policymakers, including: (i) standardizing (using established methods) data sources on conflict, environmental stress, and socioeconomic factors across spatial and temporal scales; (ii) combining these data into a single, openly available file, maximizing the accessibility of these data for researchers and policymakers regardless of their software background; and (iii) including NDVI and harmonized dual-series night lights data that have traditionally not been accessible to researchers without advanced computational expertise. Using a series of comparative regressions at the grid-month and grid-year levels, combined with descriptive statistics and visualizations, we illustrate that this temporally and geographically disaggregated dataset provides valuable extensions for research on the climate-conflict nexus and the role of socioeconomic features in shaping conflict trends, as well as for research and policy on development, politics, and economics more broadly.
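A grid-month dataset assigns each georeferenced observation to a fixed 0.5-degree cell and a calendar month, and grid-year analyses simply coarsen the time key. The sketch below illustrates that aggregation for event points. The (row, col) cell scheme counted from (-90, -180) and the (lat, lon, year, month) event tuples are assumptions for illustration; they are not Afro-Grid's own cell identifiers or file layout.

```python
import math
from collections import Counter

CELL = 0.5  # grid resolution in degrees, matching the 0.5-degree grid described above


def cell_index(lat, lon, cell=CELL):
    """Hypothetical (row, col) of the cell containing a point, counted from (-90, -180)."""
    row = math.floor((lat + 90.0) / cell)
    col = math.floor((lon + 180.0) / cell)
    return row, col


def aggregate(events, by="month"):
    """Count events per (cell, period); events are (lat, lon, year, month) tuples.

    by="month" keys periods as (year, month); by="year" keys them as (year,).
    """
    if by not in ("month", "year"):
        raise ValueError("by must be 'month' or 'year'")
    counts = Counter()
    for lat, lon, year, month in events:
        period = (year, month) if by == "month" else (year,)
        counts[(cell_index(lat, lon), period)] += 1
    return counts
```

Using math.floor rather than int() keeps points with negative offsets in the correct cell; with integer truncation, cells on either side of the origin would be merged.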
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The CellDesigner XML format allows for MicroMap inspection using the CellDesigner software available at https://www.celldesigner.org, as well as computational modelling and visualisation using the COBRA Toolbox (https://opencobra.github.io). The PDF format allows for map inspection. We suggest downloading the file and opening it ('open with') in a web browser of your choice for relatively fast and responsive exploration.
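Outside CellDesigner, the XML file can also be inspected programmatically. CellDesigner files are generally SBML documents with CellDesigner-specific annotations, so the sketch below assumes the species of interest are SBML species elements whose namespace starts with "http://www.sbml.org/sbml/", and skips same-named elements in CellDesigner's own annotation namespace. The namespace check and the example content are assumptions, not a description of the MicroMap file itself.

```python
import xml.etree.ElementTree as ET

SBML_NS_PREFIX = "http://www.sbml.org/sbml/"  # assumed SBML core namespace prefix


def split_tag(tag):
    """Split an ElementTree tag like '{ns}local' into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def list_species(xml_text):
    """Return (id, name) for each SBML <species> element, in document order."""
    root = ET.fromstring(xml_text)
    species = []
    for elem in root.iter():
        ns, local = split_tag(elem.tag)
        if local == "species" and ns.startswith(SBML_NS_PREFIX):
            species.append((elem.get("id"), elem.get("name")))
    return species
```

For a file on disk, read its bytes (e.g. with open(path, "rb")) and pass them to list_species; ElementTree accepts bytes as well as str.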
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 417,000 men and women. Users must download two compressed files: a BASE file and a USER file. To use this database, download 20170829CBDBavBase.7z (the BASE file) and 20170829CBDBavUser.7z (the USER file) and uncompress them into the same folder. Once uncompressed, the folder will contain four items: CBDB_InstallationGuide.pdf, the HelpFiles folder, 20170829CBDBavUser.mdb, and 20170829CBDBavBase.mdb. CBDB_InstallationGuide.pdf gives installation instructions; see Part 2, "Installing the Database".
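Since both archives must be uncompressed into the same folder, a quick check that all four expected items ended up side by side can save a failed installation. This is a minimal sketch that uses the four names listed above; the helper names are hypothetical.

```python
from pathlib import Path

# The four items the two archives should produce in one folder.
EXPECTED = [
    "CBDB_InstallationGuide.pdf",
    "HelpFiles",
    "20170829CBDBavUser.mdb",
    "20170829CBDBavBase.mdb",
]


def missing_items(present_names):
    """Return the expected item names not found among `present_names`."""
    present = set(present_names)
    return [name for name in EXPECTED if name not in present]


def check_folder(folder):
    """List the expected items missing from the given extraction folder."""
    return missing_items(p.name for p in Path(folder).iterdir())
```

An empty result from check_folder means both the BASE and USER archives were extracted into the same place and installation can proceed as described in the guide.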
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Image analysis algorithms have become an indispensable tool in our information ecosystem, facilitating new forms of visual communication and information sharing. At the same time, they enable large-scale socio-technical research that would otherwise be difficult to carry out. However, their outputs may exhibit social bias, especially when analyzing images of people. Since most algorithms are proprietary and opaque, we propose a method of auditing their outputs for social biases. To compare how algorithms interpret a controlled set of images of people, we collected descriptions across six image tagging APIs. To compare these results to human behavior, we also collected descriptions of the same images from crowdworkers in two anglophone regions. While the APIs, unlike humans, do not output explicitly offensive descriptions, future work should consider if and how they reinforce social inequalities in implicit ways. Beyond computer vision auditing, the dataset of human- and machine-produced tags, and the typology of tags, can be used to explore a range of research questions related to both algorithmic and human behaviors.
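One simple way to compare API tags with crowdworker tags for the same image is set overlap. The sketch below uses Jaccard similarity as a swapped-in, generic measure; it is not presented as the audit method used in this study, and the API labels in the example are hypothetical.

```python
def normalize(tags):
    """Lowercase and trim tags so 'Smile ' and 'smile' count as the same tag."""
    return {t.strip().lower() for t in tags if t.strip()}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two tag collections (1.0 if both empty)."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_to_humans(api_tags, human_tags):
    """Score each API's tags for one image against the crowdworkers' tags for it."""
    return {api: jaccard(tags, human_tags) for api, tags in api_tags.items()}
```

For instance, tags {"woman", "smile", "hair"} against {"hair", "woman", "outdoors", "smile"} share three of four distinct tags, giving 0.75. Averaging such scores per API across images gives a rough picture of which services describe people most like the human annotators do.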
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication Data for: "Bilateral or Multilateral? International Financial Flows and the Dirty-Work Hypothesis"
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This folder includes a Stata do-file that merges and cleans all the Excel files in the folder, together with administrative data files downloaded from KESS (http://kess.kedi.re.kr/index). The administrative data files are sorted by year, school type, and variable type. *Note: please download this entire folder and save it as "school_data" for easier replication. This folder name is used in another Stata do-file when merging the school data with other Stata data files.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains patent filings at the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) and their corresponding "process shares", which are calculated with different methods. We provide two files: one with patent filings at the EPO (process_patents_epo.txt), the other with patent filings at the USPTO (process_patents_uspto.zip). The USPTO file is in .zip format due to its large size. The process share indicates the degree to which a patent is a process patent rather than a product patent. The shares have been calculated based on the classification of patent claims as process claims or not. Patent abstracts have been classified in the same way. The PDF file Codebook gives an overview of all columns in the data. A detailed data description can be found in the study on the EPO's homepage (title of the study: "Knowledge spillovers from product and process inventions and their impact on firm performance"): https://www.epo.org/learning-events/materials/academic-research-programme/research-project-grants.html Please make sure to cite this study if you use the data in your work. Funding by the European Patent Office "Academic Research Programme" is gratefully acknowledged.
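At its simplest, a claim-based process share is the fraction of a patent's claims that a classifier labels as process claims. The sketch below shows only that basic ratio, assuming one boolean label per claim; the dataset provides shares computed with several methods, which are documented in the study and Codebook rather than reproduced here.

```python
def process_share(claim_is_process):
    """Share of a patent's claims classified as process claims, from 0.0 to 1.0.

    `claim_is_process` holds one boolean per claim, produced by some upstream classifier.
    """
    labels = list(claim_is_process)
    if not labels:
        raise ValueError("patent has no classified claims")
    return sum(labels) / len(labels)
```

A patent with claims labelled [True, False, False, True] would get a share of 0.5. A value near 1.0 marks a predominantly process patent and a value near 0.0 a predominantly product patent.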
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The programs replicate tables and figures from "Inflammatory Political Campaigns and Racial Bias in Policing", by Grosjean, Masera, and Yousaf. Please see the Readme file for additional details. The data files are too large to host on Dataverse but are available for download here: https://hu.sharepoint.com/:f:/s/HarvardEconomicsDatasets/Eg3OHui76VxIqrlsdE_mjGkBOxsJgCbr0FBogKAHighNeA?e=CfzIgc
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/22951
The data contained in these files (one in Excel, the other in JSON format) consist of full-text download numbers through the ADS during the year 2011. Each row is a journal, identified by the journal name and the ADS abbreviation ("bibstem"; see http://adsabs.harvard.edu/abs_doc/journals2.html). For each journal, we present the download numbers split up by publication year (the first data column is the range "pre 1998"). Full-text downloads within the ADS service are defined as 'clicks' on any of the links within an ADS record that provide access to full text in one form or another. Specifically, these are the 'E', 'F', 'L', 'G' or 'X' links (see http://doc.adsabs.harvard.edu/abs_doc/help_pages/results.html#List_of_Links for definitions). The data contained in these files have been released under the CC-BY License (see http://creativecommons.org/licenses/by/3.0/us/). Please acknowledge the ADS in a publication that makes use of these data with the phrase: "This research has made use of NASA's Astrophysics Data System."
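The definition above amounts to a filter followed by a tally: keep only clicks on the 'E', 'F', 'L', 'G' or 'X' links, then count per journal and publication-year column, with all years before 1998 pooled into "pre 1998". The sketch below illustrates that rule on a hypothetical click log of (bibstem, publication year, link type) tuples; the published files contain only the aggregated counts, not such a log.

```python
from collections import Counter

# Link types that count as full-text downloads, per the definition above.
FULL_TEXT_LINKS = frozenset({"E", "F", "L", "G", "X"})


def count_full_text_downloads(clicks):
    """Tally full-text clicks per (bibstem, publication-year column).

    `clicks` is an iterable of hypothetical (bibstem, pub_year, link_type) tuples;
    years before 1998 are pooled into the "pre 1998" column.
    """
    counts = Counter()
    for bibstem, pub_year, link_type in clicks:
        if link_type in FULL_TEXT_LINKS:
            column = "pre 1998" if pub_year < 1998 else str(pub_year)
            counts[(bibstem, column)] += 1
    return counts
```

Clicks on any other link type in a record are ignored, so only the five listed link types contribute to a journal's totals.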
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/2UFYFG
Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 420,000 men and women in MS ACCESS Format. Documentation is included. Project Website (2019-04-24)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This replication package accompanies the paper "Measurement Issues in Conflict Event Data: Addressing some misconceptions about what drives differences between human-coded event datasets." It contains all data necessary to reproduce the analyses presented in the study. Please note that due to data restrictions, we are unable to openly share the raw ACLED data. However, the data is available to registered users and can be downloaded directly from the Armed Conflict Location & Event Data Project (ACLED) website: https://acleddata.com. The dataset we used was exported on February 28, 2025. While you can download the data from ACLED, it may have been modified since then and thus may not correspond exactly to the dataset referenced here. As ACLED does not provide a versioning system, reproducing the exact same analyses may not be possible without consulting the copy attached here.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This submission includes publicly available data extracted in its original form. If you have questions about the underlying data stored here, please contact the HRSA Contact Center (Phone: 877-464-4772; TTY: 877-897-9910). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at climatecafe@bu.edu. "This dataset provides a list of federally-funded health centers that provide health services. For more than 40 years, Health Resources and Services Administration (HRSA)-supported health centers have provided comprehensive, culturally competent, quality primary health care services to medically underserved communities and vulnerable populations. Health centers are community-based and consumer-run organizations that serve populations with limited access to health care. These include low-income populations, the uninsured, those with limited English proficiency, migratory and seasonal agricultural workers, individuals and families experiencing homelessness, and those living in public housing." [Quote from: https://data.hrsa.gov/data/download?data=HSCD#HSCD]