100+ datasets found
  1.

    PrimeKG

    • dataverse.harvard.edu
    • dataone.org
    Updated May 2, 2022
    Cite
    Payal Chandak (2022). PrimeKG [Dataset]. http://doi.org/10.7910/DVN/IXA7BM
    Dataset updated
    May 2, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Payal Chandak
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Here, we present the Precision Medicine Knowledge Graph (PrimeKG). This resource provides a holistic view of diseases. We have integrated 20 high-quality datasets, biorepositories and ontologies to curate this knowledge graph. PrimeKG systematically captures information about 17,080 diseases with 4,050,249 relationships representing various major biological scales, including diseases, drugs, genes, proteins, exposures, phenotypes, drug side effects, molecular functions, cellular components, biological processes, anatomical regions, and pathways. Disease nodes in our multi-relational knowledge graph are densely connected to every other node type. PrimeKG's rich graph structure is supplemented with textual descriptions of clinical guidelines for drug and disease nodes to enable multi-modal disease exploration. To get started with using PrimeKG, please explore our project website: https://zitniklab.hms.harvard.edu/projects/PrimeKG/

  2.

    Harvard Library Bibliographic Metadata

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    • +1 more
    Updated Mar 31, 2023
    Cite
    Christine Eslao (2023). Harvard Library Bibliographic Metadata [Dataset]. http://doi.org/10.7910/DVN/I8L0ZZ
    Dataset updated
    Mar 31, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Christine Eslao
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Open bibliographic metadata snapshot from 15 February 2022 provided by Harvard Library. We recommend you review the dataset documentation and best practices for using this data collection: Harvard Library Bibliographic Metadata: Detailed Content Inventory.

  3.

    Introduction and Background Information

    • dataverse.harvard.edu
    Updated Feb 8, 2019
    Cite
    Dieter Scholz (2019). Introduction and Background Information [Dataset]. http://doi.org/10.7910/DVN/R33RS9
    Dataset updated
    Feb 8, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Dieter Scholz
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.3/customlicense?persistentId=doi:10.7910/DVN/R33RS9

    Description

    Harvard Dataverse => Digital Library - Projects & Theses - Prof. Dr. Scholz

    Introduction and background information to "Digital Library - Projects & Theses - Prof. Dr. Scholz".

    The URL of the dataverse: http://dataverse.harvard.edu/dataverse/LibraryProfScholz
    The URL of this (introduction) dataset: http://doi.org/10.7910/DVN/R33RS9

    YOU MAY HAVE BEEN DIRECTED HERE, BECAUSE THE CALLING PAGE HAS NO OTHER ENTRY POINT (with DOI) INTO THIS DATAVERSE. Click on the title of this page to reach the start page of the dataverse!

    Introduction to the Data in this Dataverse

    This dataverse is about:
    • Aircraft Design
    • Flight Mechanics
    • Aircraft Systems

    This dataverse contains research data and software produced by students for their projects and theses on above topics. Get linked to all other resources from their reports using the URN from the German National Library (DNB) as given in each dataset under "Metadata": https://nbn-resolving.org/html/urn:nbn:de:gbv:18302-aeroJJJJ-MM-DD.01x

    Alternative sites that store the data given in this dataverse are: http://library.ProfScholz.de and https://archive.org/details/@profscholz Open an "item". Under "DOWNLOAD OPTIONS" select the file (as far as available) called "ZIP" to download DataXxxx.zip. Alternatively, go to "SHOW ALL"; In the new window select next to DataXxxx.zip click "View Contents" or select URL next to "Data-list". Download single file from DataXxxx.zip.

    Data Publishing

    Data publishing means publishing of research data for (re)use by others. It consists of preparing single files or a dataset containing several files for access in the WWW. This practice is part of the open science movement. There is consensus about the benefits resulting from Open Data - especially in connection with Open Access publishing. It is important to link the publication (e.g. thesis) with the underlying data and vice versa.

    General (not disciplinary) and free data repositories are:
    • Harvard Dataverse (this one!)
    • figshare (emphasis: multi media)
    • Zenodo (emphasis: results from EU research, mainly text)
    • Mendeley Data (emphasis: data associated with journal articles)

    To find data repositories use http://re3data.org
    Read more on https://en.wikipedia.org/wiki/Data_publishing

  4.

    ArchaeoGLOBE Regions

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 6, 2019
    Cite
    ArchaeoGLOBE Project (2019). ArchaeoGLOBE Regions [Dataset]. http://doi.org/10.7910/DVN/CQWUBI
    Dataset updated
    Feb 6, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    ArchaeoGLOBE Project
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains documentation on the 146 global regions used to organize responses to the ArchaeoGLOBE land use questionnaire between May 18 and July 31, 2018. The regions were formed from modern administrative regions (Natural Earth 1:50m Admin1 - states and provinces, https://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-admin-1-states-provinces/). The boundaries of the polygons represent rough geographic areas that serve as analytical units useful in two respects - for the history of land use over the past 10,000 years (a moving target) and for the history of archaeological research. Some consideration was also given to creating regions that were relatively equal in size. The regionalization process went through several rounds of feedback and redrawing before arriving at the 146 regions used in the survey. No bounded regional system could ever truly reflect the complex spatial distribution of archaeological knowledge on past human land use, but operating at a regional scale was necessary to facilitate timely collaboration while achieving global coverage.

    Map in Google Earth Format: ArchaeGLOBE_Regions_kml.kmz
    Map in ArcGIS Shapefile Format: ArchaeGLOBE_Regions.zip (multiple files in zip file)

    The shapefile format is a digital vector file that stores geographic location and associated attribute information. It is actually a collection of several different file types:
    • .shp - shape format: the feature geometry
    • .shx - shape index format: a positional index of the feature geometry
    • .dbf - attribute format: columnar attributes for each shape
    • .prj - projection format: the coordinate system and projection information
    • .sbn and .sbx - a spatial index of the features
    • .shp.xml - geospatial metadata in XML format
    • .cpg - specifies the code page for identifying character encoding

    Attributes:
    • FID - a unique identifier for every object in a shapefile table (0-145)
    • Shape - the type of object (polygon)
    • World_ID - coded value assigned to each feature according to its division into one of seventeen ‘World Regions’ based on the geographic regions used by the Statistics Division of the United Nations (https://unstats.un.org/unsd/methodology/m49/), with small changes to better reflect archaeological scholarly communities. These large regions provide organizational structure, but are not analytical units for the study.
    • World_RG - text description of each ‘World Region’
    • Archaeo_ID - unique identifier (1-146) corresponding to the region code used in the ArchaeoGLOBE land use questionnaire and all ArchaeoGLOBE datasets
    • Archaeo_RG - text description of each region
    • Total_Area - the total area, in square kilometers, of each region
    • Land-Area - the total area minus the area of all lakes and reservoirs found within each region (source: https://www.naturalearthdata.com/downloads/10m-physical-vectors/10m-lakes/)

    PDF of Region Attribute Table: ArchaeoGLOBE Regions Attributes.pdf
    Excel file of Region Attribute Table: ArchaeoGLOBE Regions Attributes.xls
    Printed Maps in PDF Format: ArchaeoGLOBE Regions.pdf
    Documentation of the ArchaeoGLOBE Regional Map: ArchaeoGLOBE Regions README.doc
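
    Because a shapefile is a bundle of sibling files, it can help to confirm that the pieces are present after unzipping. The following is only a minimal sketch using the Python standard library; the function name `check_shapefile_parts` and the directory and stem values are hypothetical, and the split into required and optional parts assumes the usual shapefile convention that .shp, .shx and .dbf are mandatory.

```python
from pathlib import Path

# Component extensions taken from the description above.
# Assumption: .shp, .shx and .dbf are the mandatory parts of a shapefile.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".sbn", ".sbx", ".shp.xml", ".cpg")


def check_shapefile_parts(directory, stem):
    """Report which shapefile components exist for `stem` in `directory`.

    Returns (found, missing_required): `found` maps each extension to a
    bool, `missing_required` lists the mandatory extensions not on disk.
    """
    base = Path(directory)
    found = {
        ext: (base / f"{stem}{ext}").is_file()
        for ext in REQUIRED + OPTIONAL
    }
    missing_required = [ext for ext in REQUIRED if not found[ext]]
    return found, missing_required
```

    A call such as `check_shapefile_parts("unzipped_regions", "regions")` (both arguments are placeholders) returns an empty second element when the three mandatory parts are all present.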

  5.

    Child Care Bureau

    • dataverse.harvard.edu
    • datasetcatalog.nlm.nih.gov
    • +1 more
    Updated Nov 30, 2010
    Cite
    Harvard Dataverse (2010). Child Care Bureau [Dataset]. http://doi.org/10.7910/DVN/3YOBMN
    Dataset updated
    Nov 30, 2010
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Users can get data on child care programs and child care expenditures.

    Background

    The Child Care Bureau is housed under the Office of Family Assistance portion of the Administration for Children and Families. The Child Care Bureau’s purpose is to promote access to affordable, high quality child care and after-school programs. Through the administration of the Child Care and Development Fund, the Child Care Bureau provides financial assistance to low-income families and oversees the implementation of state child care policies and programs.

    User Functionality

    The website provides a variety of information regarding the administration, laws and regulations of the Child Care and Development Fund. All the information is available for download in Word or PDF formats. Users can also view data tables regarding child care program statistics and Care and Development Expenditures. Child care program statistics include information about the number of children and families served, and percentages by age group, race/ethnicity, payment method or type, and place of care. Information is organized by state. All data tables can be downloaded as Excel files or PDF files.

    Data Notes

    Data tables are available for each year since 1998. The most recent data available is from 2008.

  6.

    Dataset metadata of known Dataverse installations

    • search.dataone.org
    • dataverse.harvard.edu
    • +1 more
    Updated Nov 22, 2023
    + more versions
    Cite
    Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gautier, Julian
    Description

    This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data.

    How the metadata was downloaded

    The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens.

    How the files are organized

    ├── csv_files_with_metadata_from_most_known_dataverse_installations
    │ ├── author(citation).csv
    │ ├── basic.csv
    │ ├── contributor(citation).csv
    │ ├── ...
    │ └── topic_classification(citation).csv
    ├── dataverse_json_metadata_from_each_known_dataverse_installation
    │ ├── Abacus_2022.10.02_17.11.19.zip
    │ ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv
    │ ├── Dataverse_JSON_metadata_2022.10.02_17.11.19
    │ ├── hdl_11272.1_AB2_0AQZNT_v1.0.json
    │ ├── ...
    │ ├── metadatablocks_v5.6
    │ ├── astrophysics_v5.6.json
    │ ├── biomedical_v5.6.json
    │ ├── citation_v5.6.json
    │ ├── ...
    │ ├── socialscience_v5.6.json
    │ ├── ACSS_Dataverse_2022.10.02_17.26.19.zip
    │ ├── ADA_Dataverse_2022.10.02_17.26.57.zip
    │ ├── Arca_Dados_2022.10.02_17.44.35.zip
    │ ├── ...
    │ └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip
    └── dataset_pids_from_most_known_dataverse_installations.csv
    └── licenses_used_by_dataverse_installations.csv
    └── metadatablocks_from_most_known_dataverse_installations.csv

    This dataset contains two directories and three CSV files not in a directory.

    One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier.

    The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories:
    • The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in.
    • One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema.
    • The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files.

    The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files.

    The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ...

    Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
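
    The two-column token file described above ("hostname" and "apikey") could be read along the following lines. This is a hedged sketch, not the author's script: the function name `load_api_tokens` and the sample values are hypothetical, and only the two column names come from the description.

```python
import csv
import io


def load_api_tokens(csv_text):
    """Map each installation hostname to its API token.

    Assumes the CSV has a header row with the columns "hostname" and
    "apikey", as the description states; rows without a hostname are
    skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["hostname"].strip(): row["apikey"].strip()
        for row in reader
        if row.get("hostname")
    }


# Placeholder content; real hostnames and tokens would come from the file.
sample = "hostname,apikey\nhttps://demo.example.org,token-123\n"
tokens = load_api_tokens(sample)
```

    Keeping the tokens in a dictionary keyed by hostname makes it straightforward to look one up only for the installations that require it.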

  7.

    Analysis Practice Data

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    + more versions
    Cite
    Arshad, Abdul Rehman (2023). Analysis Practice Data [Dataset]. http://doi.org/10.7910/DVN/R1VIPU
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Arshad, Abdul Rehman
    Description

    This data set comes as a supplementary resource for my book on Biostatistics and SPSS. Readers are free to download this file and practice using SPSS as they go along reading the book.

  8.

    Data from: Teaching Entrepreneurship: Impact of Business Training on...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 13, 2019
    Cite
    Dean Karlan; Martin Valdivia (2019). Teaching Entrepreneurship: Impact of Business Training on Microfinance Clients and Institutions [Dataset]. http://doi.org/10.7910/DVN/27985
    Dataset updated
    Nov 13, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Dean Karlan; Martin Valdivia
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Lima, Ayacucho, Peru
    Description

    Using a randomized control trial, we measure the marginal impact of adding business training to a Peruvian group lending program for female microentrepreneurs. Treatment groups received thirty- to sixty-minute entrepreneurship training sessions during their normal weekly or monthly banking meeting over a period of one to two years. Control groups remained as they were before, meeting at the same frequency but solely for making loan and savings payments. We find little or no evidence of changes in key outcomes such as business revenue, profits, or employment. We nevertheless observed business knowledge improvements and increased client retention rates for the microfinance institution.

  9.

    AfroGrid V1.0

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 28, 2023
    Cite
    Ore Koren; Justin Schon (2023). AfroGrid V1.0 [Dataset]. http://doi.org/10.7910/DVN/LDI5TK
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Ore Koren; Justin Schon
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Studies on the impact of environmental stressors on conflict have proliferated in recent years, but a consensus is slow to emerge, at least partly due to empirical limitations. In this study, we present Afro-Grid: an integrated, disaggregated 0.5 degree grid-month dataset on conflict, environmental stress, and socioeconomic features in Africa, intended to propel research on these issues forward. Afro-Grid offers several important extensions for researchers and policymakers, including: (i) standardizing (using established methods) data sources on conflict, environmental stress, and socioeconomic factors across spatial and temporal scales; (ii) combining these data into a single, openly available file, maximizing the accessibility of these data for researchers and policymakers regardless of their software background; and (iii) including NDVI and dual-series harmonized night lights series that have traditionally not been accessible to researchers without advanced computational expertise. Using a series of comparative regressions at the grid-month and grid-year levels, combined with reporting descriptive statistics and visualizations, we illustrate that this temporally and geographically disaggregated dataset provides valuable extensions for research related to the climate-conflict nexus and the role of socioeconomic features in shaping conflict trends, as well as for research and policy on development, politics, and economics broadly.

  10.

    MicroMap - CellDesigner xml and supporting files

    • dataverse.harvard.edu
    Updated Aug 22, 2025
    Cite
    Ines Thiele (2025). MicroMap - CellDesigner xml and supporting files [Dataset]. http://doi.org/10.7910/DVN/FZKMJ8
    Dataset updated
    Aug 22, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Ines Thiele
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The CellDesigner xml format allows for MicroMap inspection using the CellDesigner software available at https://www.celldesigner.org, as well as computational modelling and visualisation using the COBRA Toolbox (https://opencobra.github.io). The pdf format allows for map inspection. We suggest downloading the file and choosing 'open with' a web browser of your choice for relatively fast and responsive exploration.

  11.

    ACCESS DB Version (Aug 29 2017)

    • dataverse.harvard.edu
    Updated Nov 8, 2017
    Cite
    Michael Fuller (2017). ACCESS DB Version (Aug 29 2017) [Dataset]. http://doi.org/10.7910/DVN/SHMDGU
    Dataset updated
    Nov 8, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Michael Fuller
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 417,000 men and women. Users must download two compressed files: a BASE file and a USER file. To use this database, download 20170829CBDBavBase.7z (the BASE file) and 20170829CBDBavUser.7z (the USER file) and uncompress them to the same folder. Once uncompressed, there will be four items: CBDB_InstallationGuide.pdf, the HelpFiles folder, 20170829CBDBavUser.mdb, and 20170829CBDBavBase.mdb. The CBDB_InstallationGuide.pdf gives instructions on installation: see Part 2, "Installing the Database".

  12.

    Social B(eye)as Dataset

    • dataverse.harvard.edu
    Updated Jan 16, 2019
    Cite
    Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher (2019). Social B(eye)as Dataset [Dataset]. http://doi.org/10.7910/DVN/APZKSS
    Dataset updated
    Jan 16, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Pinar Barlas; Kyriakos Kyriakou; Styliani Kleanthous; Jahna Otterbacher
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Dataset funded by
    EU Horizon 2020 Research and Innovation Programme
    Description

    Image analysis algorithms have become an indispensable tool in our information ecosystem, facilitating new forms of visual communication and information sharing. At the same time, they enable large-scale socio-technical research which would otherwise be difficult to carry out. However, their outputs may exhibit social bias, especially when analyzing people images. Since most algorithms are proprietary and opaque, we propose a method of auditing their outputs for social biases. To be able to compare how algorithms interpret a controlled set of people images, we collected descriptions across six image tagging APIs. In order to compare these results to human behavior, we also collected descriptions on the same images from crowdworkers in two anglophone regions. While the APIs do not output explicitly offensive descriptions, as humans do, future work should consider if and how they reinforce social inequalities in implicit ways. Beyond computer vision auditing, the dataset of human- and machine-produced tags, and the typology of tags, can be used to explore a range of research questions related to both algorithmic and human behaviors.

  13.

    Replication Data for: "Bilateral or Multilateral? International Financial...

    • dataverse.harvard.edu
    Updated Jan 17, 2023
    Cite
    Valentin Lang; Axel Dreher; B. Peter Rosendorff; James Raymond Vreeland (2023). Replication Data for: "Bilateral or Multilateral? International Financial Flows and the Dirty-Work Hypothesis" [Dataset]. http://doi.org/10.7910/DVN/CGXPF5
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Valentin Lang; Axel Dreher; B. Peter Rosendorff; James Raymond Vreeland
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Replication Data for: "Bilateral or Multilateral? International Financial Flows and the Dirty-Work Hypothesis"

  14.

    school_data

    • dataverse.harvard.edu
    Updated Jan 11, 2016
    Cite
    Jeehee Han (2016). school_data [Dataset]. http://doi.org/10.7910/DVN/MTHO5E
    Dataset updated
    Jan 11, 2016
    Dataset provided by
    Harvard Dataverse
    Authors
    Jeehee Han
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This folder includes a Stata do-file (that merges and cleans all the Excel files in the folder) and files of administrative data downloaded from KESS (http://kess.kedi.re.kr/index). The administrative data files are sorted by year, school type, and variable type. *Note: please download this entire folder and save the folder as "school_data" for easier replication. The name will be used in another Stata do-file when merging school data with other Stata data files.

  15.

    Process_patents

    • dataverse.harvard.edu
    Updated Jun 15, 2023
    Cite
    Florian Seliger; Sebastian Heinrich; Nicolas Banholzer (2023). Process_patents [Dataset]. http://doi.org/10.7910/DVN/CBSK2W
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Florian Seliger; Sebastian Heinrich; Nicolas Banholzer
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset contains patent filings at the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO) and their corresponding "process shares", which are calculated with different methods. We provide two files: one with patent filings at the EPO (process_patents_epo.txt), the other with patent filings at the USPTO (process_patents_uspto.zip). The USPTO file is in .zip format due to its large size. The process share indicates to which degree a patent is a process patent rather than a product patent. The shares have been calculated based on the classification of patent claims as being process claims or not. Patent abstracts have been classified in the same way. The PDF file Codebook provides an overview of all columns in the data. A detailed data description can be found in the study on the EPO's homepage (title of the study: "Knowledge spillovers from product and process inventions and their impact on firm performance"): https://www.epo.org/learning-events/materials/academic-research-programme/research-project-grants.html Please make sure to cite this study if you use the data in your work. Funding by the "European Office Academic Research Programme" is gratefully acknowledged.

  16.

    Replication Data for: 'Inflammatory Political Campaigns and Racial Bias in...

    • dataverse.harvard.edu
    Updated Dec 16, 2022
    Cite
    Pauline Grosjean; Federico Masera; Hasin Yousaf (2022). Replication Data for: 'Inflammatory Political Campaigns and Racial Bias in Policing' [Dataset]. http://doi.org/10.7910/DVN/A3B9HE
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Pauline Grosjean; Federico Masera; Hasin Yousaf
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The programs replicate tables and figures from "Inflammatory Political Campaigns and Racial Bias in Policing", by Grosjean, Masera, and Yousaf. Please see the Readme file for additional details. The data files are too large to host on Dataverse but are available for download here: https://hu.sharepoint.com/:f:/s/HarvardEconomicsDatasets/Eg3OHui76VxIqrlsdE_mjGkBOxsJgCbr0FBogKAHighNeA?e=CfzIgc

  17.

    Worldwide Fulltext Usage of Data Astrophysics Data System in 2011

    • dataverse.harvard.edu
    • data.niaid.nih.gov
    Updated Oct 22, 2013
    + more versions
    Cite
    SAO/NASA Astrophysics Data System (2013). Worldwide Fulltext Usage of Data Astrophysics Data System in 2011 [Dataset]. http://doi.org/10.7910/DVN/22951
    Dataset updated
    Oct 22, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    SAO/NASA Astrophysics Data System
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/22951

    Dataset funded by
    NASA: http://nasa.gov/
    Description

    The data contained in these files (one in Excel, the other in JSON format) consists of full text download numbers through the ADS during the year 2011. Every row is a journal, indicated by the journal name and the ADS abbreviation ("bibstem", see: http://adsabs.harvard.edu/abs_doc/journals2.html). For each journal, we present the download numbers split up by publication year (with the first data column being the range "pre 1998"). Full text downloads within the ADS service are defined as 'clicks' on any of the links within an ADS record that provide access to full text in one form or another. Specifically, these are the 'E', 'F', 'L', 'G' or 'X' links (see http://doc.adsabs.harvard.edu/abs_doc/help_pages/results.html#List_of_Links for definitions). The data contained in these files have been released under the CC-BY License (see: http://creativecommons.org/licenses/by/3.0/us/). Please acknowledge the ADS in a publication that makes use of these data by the phrase: "This research has made use of NASA's Astrophysics Data System."

  18. H

    ACCESS DB Version (April 24 2019)

    • dataverse.harvard.edu
    Updated May 26, 2021
    + more versions
    Cite
    Michael Fuller (2021). ACCESS DB Version (April 24 2019) [Dataset]. http://doi.org/10.7910/DVN/2UFYFG
    Dataset updated
    May 26, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    Michael Fuller
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/2UFYFG

    Description

    Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 420,000 men and women in MS ACCESS Format. Documentation is included. Project Website (2019-04-24)

  19. H

    Replication Data for: Measurement Issues in Conflict Event Data: Addressing...

    • dataverse.harvard.edu
    Updated Jul 4, 2025
    Cite
    Mert Can Yilmaz; Magnus Öberg (2025). Replication Data for: Measurement Issues in Conflict Event Data: Addressing some misconceptions about what drives differences between human-coded event datasets [Dataset]. http://doi.org/10.7910/DVN/WMZW4C
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Mert Can Yilmaz; Magnus Öberg
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This replication package accompanies the paper "Measurement Issues in Conflict Event Data: Addressing some misconceptions about what drives differences between human-coded event datasets." It contains all data necessary to reproduce the analyses presented in the study. Please note that due to data restrictions, we are unable to openly share the raw ACLED data. However, the data is available to registered users and can be downloaded directly from the Armed Conflict Location & Event Data Project (ACLED) website: https://acleddata.com. The dataset we used was exported on February 28, 2025. While you can download the data from ACLED, it may have been modified since then and thus may not correspond exactly to the dataset referenced here. As ACLED does not provide a versioning system, reproducing the exact same analyses may not be possible without consulting the copy attached here.

  20. H

    Extracted Data from: Health Center Service Delivery and Look-Alike Sites

    • dataverse.harvard.edu
    Updated May 22, 2025
    Cite
    Health Resources & Services Administration (2025). Extracted Data from: Health Center Service Delivery and Look-Alike Sites [Dataset]. http://doi.org/10.7910/DVN/RT7CIO
    Dataset updated
    May 22, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Health Resources & Services Administration
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    May 21, 2025
    Area covered
    United States
    Description

    This submission includes publicly available data extracted in its original form. If you have questions about the underlying data stored here, please contact the HRSA Contact Center (Phone: 877-464-4772 (TTY: 877-897-9910)). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu. "This dataset provides a list of federally-funded health centers that provide health services. For more than 40 years, Health Resources and Services Administration (HRSA)-supported health centers have provided comprehensive, culturally competent, quality primary health care services to medically underserved communities and vulnerable populations. Health centers are community-based and consumer-run organizations that serve populations with limited access to health care. These include low-income populations, the uninsured, those with limited English proficiency, migratory and seasonal agricultural workers, individuals and families experiencing homelessness, and those living in public housing." [Quote from: https://data.hrsa.gov/data/download?data=HSCD#HSCD]

Cite
Payal Chandak (2022). PrimeKG [Dataset]. http://doi.org/10.7910/DVN/IXA7BM

PrimeKG

229 scholarly articles cite this dataset (View in Google Scholar)
