CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset contains the metadata of the datasets published in 101 Dataverse installations, information about the metadata blocks of 106 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 88 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understanding of how certain Dataverse features and metadata fields are used and for assessing the quality of dataset- and file-level metadata within and across Dataverse installations.
How the metadata was downloaded
The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and August 30, 2024 using the "get_dataverse_installations_metadata" function in a collection of Python functions at https://github.com/jggautier/dataverse-scripts/blob/main/dataverse_repository_curation_assistant/dataverse_repository_curation_assistant_functions.py. To get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one named "hostname", listing each installation URL for which I was able to create an account, and another named "apikey", listing my accounts' API tokens. The Python script reads this CSV file and uses the listed API tokens to get metadata and other information from installations that require API tokens for certain API endpoints.
How the files are organized
├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author_2024.08.25-2024.08.30.csv
│   ├── contributor_2024.08.25-2024.08.30.csv
│   ├── data_source_2024.08.25-2024.08.30.csv
│   ├── ...
│   └── topic_classification_2024.08.25-2024.08.30.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2024.08.26_15.52.42.zip
│   │   ├── dataset_pids_Abacus_2024.08.26_15.52.42.csv
│   │   ├── Dataverse_JSON_metadata_2024.08.26_15.52.42
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│   │   │   └── ...
│   │   └── metadatablocks_v5.9
│   │       ├── astrophysics_v5.9.json
│   │       ├── biomedical_v5.9.json
│   │       ├── citation_v5.9.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2024.08.26_00.02.51.zip
│   ├── ...
│   └── Yale_Dataverse_2024.08.25_03.52.57.zip
├── dataverse_installations_summary_2024.08.30.csv
├── dataset_pids_from_most_known_dataverse_installations_2024.08.csv
├── license_options_for_each_dataverse_installation_2024.08.28_14.42.54.csv
└── metadatablocks_from_most_known_dataverse_installations_2024.08.30.csv
This dataset contains two directories and four CSV files that are not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" and "Geospatial" metadata blocks of datasets in the 101 Dataverse installations. For example, author_2024.08.25-2024.08.30.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 101 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 106 zip files, one for each of the 106 Dataverse installations whose sites were functioning when I attempted to collect their metadata.
Each zip file contains a directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized. For installations that had published datasets whose metadata I was able to download using Dataverse APIs, the zip file also contains: a CSV file listing information about the datasets published in the installation, including a column indicating whether the Python script was able to download the Dataverse JSON metadata for each dataset; and a directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema. The dataverse_installations_summary_2024.08.30.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of the dataset metadata included and not included in this dataset. The dataset_pids_from_most_known_dataverse_installations_2024.08.csv file contains the dataset PIDs of published datasets in the 101 Dataverse installations, with a column indicating whether the Python script was able to download each dataset's metadata. It is a union of all "dataset_pids_....csv" files in each of the 101 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory. The license_options_for_each_dataverse_installation_2024.08.28_14.42.54.csv file contains information about the licenses and...
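The download approach described above (a CSV of installation hostnames and API tokens, plus calls to Dataverse APIs) can be sketched roughly as follows. This is a minimal illustration, not the actual get_dataverse_installations_metadata function: the Search API and dataset export calls are standard Dataverse API endpoints, only the first page of search results is fetched, and "installations.csv" is a placeholder filename.

import csv
import requests

# Minimal sketch (not the actual script): read a CSV of installation hostnames and
# API tokens, list some dataset PIDs with the Dataverse Search API, and export each
# dataset's metadata in the Dataverse JSON format.
with open("installations.csv", newline="") as f:
    installations = list(csv.DictReader(f))  # columns: hostname, apikey

for installation in installations:
    base = installation["hostname"].rstrip("/")
    if not base.startswith("http"):
        base = "https://" + base
    headers = {"X-Dataverse-key": installation["apikey"]} if installation.get("apikey") else {}

    # The Search API pages through results; only the first page is fetched here.
    search = requests.get(
        f"{base}/api/search",
        params={"q": "*", "type": "dataset", "per_page": 10},
        headers=headers,
        timeout=60,
    ).json()

    for item in search["data"]["items"]:
        pid = item["global_id"]
        exported = requests.get(
            f"{base}/api/datasets/export",
            params={"exporter": "dataverse_json", "persistentId": pid},
            headers=headers,
            timeout=60,
        )
        print(pid, exported.status_code)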
Finnhub is the ultimate stock API on the market, providing real-time and historical prices for global stocks via REST API and WebSocket. We also support tons of other financial data, like stock fundamentals, analyst estimates, and more. Download the file to access the balance sheet of Amazon.
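As a rough illustration of the REST interface mentioned above (separate from the file in this entry), a real-time quote can be requested as sketched below; the /quote endpoint and token parameter are assumptions based on Finnhub's public documentation, and FINNHUB_API_KEY is a placeholder.

import requests

# Hypothetical sketch: fetch a real-time quote for Amazon from Finnhub's REST API.
# The endpoint path and "token" query parameter are assumptions based on Finnhub's
# public documentation; FINNHUB_API_KEY is a placeholder for a real key.
resp = requests.get(
    "https://finnhub.io/api/v1/quote",
    params={"symbol": "AMZN", "token": "FINNHUB_API_KEY"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # e.g. current price, daily high/low, open, previous close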
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The Course Planner API allows developers to create applications that interact with Course Planner data. Using this API, you can build applications that allow your users (who are enrolled Harvard College/GSAS students) to add courses to their Course Planner, view the courses that are in the Course Planner, and remove courses.
The Harvard Art Museums API is a REST-style service designed for developers who wish to explore and integrate the museums’ collections in their projects. The API provides direct access to the data that powers the museums' website and many other aspects of the museums.
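As a rough illustration of the REST-style access described above, the sketch below queries the museums' object records; the /object resource and apikey parameter follow the public API documentation, and HAM_API_KEY is a placeholder for a key requested from the museums.

import requests

# Rough sketch: search the Harvard Art Museums API for objects matching a keyword.
# The /object resource and "apikey" parameter follow the public API documentation;
# HAM_API_KEY is a placeholder.
resp = requests.get(
    "https://api.harvardartmuseums.org/object",
    params={"apikey": "HAM_API_KEY", "q": "watercolor", "size": 5},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json().get("records", []):
    print(record.get("objectnumber"), record.get("title"))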
The Harvard Faculty Finder creates an institution-wide view of the breadth and depth of Harvard faculty and scholarship, and it helps students, faculty, administrators, and the general public locate Harvard faculty according to research and teaching expertise. More information about the HFF website and the data it contains can be found on the Harvard University Faculty Development & Diversity website. HFF is a Semantic Web application, which means its content can be read and understood by other computer programs. This enables the data associated with a person, such as titles, contact information, and publications, to be shared with other institutions and appear on other websites. Below are the technical details for building a computer program that can export data from HFF. The data is available through an API. No authentication is required. Documentation can be found at http://api.facultyfinder.harvard.edu, or you can see a snapshot of the documentation as the data for this entry. The API entry points are described in the documentation.
The Sicpa_OpenData libraries facilitate the publication of data to the INRAE dataverse in a transparent way, (1) by simplifying the creation of the metadata document from the data already present in the information systems, and (2) by simplifying the use of dataverse.org APIs.
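For context, deposit libraries like these typically wrap the Dataverse native API. A minimal sketch of the underlying call that creates a dataset from a metadata document is shown below, using the standard POST /api/dataverses/{alias}/datasets endpoint and X-Dataverse-key header from the Dataverse API guides; the server URL, collection alias, token, and metadata file are placeholders.

import requests

# Rough sketch of the Dataverse native API call that deposit libraries typically wrap:
# create a dataset in a collection from a Dataverse JSON metadata document.
# SERVER_URL, COLLECTION_ALIAS, API_TOKEN and dataset.json are placeholders.
SERVER_URL = "https://data.inrae.fr"   # assumed installation URL
COLLECTION_ALIAS = "my-collection"     # placeholder collection alias
API_TOKEN = "API_TOKEN"                # placeholder API token

with open("dataset.json", "rb") as f:  # Dataverse JSON metadata document
    resp = requests.post(
        f"{SERVER_URL}/api/dataverses/{COLLECTION_ALIAS}/datasets",
        headers={"X-Dataverse-key": API_TOKEN, "Content-Type": "application/json"},
        data=f.read(),
        timeout=60,
    )
resp.raise_for_status()
print(resp.json())  # includes the persistent identifier of the new draft dataset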
Petition subject: Indian resources
Original: http://nrs.harvard.edu/urn-3:FHCL:25500494
Date of creation: 1880-11-07
Petition location: Cottage City [Oak Bluffs]
Selected signatures: Priscilla Freeman
Total signatures: 1
Females of color signatures: 1
Female only signatures: Yes
Identifications of signatories: an Indian and one of the riparian proprietors of a certain pond containing more than twenty acres situated in said county and known as "Tisbury Great Pond", [females of color]
Prayer format was printed vs. manuscript: Manuscript
Additional non-petition or unrelated documents available at archive: additional documents available
Additional archivist notes: right to fish in Tisbury Great Pond, lease, waters, commissioners on inland fisheries, Allen Look and others, natural rights as an Indian
Location of the petition at the Massachusetts Archives of the Commonwealth: Resolves 1881, c.49, passed April 23, 1881
Acknowledgements: Supported by the National Endowment for the Humanities (PW-5105612), Massachusetts Archives of the Commonwealth, Radcliffe Institute for Advanced Study at Harvard University, Center for American Political Studies at Harvard University, Institutional Development Initiative at Harvard University, and Harvard University Library.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Spatially explicit data are increasingly available across disciplines, yet they are often limited to a specific domain. To use such datasets in a coherent analysis, for example to decide where to target specific types of agricultural investment, an effort is needed to make them harmonized and interoperable. For the Africa South of the Sahara (SSA) region, the HarvestChoice CELL5M Database was developed in this spirit of moving multidisciplinary data into one harmonized, geospatial database. The database includes over 750 biophysical and socio-economic indicators, many of which can be easily expanded to global scale. The CELL5M database provides a platform for cross-cutting spatial analyses and fine-grain visualization of the mix of farming systems and populations across SSA. It was created as the central core to support a decision-making platform that would enable development practitioners and researchers to explore multi-faceted spatial relationships at the nexus of poverty, health and nutrition, farming systems, innovation, and environment. The database is a matrix populated by over 350,000 grid cells covering SSA at five arc-minute spatial resolution. Users of the database, including those conducting research on agricultural policy, research, and development issues, can also easily overlay their own indicators. Numerical aggregation of the gridded data by specific geographical domains, either at the subnational level or across country borders for more regional analysis, is also readily possible without needing to use any specific GIS software. See the HCID database (http://dx.doi.org/10.7910/DVN/MZLXVQ) for the geometry of each grid cell. The database also provides a standards-compliant data API that currently powers several web-based data visualization and analytics tools.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This data set contains the IDs of the 1,186,322 tweets used in "Climate Nags: Affect and the Convergence of Global Risk in Online Networks" (published in Continuum, 2023). The data was collected from Twitter's Streaming API using DMI-TCAT during the first four months of the Coronavirus pandemic, the 2020 U.S. presidential race, and the early stages of the 2022 Russia–Ukraine War. These collections were then filtered based on keywords related to climate change (see the README file for more details).
This file contains event records extracted and cleaned from the Heliophysics Event Knowledgebase (HEK) API. The data contains records of four event types (AR, CH, FL, SG) from Jan. 1, 2012 to Dec. 31, 2014. Corresponding image files can be found in the lsdo dataverse (https://dataverse.harvard.edu/dataverse/lsdo).
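For reference, similar event records can be retrieved from the HEK with, for example, the sunpy package; this is an assumption for illustration, not necessarily how this file was produced.

from sunpy.net import hek  # pip install sunpy; an assumption, not necessarily what was used

# Minimal sketch: query the HEK for flare (FL) events over two days in the covered range.
# AR, CH, and SG events can be queried the same way by changing the EventType value.
client = hek.HEKClient()
events = client.search(
    hek.attrs.Time("2012-01-01", "2012-01-03"),
    hek.attrs.EventType("FL"),
)
print(len(events), "flare records")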
Harvard Catalyst Profiles is a Semantic Web application, which means its content can be read and understood by other computer programs. This enables the data in profiles, such as addresses and publications, to be shared with other institutions and appear on other websites. If you click the "Export RDF" link on the left sidebar of a profile page, you can see what computer programs see when visiting a profile. The section below describes the technical details for building a computer program that can export data from Harvard Catalyst Profiles. There are four types of application programming interfaces (APIs) in Harvard Catalyst Profiles.
RDF crawl. Because Harvard Catalyst Profiles is a Semantic Web application, every profile has both an HTML page and a corresponding RDF document, which contains the data for that page in RDF/XML format. Web crawlers can follow the links embedded within the RDF/XML to access additional content.
SPARQL endpoint. SPARQL is a programming language that enables arbitrary queries against RDF data. This provides the most flexibility in accessing data; however, the downsides are the complexity of coding SPARQL queries and slower performance. In general, the XML Search API (see below) is better to use than SPARQL. However, if you require access to the SPARQL endpoint, please contact Griffin Weber.
XML Search API. This is a web service that provides support for the most common types of queries. It is designed to be easier to use and to offer better performance than SPARQL, but at the expense of fewer options. It enables full-text search across all entity types, faceting, pagination, and sorting options. The request message to the web service is in XML format, but the output is in RDF/XML format. The URL of the XML Search API is https://connects.catalyst.harvard.edu/API/Profiles/Public/Search.
Old XML-based web services. These provide backwards compatibility for institutions that built applications using the older version of Harvard Catalyst Profiles. These web services do not take advantage of many of the new features of Harvard Catalyst Profiles. Users are encouraged to switch to one of the new APIs. The URL of the old XML web service is https://connects.catalyst.harvard.edu/ProfilesAPI. For more information about the APIs, please see the documentation and example files.
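As a very rough sketch of calling the XML Search API from a script: the URL comes from the description above, but the XML element names below are illustrative placeholders only; the real request schema is defined in the documentation and example files.

import requests

# Very rough sketch: POST an XML request to the Harvard Catalyst Profiles XML Search API.
# The element names in the request body are illustrative placeholders -- consult the
# API documentation and example files for the real schema.
SEARCH_URL = "https://connects.catalyst.harvard.edu/API/Profiles/Public/Search"

request_xml = """<?xml version="1.0" encoding="utf-8"?>
<SearchRequest>                 <!-- placeholder root element -->
  <Keyword>diabetes</Keyword>   <!-- placeholder full-text search term -->
  <MaxRecords>10</MaxRecords>   <!-- placeholder pagination option -->
</SearchRequest>"""

resp = requests.post(
    SEARCH_URL,
    data=request_xml.encode("utf-8"),
    headers={"Content-Type": "application/xml"},
    timeout=60,
)
print(resp.status_code)
print(resp.text[:500])  # the response is RDF/XML, per the description above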
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This article describes novel open source tools for open data publication in open access journal workflows. These comprise a plugin for Open Journal Systems that supports a data submission, citation, review, and publication workflow, and an extension to the Dataverse system that provides a standard deposit API. We describe the function and design of these tools, provide examples of their use, and summarize their initial reception. We conclude by discussing future plans and potential impact.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
In this dataset, we present CML-COVID, a COVID-19 Twitter data set of 19,298,967 tweets from 5,977,653 unique individuals collected between March 2020 and July 2020. The prefix of each filename is the search query used. The CML-COVID dataset is released in compliance with Twitter’s Terms & Conditions (T&C), which prohibit the verbatim release of full tweet text and API-derived data. Rather, we provide a list of tweet IDs that others can directly ‘hydrate’ using calls to the Twitter API. If you use the CML-COVID dataset, please cite this dataset to acknowledge your use of our data.
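Hydration of the tweet IDs can be sketched roughly as follows, using the Twitter API v2 tweets lookup endpoint; availability and rate limits depend on your current Twitter/X API access, BEARER_TOKEN and ids.txt are placeholders, and tools such as twarc automate the same process.

import requests

# Rough sketch: "hydrate" tweet IDs with the Twitter API v2 tweets lookup endpoint
# (up to 100 IDs per request). BEARER_TOKEN and ids.txt are placeholders, and access
# depends on your current Twitter/X API tier.
BEARER_TOKEN = "BEARER_TOKEN"

with open("ids.txt") as f:
    tweet_ids = [line.strip() for line in f if line.strip()]

for start in range(0, len(tweet_ids), 100):
    batch = tweet_ids[start:start + 100]
    resp = requests.get(
        "https://api.twitter.com/2/tweets",
        params={"ids": ",".join(batch), "tweet.fields": "created_at,author_id"},
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        timeout=60,
    )
    resp.raise_for_status()
    for tweet in resp.json().get("data", []):
        print(tweet["id"], tweet["created_at"])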
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The tabular file contains information on known Harvard repositories on GitHub, such as the number of stars, programming language, date last updated, number of open issues, size, number of forks, repository URL, creation date, and description. Each repository has a corresponding JSON file (see primary-data.zip) that was retrieved using the GitHub API with code and a list of repositories available from https://github.com/IQSS/open-source-at-harvard.
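As an illustration of the kind of call used to retrieve each repository's JSON record (the actual code is in the linked repository), the standard GitHub REST API repository endpoint returns the fields listed above; the repository name below is only an example, and unauthenticated requests are rate-limited.

import requests

# Illustration: fetch one repository's JSON record from the GitHub REST API
# (GET /repos/{owner}/{repo}). "IQSS/dataverse" is just an example repository.
resp = requests.get(
    "https://api.github.com/repos/IQSS/dataverse",
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
)
resp.raise_for_status()
repo = resp.json()
print(repo["stargazers_count"], repo["language"], repo["open_issues_count"],
      repo["forks_count"], repo["size"], repo["created_at"], repo["updated_at"])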
This dataset contains a list of all the open datasets we have collected through the Socrata API; it is used in developing the PRIVEE interface. We have enriched the list with metadata for these datasets, including their columns, tags, and number of rows. We have also identified some of the quasi-identifiers present in these datasets.
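Dataset listings like this one can be pulled from Socrata's public Discovery API, roughly as sketched below; this is an illustration rather than the PRIVEE collection code, and the parameters shown are a small subset of what the API supports.

import requests

# Rough sketch: list public datasets and some of their metadata (name, columns, tags)
# from the Socrata Discovery API catalog endpoint. Not the PRIVEE collection code.
resp = requests.get(
    "https://api.us.socrata.com/api/catalog/v1",
    params={"only": "dataset", "limit": 5},
    timeout=60,
)
resp.raise_for_status()
for result in resp.json().get("results", []):
    resource = result.get("resource", {})
    print(resource.get("name"))
    print("  columns:", resource.get("columns_name", []))
    print("  tags:", result.get("classification", {}).get("domain_tags", []))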
Dataset Metrics
Total size of data uncompressed: 59,515,177,346 bytes
Number of objects (submissions): 19,456,493
Reddit API Documentation: https://www.reddit.com/dev/api/
Overview
This dataset contains all available submissions from Reddit during the month of May 2019 (using UTC time boundaries). The data has been split to accommodate the file upload limitations for Dataverse. Each file is a collection of JSON objects (ndjson). Each file was then compressed using zstandard compression (https://facebook.github.io/zstd). The files should be ordered by the id of the submission (represented by the id field). The time that each object was ingested is recorded in the retrieved_on field (in epoch seconds).
Methodology
Monthly Reddit ingests are usually started around a week into a new month for the previous month (but could be delayed). This gives submission scores, gildings, and num_comments time to "settle" close to their eventual values before Reddit archives the posts (usually done six months after the post's creation). All submissions are ingested via Reddit's API (using the /api/info endpoint). This is a "best effort" attempt to get all available data at the time of ingest. Due to the nature of Reddit, subreddits can go from private to public at any time, so it's possible more submissions could be found by rescanning missing ids. The author of this dataset highly encourages any researchers to do a sanity check on the data and to rescan for missing ids to ensure all available data has been gathered. If you need assistance, you can contact me directly. All efforts were made to capture as much data as possible. Generally, > 95% of all ids are captured. Missing data could be the result of Reddit API errors, submissions that were private during the ingest but then became public, and subreddits that were quarantined and were not added to the whitelist before ingesting the data. When collecting the data, two scans are done. The first scan of ids using the /api/info endpoint collects all available data. After the first scan, a second scan is done requesting only missing ids from the first scan. This helps to keep the data as complete and comprehensive as possible.
Contact
If you have any questions about the data or require more details on the methodology, you are welcome to contact the author.
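Reading the zstandard-compressed ndjson files can be done roughly as follows; this assumes the Python zstandard package, uses a large decompression window (which dumps like these typically require), and the filename is a placeholder.

import io
import json
import zstandard  # pip install zstandard

# Rough sketch: stream-decode one of the zstandard-compressed ndjson files and read
# a few submission objects. The filename is a placeholder; a large max_window_size
# is used because these dumps are typically compressed with a long window.
with open("RS_2019-05_part1.zst", "rb") as fh:
    reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
    for i, line in enumerate(io.TextIOWrapper(reader, encoding="utf-8")):
        submission = json.loads(line)
        print(submission["id"], submission.get("retrieved_on"))
        if i >= 4:  # stop after five objects for the example
            break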
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This collection contains the trained models and object detection results of 2 architectures found in the Detectron2 library, on the MS COCO val2017 dataset, under different JPEG compression levels Q = {5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96} (14 levels per trained model).
Architectures:
F50 – Faster R-CNN on ResNet-50 with FPN
R50 – RetinaNet on ResNet-50 with FPN
Training types:
D2 – Detectron2 Model ZOO pre-trained 1x model (90,000 iterations, batch 16)
STD – standard 1x training (90,000 iterations) on the original train2017 dataset
Q20 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=20
Q40 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=40
T20 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=20
T40 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=40
Model and metrics files
models_FasterRCNN.tar.gz (F50-STD, F50-Q20, …)
models_RetinaNet.tar.gz (R50-STD, R50-Q20, …)
For every model there are 3 files:
config.yaml – the Detectron2 config of the model
model_final.pth – the weights (training snapshot) in PyTorch format
metrics.json – training metrics (like time, total loss, etc.) every 20 iterations
The D2 models were not included, because they are available from the Detectron2 Model ZOO, as faster_rcnn_R_50_FPN_1x (F50-D2) and retinanet_R_50_FPN_1x (R50-D2).
Result files
F50-results.tar.gz – results for Faster R-CNN models (including D2)
R50-results.tar.gz – results for RetinaNet models (including D2)
For every model there are 14 subdirectories, e.g. evaluator_dump_R50x1_005 through evaluator_dump_R50x1_096, for each of the JPEG Q values. Each such folder contains:
coco_instances_results.json – all detected objects (image id, bounding box, class index and confidence)
results.json – AP metrics as computed by the COCO API
Source code for processing the data
The data can be processed using our code, published at: https://github.com/tgandor/urban_oculus. Additional dependencies for the source code: COCO API, Detectron2.
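A trained model from this collection can be loaded for inference roughly as follows; this assumes Detectron2 and OpenCV are installed and uses the config.yaml and model_final.pth files described above, with the directory and image paths as placeholders.

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Rough sketch: load one of the released models (its config.yaml and model_final.pth)
# with Detectron2 and run inference on a single image. All paths are placeholders.
cfg = get_cfg()
cfg.merge_from_file("F50-Q20/config.yaml")          # config shipped with the model
cfg.MODEL.WEIGHTS = "F50-Q20/model_final.pth"       # training snapshot in PyTorch format
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5         # detection confidence threshold (Faster R-CNN)

predictor = DefaultPredictor(cfg)
image = cv2.imread("some_coco_image.jpg")           # placeholder input image (BGR)
outputs = predictor(image)
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)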
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This sample was drawn from the Crossref API on March 8, 2022. The sample was constructed purposefully on the hypothesis that records with at least one known issue would be more likely to yield issues related to cultural meanings and identity. Records known or suspected to have at least one quality issue were selected by the authors and Crossref staff. The Crossref API was then used to randomly select additional records from the same prefix. Records in the sample represent 51 DOI prefixes that were chosen without regard for the manuscript management or publishing platform used, as well as 17 prefixes for journals known to use the Open Journal Systems manuscript management and publishing platform. OJS was specifically identified due to the authors' familiarity with the platform, its international and multilingual reach, and previous work on its metadata quality.
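Random selection of additional records from a prefix can be illustrated with the Crossref REST API's sample parameter, roughly as below; this shows the general approach rather than the authors' exact procedure, and the prefix is a placeholder.

import requests

# Illustration (not the authors' exact procedure): randomly sample works registered
# under one DOI prefix using the Crossref REST API's "sample" parameter.
# "10.1234" is a placeholder prefix; polite use includes a mailto parameter.
resp = requests.get(
    "https://api.crossref.org/prefixes/10.1234/works",
    params={"sample": 20, "mailto": "you@example.org"},
    timeout=60,
)
resp.raise_for_status()
for work in resp.json()["message"]["items"]:
    titles = work.get("title") or [""]
    print(work.get("DOI"), titles[0])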
Geotagged public tweets from the Twitter streaming API. Date range: January 1, 2016 to December 31, 2017. Data size: 4 GB; about 170 million tweets with hashtags. Attributes: each tweet is associated with a tweet ID, timestamp, anonymized user ID, and a list of hashtags.