License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains raw data and processed data from the Dataverse Community Survey 2022. The main goal of the survey was to help the Global Dataverse Community Consortium (GDCC; https://dataversecommunity.global/) and the Dataverse Project (https://dataverse.org/) decide what actions to take to improve the Dataverse software and the larger ecosystem of integrated tools and services, and to better support community members. The results may also be of interest to other communities working on software and services for managing research data. The survey was designed to map out the current status as well as the roadmaps and priorities of Dataverse installations around the world. The main target group was the people and teams responsible for operating Dataverse installations around the world; a secondary target group was people and teams at organizations that are planning or considering deploying a Dataverse installation. Thirty-four existing and planned Dataverse installations participated in the survey.
The tabular file contains information on known Harvard repositories on GitHub, such as the number of stars, programming language, date last updated, number of open issues, size, number of forks, repository URL, creation date, and description. Each repository has a corresponding JSON file (see primary-data.zip) that was retrieved using the GitHub API with code and a list of repositories available from https://github.com/IQSS/open-source-at-harvard.
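As a rough illustration of how such per-repository JSON files can be retrieved, the hedged sketch below queries the GitHub REST API for a couple of repositories; the repository names here are placeholders, and the full list lives in the repository linked above.

```python
import json
import urllib.request

# Illustrative subset of repositories; the full list is maintained at
# https://github.com/IQSS/open-source-at-harvard
repos = ["IQSS/dataverse", "harvard-lil/perma"]

for full_name in repos:
    url = f"https://api.github.com/repos/{full_name}"
    with urllib.request.urlopen(url) as resp:
        repo = json.load(resp)
    # Fields corresponding to the tabular file's columns
    print(repo["full_name"], repo["stargazers_count"], repo["language"],
          repo["updated_at"], repo["open_issues_count"], repo["size"],
          repo["forks_count"], repo["html_url"], repo["created_at"])
    # Save the raw JSON response, as in primary-data.zip
    with open(full_name.replace("/", "_") + ".json", "w") as f:
        json.dump(repo, f, indent=2)
```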
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataverse hosts the data repository of the article "Open Source Software as Digital Platforms to Innovate". It contains the databases and R code that replicate the main results of the article. The article contains a detailed description of how these databases were constructed and how they are organized.
This article describes novel open source tools for open data publication in open access journal workflows. These comprise a plugin for Open Journal Systems that supports a data submission, citation, review, and publication workflow, and an extension to the Dataverse system that provides a standard deposit API. We describe the function and design of these tools, provide examples of their use, and summarize their initial reception. We conclude by discussing future plans and potential impact.
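To make the deposit side concrete, here is a hedged sketch of a SWORDv2 Atom-entry deposit against a Dataverse server, following the pattern of Dataverse's SWORD data deposit API; the server URL, API token, collection alias, and metadata are placeholders, not values from the article.

```python
import requests

# Hypothetical server and API token; Dataverse's SWORDv2 endpoint takes the
# API token as the basic-auth username with an empty password.
SERVER = "https://demo.dataverse.org"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
COLLECTION = f"{SERVER}/dvn/api/data-deposit/v1.1/swordv2/collection/dataverse/root"

# Minimal Atom entry describing the dataset to deposit
atom_entry = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dcterms="http://purl.org/dc/terms/">
  <title>Replication data for an example article</title>
  <dcterms:creator>Doe, Jane</dcterms:creator>
  <dcterms:description>Deposited from a journal workflow.</dcterms:description>
</entry>"""

resp = requests.post(
    COLLECTION,
    data=atom_entry,
    headers={"Content-Type": "application/atom+xml"},
    auth=(API_TOKEN, ""),
)
print(resp.status_code)  # 201 Created on success
```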
The goal of the Open Source Indicators (OSI) Program was to make automated predictions of significant societal events through the continuous and automated analysis of publicly available data such as news media, social media, informational websites, and satellite imagery. Societal events of interest included civil unrest, disease outbreaks, and election results. Geographic areas of interest included countries in Latin America (LA) and the Middle East and North Africa (MENA). The handbook is intended to serve as a reference document for the OSI Program and a companion to the ground truth event data used for test and evaluation. It provides guidance regarding the types of events considered; the submission of automated predictions or "warnings"; the development of ground truth; the test and evaluation of submitted warnings; performance measures; and other programmatic information. IARPA initiated a solicitation for OSI Research Teams in late summer 2011 for one base year and two option years of research. MITRE was selected as the Test and Evaluation (T&E) Team in November 2011. Following a review of proposals, three teams (BBN, HRL, and Virginia Tech (VT)) were selected. The OSI Program officially began in April 2012; manual event encoding and formal T&E ended in March 2015.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Study information: Design ideation study (N = 24) using eye tracking technology. Participants solved a total of twelve design problems while receiving inspirational stimuli on a monitor. Their task was to generate as many solutions to each problem as possible and to explain each solution briefly by thinking aloud. The study allows further insight into how inspirational stimuli improve idea fluency during design ideation. This dataset features processed data from the experiment. Eye tracking data includes gaze data, fixation data, blink data, and pupillometry data for all participants. The study is based on the following research paper and follows the same experimental setup: Goucher-Lambert, K., Moss, J., & Cagan, J. (2019). A neuroimaging investigation of design ideation with and without inspirational stimuli—understanding the meaning of near and far stimuli. Design Studies, 60, 1-38.
Dataset: Most files in the dataset are saved as CSV files or other human-readable file formats. Large files are saved in Hierarchical Data Format (HDF5/H5) to allow for smaller file sizes and higher compression. All data is described thoroughly in 00_ReadMe.txt. The following processed data is included in the dataset:
- Concatenated annotations file of the experimental flow for all participants (CSV).
- All eye tracking raw data in concatenated files, annotated only with participant ID (CSV/HDF5).
- Annotated eye tracking data for ideation routines only, a subset of the files above (CSV/HDF5).
- Audio transcriptions from the Google Cloud Speech-to-Text API for each recording, with annotations (CSV).
- Raw API response for each transcription, including the time offset of each word in a recording (JSON).
- Data for questionnaire feedback and ideas generated during the experiment (CSV).
- Data for the post-experiment survey, including demographic information (TSV).
Python code used for the open-source experimental setup and dataset construction is hosted on GitHub. The repository also includes the code with which the dataset has been further processed.
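As a hedged sketch of working with these files, the snippet below loads a concatenated annotations CSV and an HDF5 gaze table with pandas and joins them per participant; the file names, HDF5 key, and participant_id column are illustrative, and the real layout is documented in 00_ReadMe.txt.

```python
import pandas as pd

# Illustrative file names, key, and join column -- the actual layout is
# documented in 00_ReadMe.txt shipped with the dataset.
annotations = pd.read_csv("annotations_all_participants.csv")

# Assumes the HDF5 table was written in a pandas-compatible layout under
# the key "gaze"; otherwise h5py can be used to inspect the file.
gaze = pd.read_hdf("gaze_data.h5", key="gaze")

# Attach the experimental-flow annotations to each gaze sample
merged = gaze.merge(annotations, on="participant_id", how="left")
print(merged.head())
```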
TwitterThe "Handian" corpus ( 汉典 or Hàn diăn, i.e, the "Han canon" or "Han classics") contains over 18,000 classics of ancient Chinese philosophy, as well as documents of historical and biographical significance, and literary works. The versions of the documents presented here are derived from www.zdic.net under their permissive Creative Commons 1.0 Public Domain Dedication. These significant cultural texts are modeled here and published in the Journal of Cultural Analytics. The Dataverse repository contains the models for the InPhO Topic Explorer (handian-ca.tez), installation instructions (README.md), and supplemental materials (handian-ca-supplemental.pdf). These works are released under the Creative Commons Attribution-Share Alike 4.0 International License (CC BY-SA 4.0).
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This data set contains the key output files for the results shown in the related publication. Specifically, the dataset should allow reproduction of the REEF3D::CFD simulations. Some deviations in the results may occur with different software versions.
Open source flower images available in the Python distribution. The raw images were converted to TFRecord format in an offline process.
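A hedged sketch of such an offline conversion, assuming ordinary JPEG files on disk and using TensorFlow's TFRecord writer; the paths, labels, and feature keys are illustrative rather than the dataset's actual schema.

```python
import tensorflow as tf

def _bytes_feature(value: bytes) -> tf.train.Feature:
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value: int) -> tf.train.Feature:
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Illustrative image paths and integer class labels
images = [("daisy/0001.jpg", 0), ("rose/0002.jpg", 1)]

with tf.io.TFRecordWriter("flowers.tfrecord") as writer:
    for path, label in images:
        raw = tf.io.read_file(path).numpy()  # encoded JPEG bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "image/encoded": _bytes_feature(raw),
            "image/class/label": _int64_feature(label),
        }))
        writer.write(example.SerializeToString())
```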
License: custom dataset license (https://dataverse.nl/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34894/VQC4OD)
This dataset contains the code and training data for replicating the main results of the article "Write access provisioning and organizational ownership in open source software projects: Exploring the impact on project novelty and survival".
This dataset integrates data from multiple publicly available sources to enhance the social and environmental analytical potential of the 2017 and 2020 HIFLD prison boundaries datasets. The HIFLD prison boundary feature class contains secure detention facilities, ranging in jurisdiction from federal (excluding military) to local governments. Polygon geometry is used to describe the extent of where the incarcerated population is located (fence lines or building footprints). The feature class's attribution describes many physical and social characteristics of detention facilities in the United States and some of its territories, and was populated by open source search of authoritative sources. We have manually coded the corresponding EPA Facility Registry Service (FRS) ID number for every facility for which we could find a reasonable match (source: https://www.epa.gov/frs/frs-facilities-state-single-file-csv-download). This FRS ID number enables finding corresponding environmental permits, inspections, violations, and enforcement actions. We have additionally created socially significant categories: ICE facilities and private prisons. Purpose: This feature class contains secure detention facilities with EPA FRS IDs and additional socially relevant variables for research on the environmental injustices of mass incarceration by Carceral Ecologies.
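A hedged sketch of the kind of join the coded FRS IDs enable, linking the boundary layer to the EPA FRS facility file with geopandas and pandas; the file names and the FRS_ID column name are assumptions, while REGISTRY_ID and PRIMARY_NAME are columns in the EPA FRS download.

```python
import geopandas as gpd
import pandas as pd

# Illustrative file names; the national FRS single-file CSV is available from
# https://www.epa.gov/frs/frs-facilities-state-single-file-csv-download
prisons = gpd.read_file("hifld_prison_boundaries.shp")
frs = pd.read_csv(
    "NATIONAL_SINGLE.CSV",
    usecols=["REGISTRY_ID", "PRIMARY_NAME", "LATITUDE83", "LONGITUDE83"],
    dtype={"REGISTRY_ID": str},
)

# FRS_ID is the manually coded column assumed here; cast to string so the
# join keys match the FRS REGISTRY_ID
prisons["FRS_ID"] = prisons["FRS_ID"].astype(str)
linked = prisons.merge(frs, left_on="FRS_ID", right_on="REGISTRY_ID", how="left")
print(linked[["FRS_ID", "PRIMARY_NAME"]].head())
```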
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
A systematically retrieved dataset consisting of 33 open-source software projects containing a large number of typed artifacts and trace links between them. The artifacts stem from the projects' issue tracking systems and source version control systems, enabling their joint analysis. Enriched with additional metadata, such as time stamps, release versions, component information, and developer comments, the dataset is highly suitable for empirical research, e.g., in requirements and software traceability analysis, software evolution, bug and feature localization, and stakeholder collaboration. It can stimulate new research directions, facilitate the replication of existing studies, and act as a benchmark for the comparison of competing approaches.
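To illustrate one joint analysis such trace links support, here is a minimal sketch that recovers issue-to-commit links by scanning commit messages for issue keys; the Jira-style key pattern and the example messages are assumptions, not the dataset's actual extraction pipeline.

```python
import re

# Jira-style issue keys such as "ABC-123"
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

# Hypothetical (sha, message) pairs as they might come from a VCS log
commits = [
    ("3f9a2c1", "HHH-1234 fix null pointer in session cache"),
    ("b7d01e4", "refactor build scripts"),
]

# One trace link per issue key mentioned in a commit message
trace_links = [
    (sha, key)
    for sha, message in commits
    for key in ISSUE_KEY.findall(message)
]
print(trace_links)  # [('3f9a2c1', 'HHH-1234')]
```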
License: MIT License (https://opensource.org/licenses/MIT)
This open-source computational tool is designed for the simulation and analysis of planar linkage mechanisms. Aimed at students, educators, and engineers, the software offers a flexible and intuitive environment for modeling mechanical systems. It features a custom domain-specific language for defining mechanisms through variables, equations, and structural data, and combines symbolic preprocessing with numerical solvers for kinematic and dynamic analysis. The tool includes an interactive graphical user interface (GUI) for real-time configuration and visualization. Validated through representative test cases, it delivers accurate results for position, velocity, acceleration, and force analysis. Entirely free of proprietary dependencies, the application serves as an accessible alternative to commercial simulation tools, promoting educational equity and supporting learning through visualization and experimentation.
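Not the tool's own code, but a minimal sketch of the kind of kinematic problem it solves numerically: position analysis of a four-bar linkage from its vector loop-closure equations, with the link lengths chosen arbitrarily.

```python
import numpy as np
from scipy.optimize import fsolve

# Arbitrary link lengths: ground, crank, coupler, rocker [m]
r1, r2, r3, r4 = 0.30, 0.10, 0.25, 0.20
theta2 = np.radians(60.0)  # driven crank angle

def loop_closure(x):
    """Vector loop-closure equations of the four-bar linkage."""
    theta3, theta4 = x
    return [
        r2 * np.cos(theta2) + r3 * np.cos(theta3)
        - r4 * np.cos(theta4) - r1,
        r2 * np.sin(theta2) + r3 * np.sin(theta3)
        - r4 * np.sin(theta4),
    ]

# Solve for the coupler and rocker angles given the crank angle
theta3, theta4 = fsolve(loop_closure, x0=[0.5, 1.5])
print(np.degrees([theta3, theta4]))
```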
License: custom dataset license (https://dataverse.nl/api/datasets/:persistentId/versions/3.2/customlicense?persistentId=doi:10.34894/2WZ0S9)
This dataset collates data from disparate sources, including vector data, raster data, and published reports and maps, to produce a global database of delta protection measures. The dataset includes three layers and a table:
- Polygon layer containing leveed areas (Leveed Areas), imported from vector data or drawn from suitably georeferenced raster/published data.
- Polygon layer containing the boundary area for the research focus (Delta Polygons), as created by Edmonds et al. (2020, doi:10.1038/s41467-020-18531-4).
- Line layer containing levee, defence, or similar features (Levee Lines), imported from vector data or drawn from suitably georeferenced raster/published data.
- Table recording the methodology and decision-making process (Delta Index) for each delta polygon, as well as reasons for excluding data, country code, and processing/review fields.
Metadata for the dataset as a whole and for the individual layers is additionally published and conforms to the INSPIRE standard. The dataset is structured so that each metadata file is within the respective file when downloaded as a zipped archive. In line with an agreed change, the dataset is now attributed to the student only; the paper, which contains further work using the data, can be found at doi:10.5194/nhess-2021-291.
License: Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset includes both (1) the community smells detected in seven open source software systems (DSpace, Dataverse, EPrints 3.4, Archivematica, Islandora, PKP OJS, Samvera) used by academic libraries for scholarly communication services and (2) the metrics generated by csDetector (source code available at: https://github.com/Nuri22/csDetector). The data is based on GitHub repository activity over a three-month period: May to July 2023.
The data and programs replicate tables and figures from "Externalities in Knowledge Production: Evidence from a Randomized Field Experiment", by Hinnosaar, Hinnosaar, Kummer, and Slivko. Data were constructed from various sources. Please see the Readme file for additional details.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This archive contains replication materials for "Underproduction Analysis of Open Source Software". It contains the extension materials to be used in conjunction with two other dataverses: Champion, Kaylea; Hill, Benjamin Mako, 2021, "Replication data and online supplement for: Underproduction: An Approach for Measuring Risk in Open Source Software", https://doi.org/10.7910/DVN/PUCD2P, Harvard Dataverse, V2, UNF:6:A8MV1fxlZnJtlKI3DnGaRg== [fileUNF]; and Kaylea Champion, 2024, "Replication Data for: Sources of Underproduction in Open Source Software", https://doi.org/10.7910/DVN/N2HIRS, Harvard Dataverse, V1. You will need the contents of all three archives to fully replicate the materials in "Underproduction Analysis of Open Source Software". See README.txt for full details and instructions.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This data set contains files of binary data describing the output of a particle-in-cell simulation of magnetotail reconnection in the case of streaming oxygen ions of ionospheric origin. It contains the electric and magnetic fields and average particle data in the simulation domain. The documentation of the variables and arrays is given below. The .dat files were generated with Fortran 90 and are named fields-*.dat, where * has the usual wildcard meaning in a Linux environment. The name "fields" refers to the electromagnetic fields, but the files also contain particle information. The number following "fields", e.g. 00200, refers to the time in units of the inverse of the electron plasma frequency. The variables are structured identically in each file; only the time of evaluation differs.
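A hedged sketch of reading such files from Python, assuming the Fortran code wrote unformatted sequential records; scipy's FortranFile handles the record markers, but the record order, precision, and grid shape below are assumptions that should be checked against the variable documentation.

```python
import numpy as np
from scipy.io import FortranFile

# Hypothetical grid dimensions -- check the variable documentation shipped
# with the dataset for the actual array shapes and record layout.
nx, nz = 512, 256

f = FortranFile("fields-00200.dat", "r")
# Each read_reals call consumes one unformatted Fortran record; Fortran
# arrays are column-major, hence order="F" when reshaping.
ex = f.read_reals(dtype=np.float64).reshape((nx, nz), order="F")
bx = f.read_reals(dtype=np.float64).reshape((nx, nz), order="F")
f.close()

print(ex.mean(), bx.mean())
```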
MDSplus is a software tool designed for data acquisition, storage, and analysis of complex scientific experiments. Over the years, MDSplus has primarily been used for data management for fusion experiments. This paper demonstrates that MDSplus can be used for a much wider variety of systems and experiments. We present a step-by-step tutorial describing how to create a simple experiment, manage the data, and analyze it using MDSplus and Python. To this end, a custom example device was developed to be used as the data source. This device was built on an open-source electronic hardware platform, and it consists of a microcontroller and two sensors. We read data from these sensors, store it in MDSplus, and use JupyterLab to visualize and process it. This project and code demo are available on GitHub at https://github.com/santorofer/MDSplusAndCustomeDevices
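In the spirit of that tutorial, here is a hedged sketch of storing and reading back a sensor signal with the MDSplus Python bindings; the tree name, shot number, node name, and data values are placeholders, and MDSplus must be able to locate the tree (e.g., via an example_path environment variable).

```python
from MDSplus import Tree

SHOT = 1

# Create a new tree with a single numeric node and save the structure;
# MDSplus resolves "example" through an environment variable such as
# example_path pointing at the directory that holds the tree files.
tree = Tree("example", SHOT, "NEW")
tree.addNode("TEMP", "NUMERIC")
tree.write()

# Reopen the tree normally and store hypothetical sensor readings
tree = Tree("example", SHOT)
tree.getNode("TEMP").putData([21.5, 21.7, 22.0])

# Read the data back, e.g. from a JupyterLab notebook
print(tree.getNode("TEMP").data())
```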
Video advertisements, whether on television or the Internet, play an essential role in modern political campaigns. For over two decades, researchers have studied television video ads by analyzing the hand-coded data from the Wisconsin Advertising Project and its successor, the Wesleyan Media Project (WMP). Unfortunately, manually coding more than a hundred variables, such as issue mentions, opponent appearance, and negativity, for many videos is a laborious and expensive process. We propose to automatically code campaign advertisement videos. Applying state-of-the-art machine learning methods, we extract various audio and image features from each video file. We show that our machine coding is comparable to human coding for many variables of the WMP data sets. Since many candidates make their advertisement videos available on the Internet, automated coding can dramatically improve the efficiency and scope of campaign advertisement research. An open-source software package is available for implementing the proposed methodology.
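Not the authors' package, but a minimal sketch of one preprocessing step such a pipeline implies: sampling roughly one image frame per second from an advertisement video with OpenCV for downstream feature extraction; the file name is a placeholder.

```python
import cv2

cap = cv2.VideoCapture("campaign_ad.mp4")  # hypothetical video file
fps = cap.get(cv2.CAP_PROP_FPS)
step = max(int(fps), 1)  # guard against missing FPS metadata

frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % step == 0:  # keep roughly one frame per second
        frames.append(frame)
    index += 1
cap.release()
print(f"sampled {len(frames)} frames")
```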