This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for the McMaster Data Management Plan (DMP) Database, part of the Research Data Management (RDM) Services website located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned accordingly. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files, e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files, e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
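As a rough illustration of this layout, the short Python sketch below derives the folder that should contain a given kernel version id. The exact file extension (.py vs .ipynb) and whether folder names are zero-padded are assumptions to verify against the dataset itself.

# Minimal sketch (not an official utility): derive the folder that should
# contain a given kernel version, based on the layout described above.
# Assumptions: top-level folder = millions part of the id, sub-folder =
# thousands part, file name = the id itself; extension depends on language.

def kernel_version_path(version_id: int, extension: str = "ipynb") -> str:
    top = version_id // 1_000_000          # e.g. 123 for id 123456789
    sub = (version_id // 1_000) % 1_000    # e.g. 456 for id 123456789
    return f"{top}/{sub}/{version_id}.{extension}"

print(kernel_version_path(123_456_789))    # -> "123/456/123456789.ipynb"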
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket, which means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analysis will be an enabler for transforming scientific inquiries. Building on the HydroShare repository's established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS's National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. By installing and maintaining the packages within CUAHSI's HydroShare-linked JupyterHub server, these capabilities reduce the technical burden on scientists of creating a computational environment for executing analyses. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
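As a hedged sketch of the workflow described above, the following Python snippet retrieves daily streamflow values from NWIS with the dataretrieval package and opens a HydroShare resource with hsclient. The site number, date range, resource id, and file name are placeholders, and method names should be checked against each package's documentation.

# Minimal sketch; assumes "pip install hsclient dataretrieval" has been run.
# The site number, dates, resource id, and file name below are placeholders.

from dataretrieval import nwis
from hsclient import HydroShare

# Retrieve daily streamflow values for one USGS gauge from NWIS.
flow, metadata = nwis.get_dv(sites="01491000", start="2020-01-01", end="2020-12-31")
print(flow.head())

# Sign in to HydroShare and open an existing resource for download/upload.
hs = HydroShare()
hs.sign_in()  # prompts for HydroShare credentials
resource = hs.resource("REPLACE_WITH_RESOURCE_ID")
resource.file_download("results.csv")  # hypothetical file within the resource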
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The U.S. Geological Survey (USGS), Woods Hole Science Center (WHSC) has been an active member of the Woods Hole research community for over 40 years. In that time there have been many sediment collection projects conducted by USGS scientists and technicians for the research and study of seabed environments and processes. These samples are collected at sea or near shore and then brought back to the WHSC for study. While at the Center, samples are stored in ambient-temperature, cold, or freezing conditions, depending on the best mode of preparation for the study being conducted or the duration of storage planned for the samples. Recently, storage methods and available storage space have become a major concern at the WHSC. The shapefile sed_archive.shp gives a geographical view of the samples in the WHSC's collections and where they were collected, along with images and hyperlinks to useful resources.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
The Exhibit of Datasets was an experimental project with the aim of providing concise introductions to research datasets in the humanities and social sciences deposited in a trusted repository and thus made accessible for the long term. The Exhibit consists of so-called 'showcases', short webpages summarizing and supplementing the corresponding data papers published in the Research Data Journal for the Humanities and Social Sciences. A showcase is a quick introduction to such a dataset, a bit longer than an abstract, with illustrations, interactive graphs and other multimedia (if available). As a rule it also offers the option to get acquainted with the data itself, through an interactive online spreadsheet, a data sample or a link to the online database of a research project. Usually, access to these datasets requires several time-consuming actions, such as downloading data, installing the appropriate software and correctly uploading the data into these programs. This makes it difficult for interested parties to quickly assess the possibilities for reuse in other projects.
The Exhibit aimed to help visitors of the website get the right information at a glance by:
- Attracting attention to (recently) acquired deposits: showing why data are interesting.
- Providing a concise overview of the dataset's scope and research background; more details are to be found, for example, in the associated data paper in the Research Data Journal (RDJ).
- Bringing together references to the location of the dataset and to more detailed information elsewhere, such as the project website of the data producers.
- Allowing visitors to explore (a sample of) the data without first downloading and installing associated software (see below).
- Publishing related multimedia content, such as videos, animated maps, slideshows etc., which are currently difficult to include in online journals such as RDJ.
- Making it easier to review the dataset. The Exhibit would also have been the right place to publish these reviews in the same way as a webshop publishes consumer reviews of a product, but this could not yet be achieved within the limited duration of the project.
Note (1) The text of the showcase is a summary of the corresponding data paper in RDJ, and as such a compilation made by the Exhibit editor. In some cases a section 'Quick start in Reusing Data' is added, whose text is written entirely by the editor. (2) Various hyperlinks such as those to pages within the Exhibit website will no longer work. The interactive Zoho spreadsheets are also no longer available because this facility has been discontinued.
This data includes two example databases from the paper "Hybrid Sankey diagrams: visual analysis of multidimensional data for understanding resource use": the made-up fruit flows, and real global steel flow data from Cullen et al. (2012). It also includes the Sankey Diagram Definitions to reproduce the diagrams in the paper. The code to reproduce the figures is written in Python in the form of Jupyter notebooks. A conda environment file is included to easily set up the necessary Python packages to run the notebooks. All files are included in the "examples.zip" file. The notebook files are also uploaded standalone so they can be linked to nbviewer.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We designed and organized a one-day workshop where, in the context of FAIR, the following themes were discussed and practiced: scientific transparency and reproducibility; how to write a README; data and code licenses; spatial data; programming code; examples of published datasets; data reuse; and discipline and motivation. The intended audience was researchers at the Environmental Science Group of Wageningen University and Research. All workshop materials were designed with further development and reuse in mind and are shared through this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of a questionnaire, named “Survey on the current state of scientific data sharing in mainland China,” and a data file of valid samples. The questionnaire includes 12 sections and 31 questions. The data file contains 2 sheets. One comprises 370 valid sample data records; the other defines the 44 fields appearing in the former sheet, describing each field's name, data type, length, definition, and value range.
The aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. The respondents were professors of human sciences, social sciences and behavioural sciences in Finnish universities, and representatives of some research institutes. Opinions were also queried on the OECD guidelines and principles on open access to research data from public funding. First, the respondents were asked whether there were any guidelines or regulations concerning the depositing of digital research data in their departments, what happened to research data after the completion of the original research, and to what extent the data were reused. Further questions covered how often the data from completed research projects were reused in secondary research projects or for theses. The respondents also estimated what proportion of the data collected in their departments/institutes were reusable at the time of the survey, and why research data were not being reused in their own field of research. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology formed barriers to data reuse. Opinions on the OECD Open Access guidelines on research data were queried. The respondents were asked whether they had earlier knowledge of the guidelines, and to what extent its principles could be implemented in their own disciplines. Some questions pertained to the advantages and disadvantages of open access to research data. The advantages mentioned included reducing duplicate data collection and more effective use of data resources, whereas the disadvantages mentioned included, for example, risks connected to data protection and misuse of data. The respondents also suggested ways of implementing the Open Access guidelines and gave their opinions on how binding the recommendations should be, to what extent various bodies should be involved in formulating the guidelines, and how the archiving and dissemination of digital research data should be organised. Finally, the respondents estimated how the researchers in their field would react to enhancing open access to research data, and also gave their opinion on open access to the data they themselves have collected. Background variables included the respondent's gender, university, and research field.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In order to analyse specific features of data papers, we established a representative sample of data journals, based on lists from the European FOSTER Plus project, the German wiki forschungsdaten.org hosted by the University of Konstanz, and two French research organizations. The complete list consists of 82 data journals, i.e. journals which publish data papers. They represent less than 0.5% of academic and scholarly journals. For each of these 82 data journals, we gathered information about the discipline, the global business model, the publisher, peer reviewing etc. The analysis is partly based on data from ProQuest's Ulrichsweb database, enriched and completed by information available on the journals' home pages. One part of the data journals are presented as “pure” data journals stricto sensu, i.e. journals which publish exclusively or mainly data papers. We identified 28 journals of this category (34%). For each journal, we assessed through direct search on the journals' homepages (information about the journal, author's guidelines etc.) the use of identifiers and metadata, the mode of selection and the business model, and we assessed different parameters of the data papers themselves, such as length, structure, linking etc. The results of this analysis are compared with other research journals (“mixed” data journals) which publish data papers along with regular research articles, in order to identify possible differences between both journal categories, on the level of data papers as well as on the level of the regular research papers. Moreover, the results are discussed against concepts of knowledge organization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ASpecD is a Python framework for handling spectroscopic data with a focus on reproducibility. In short: each and every processing step applied to your data is recorded and can be traced back. Additionally, for each representation of your data (e.g., figures, tables) you can easily follow how the data shown have been processed and where they originate from.
To provide readers of the publication describing the ASpecD framework with a concrete example of recipe-driven data analysis, this repository contains both a recipe and the data that are analysed, as shown in the publication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages and the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees were calculated. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when adding and/or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and following the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
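The normal-distribution scenario logic described above can be illustrated with a few lines of Python outside of Excel; the mean, standard deviation, and candidate incidence values below are invented placeholders rather than figures from the dataset.

# Illustrative sketch of the scenario approach (placeholder numbers, not
# values from the dataset): predicted COVID-19 incidence is modelled as
# normally distributed, and each candidate scenario value is assigned the
# probability of observing an outcome at least that severe.

from scipy.stats import norm

mean_cases = 60_000   # hypothetical forecast mean of daily new cases
std_cases = 8_000     # hypothetical forecast standard deviation

for scenario in (50_000, 60_000, 70_000, 80_000):
    prob_at_least = 1 - norm.cdf(scenario, loc=mean_cases, scale=std_cases)
    print(f"scenario {scenario:>6}: P(cases >= scenario) = {prob_at_least:.2f}")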
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Scientific data collected on the Westward, Corwith Cramer, and Robert C. Seamans are invaluable products of SEA’s educational research programs. SEA supports dissemination and sharing of data with educators and researchers to benefit the broader science community and the public. We aim to encourage and ensure fair access to SEA data while also preserving the intellectual property of individual researchers and seeking opportunities for collaboration. Laboratory and deployment equipment on each vessel is described here.
On every SEA voyage, Standard Collections include the following:
On individual cruises, additional samples or data beyond Standard Collections may also be generated during projects led by SEA faculty or collaborating investigators; these data are managed by the project Principal Investigator.
As a service to the scientific community, and to fulfill SEA’s obligations to the U.S. State Department and foreign governments, a formal cruise report is prepared for all international voyages. Cruise reports include: a list of the ship’s company and students, a map of the voyage track, tables of sampling station locations and activities, most raw data in tables or graphics, and abstracts from student oceanographic research projects.
SEA’s decades-long voyage history for the Westward, Corwith Cramer and Robert C. Seamans can be accessed and searched here.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.
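A hypothetical sketch of combining the two files in Python is shown below; the file and column names (citation_links.csv, usage_metrics.csv, cited_doi, doi) are placeholders, and the actual headers should be taken from the dataset itself.

# Hypothetical sketch: one row per citation link, plus usage metrics keyed
# by the cited Zenodo DOI. Column names are placeholders; inspect the real
# files for the actual headers.

import pandas as pd

links = pd.read_csv("citation_links.csv")    # one row per citing-publication -> Zenodo DOI link
metrics = pd.read_csv("usage_metrics.csv")   # usage metrics per cited Zenodo DOI

merged = links.merge(metrics, left_on="cited_doi", right_on="doi", how="left")
print(merged.groupby("cited_doi").size().sort_values(ascending=False).head())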
Scripts and data acquired at the Mirror Lake Research Site, cited by the article submitted to Water Resources Research: Distributed Acoustic Sensing (DAS) as a Distributed Hydraulic Sensor in Fractured Bedrock M. W. Becker(1), T. I. Coleman(2), and C. C. Ciervo(1) 1 California State University, Long Beach, Geology Department, 1250 Bellflower Boulevard, Long Beach, California, 90840, USA. 2 Silixa LLC, 3102 W Broadway St, Suite A, Missoula MT 59808, USA. Corresponding author: Matthew W. Becker (matt.becker@csulb.edu).
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
Sample survey data [ssd]
The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework of housing and households for the different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages), and each community was divided into a number of blocks, where the number of houses in each block ranged between 60 and 100. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.
A two-stage stratified cluster sampling technique was used. In the first stage, a cluster sample proportional to size was selected, where the number of households in each cluster was considered the weight of the cluster. At the second stage, a sample of 8 households was selected from each cluster, in addition to another 4 households selected as a backup for the basic sample, using a systematic sampling technique. Those 4 households were sampled to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure the possibility of producing results at the sub-district level. In this respect, the survey adopted the framework provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable provided in the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, with a minimum number of clusters in each sub-district (6 clusters). This is to ensure adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas where poor households are concentrated in major cities. Therefore, these were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas, aiming to cover well all regions where poverty is prevalent.
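As an illustration only (not the official procedure), the sketch below mimics the second sampling stage described above: a systematic sample of 8 main and 4 backup households drawn from a simulated cluster listing.

# Illustrative sketch of the second sampling stage: a systematic sample of
# 8 households per cluster plus 4 backups, drawn from a simulated household
# listing of a single cluster.

import random

def systematic_sample(n_households: int, n_main: int = 8, n_backup: int = 4):
    k = n_households / (n_main + n_backup)   # sampling interval
    start = random.uniform(0, k)             # random start within the first interval
    picks = [int(start + i * k) % n_households for i in range(n_main + n_backup)]
    return picks[:n_main], picks[n_main:]    # (main sample, backup households)

main, backup = systematic_sample(n_households=90)
print("main sample:", main)
print("backup sample:", backup)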
Face-to-face [f2f]
Raw Data:
- Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate the different stages of the process of data checking, coding and entry until forms were returned to the archive system.
- Data office checking: This phase was carried out concurrently with the data collection phase in the field, where questionnaires completed in the field were immediately sent to the data office checking phase.
- Data coding: A team was trained to work on the data coding phase, which in this survey is limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
- Data entry/validation: A team consisting of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules was added to the entry forms to ensure the accuracy of the data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered were free of errors.
- Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the survey's final results. Those results were further checked against similar outputs from SPSS to ensure that the tabulations produced were correct. A check was also run on each table to guarantee consistency of the figures presented, together with the required editing of table titles and report formatting.
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
- The harmonization process started with cleaning all raw data files received from the Statistical Office.
- Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process was run on the data.
- Harmonized data were saved at the household as well as the individual level, in SPSS, and converted to STATA format.
The aim of the Data Rescue & Curation Best Practices Guide is to provide an accessible and hands-on approach to handling data rescue and digital curation of at-risk data for use in secondary research. We provide a set of examples and workflows for addressing common challenges with social science survey data that can be applied to other social and behavioural research data. The goal of this guide and the workflows presented is to improve librarians' and data curators' skills in providing access to high-quality, well-documented, and reusable research data. The aspects of data curation addressed throughout this guide are adopted from long-standing data library and archiving practices, including: documenting data using standard metadata; file and data organization; using open and software-agnostic formats; and curating research data for reuse.
The ESS-DIVE sample identifiers and metadata reporting format primarily follows the System for Earth Sample Registration (SESAR) Global Sample Number (IGSN) guide and template, with modifications to address Environmental Systems Science (ESS) sample needs and practicalities (IGSN-ESS). IGSNs are associated with standardized metadata to characterize a variety of different sample types (e.g. object type, material) and describe sample collection details (e.g. latitude, longitude, environmental context, date, collection method). Globally unique sample identifiers, particularly IGSNs, facilitate sample discovery, tracking, and reuse; they are especially useful when sample data is shared with collaborators, sent to different laboratories or user facilities for analyses, or distributed in different data files, datasets, and/or publications. To develop recommendations for multidisciplinary ecosystem and environmental sciences, we first conducted research on related sample standards and templates. We provide a comparison of existing sample reporting conventions, which includes mapping metadata elements across existing standards and Environment Ontology (ENVO) terms for sample object types and environmental materials. We worked with eight U.S. Department of Energy (DOE) funded projects, including those from Terrestrial Ecosystem Science and Subsurface Biogeochemical Research Scientific Focus Areas. Project scientists tested the process of registering samples for IGSNs and associated metadata in workflows for multidisciplinary ecosystem sciences. We provide modified IGSN metadata guidelines to account for needs of a variety of related biological and environmental samples. While generally following the IGSN core descriptive metadata schema, we provide recommendations for extending sample type terms, and connecting to related templates geared towards biodiversity (Darwin Core) and genomic (Minimum Information about any Sequence, MIxS) samples and specimens. ESS-DIVE recommends registering samples for IGSNs through SESAR, and we include instructions for registration using the IGSN-ESS guidelines. Our resulting sample reporting guidelines, template (IGSN-ESS), and identifier approach can be used by any researcher with sample data for ecosystem sciences.
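As a rough illustration of the kinds of sample metadata described above, the snippet below builds a single record using the elements named in the text (object type, material, location, environmental context, collection details); the keys are descriptive placeholders rather than the exact IGSN-ESS template headers, and the values are invented.

# Illustrative sample metadata record using the kinds of fields mentioned
# above. Keys are descriptive placeholders, not the exact IGSN-ESS template
# column headers; values are invented.

sample_record = {
    "igsn": "IEXXX000A",                  # placeholder identifier
    "sample_name": "WATERSHED-SOIL-001",
    "object_type": "Core",
    "material": "Soil",
    "latitude": 39.9820,
    "longitude": -105.5430,
    "environmental_context": "montane coniferous forest",
    "collection_date": "2021-06-15",
    "collection_method": "hand auger",
}

for field, value in sample_record.items():
    print(f"{field}: {value}")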
The NIST RDaF is a map of the research data space that uses a lifecycle approach with six high-level lifecycle stages to organize key information concerning research data management (RDM) and research data dissemination. Through a community-driven and in-depth process, stakeholders identified topics and subtopics: programmatic and operational activities, concepts, and other important factors relevant to RDM. All elements of the RDaF framework foundation, the lifecycle stages and their associated topics and subtopics, are defined. Most subtopics have several informative references, which are resources such as guidelines, standards, and policies that assist stakeholders in addressing that subtopic. Further, the NIST RDaF team identified 14 Overarching Themes which are pervasive throughout the framework. The framework foundation enables organizations and individual researchers to use the RDaF for self-assessment of their RDM status. The RDaF includes sample "profiles" for various job functions or roles, each containing topics and subtopics that an individual in the given role is encouraged to consider in fulfilling their RDM responsibilities. Individual researchers and organizations involved in the research data lifecycle can tailor these profiles for their specific job function using a tool available on the RDaF website. The methodologies used to generate all features of the RDaF are described in detail in the publication NIST SP 1500-8. This database version of the NIST RDaF is designed so that users can readily navigate the various lifecycle stages, topics, subtopics, and overarching themes from numerous locations. In addition, unlike the published text version, links are included for the definitions of most topics and subtopics and for informative references for most subtopics. For more information on the database, please see the FAQ page.