This dataset comprises a collection of example DMPs from a wide array of fields, obtained from a number of different sources outlined below. Data extracted from the examples include the discipline and field of study, author, institutional affiliation and funding information, location, date created, title, research and data type, description of project, link to the DMP, and, where possible, external links to related publications or grant pages. This CSV document serves as the content for the McMaster Data Management Plan (DMP) Database, part of the Research Data Management (RDM) Services website located at https://u.mcmaster.ca/dmps. Other universities and organizations are encouraged to link to the DMP Database or use this dataset as the content for their own DMP Database. This dataset will be updated regularly to include new additions and will be versioned accordingly. We are gathering submissions at https://u.mcmaster.ca/submit-a-dmp to continue to expand the collection.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files, e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files, e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
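As a rough illustration of this layout, the short Python sketch below derives the folder that should contain a given kernel version id. The exact file extension (.py vs .ipynb) and whether folder names are zero-padded are assumptions to verify against the dataset itself.

# Minimal sketch (not an official utility): derive the folder that should
# contain a given kernel version, based on the layout described above.
# Assumptions: top-level folder = millions part of the id, sub-folder =
# thousands part, file name = the id itself; extension depends on language.

def kernel_version_path(version_id: int, extension: str = "ipynb") -> str:
    top = version_id // 1_000_000          # e.g. 123 for id 123456789
    sub = (version_id // 1_000) % 1_000    # e.g. 456 for id 123456789
    return f"{top}/{sub}/{version_id}.{extension}"

print(kernel_version_path(123_456_789))    # -> "123/456/123456789.ipynb"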
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket, which means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analysis will be an enabler for transforming scientific inquiries. Building on the HydroShare repository's established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS's National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. By installing and maintaining the packages within CUAHSI's HydroShare-linked JupyterHub server, these capabilities reduce the technical burden on scientists of creating a computational environment for executing analyses. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available on GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
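As a hedged sketch of the workflow described above, the following Python snippet retrieves daily streamflow values from NWIS with the dataretrieval package and opens a HydroShare resource with hsclient. The site number, date range, resource id, and file name are placeholders, and method names should be checked against each package's documentation.

# Minimal sketch; assumes "pip install hsclient dataretrieval" has been run.
# The site number, dates, resource id, and file name below are placeholders.

from dataretrieval import nwis
from hsclient import HydroShare

# Retrieve daily streamflow values for one USGS gauge from NWIS.
flow, metadata = nwis.get_dv(sites="01491000", start="2020-01-01", end="2020-12-31")
print(flow.head())

# Sign in to HydroShare and open an existing resource for download/upload.
hs = HydroShare()
hs.sign_in()  # prompts for HydroShare credentials
resource = hs.resource("REPLACE_WITH_RESOURCE_ID")
resource.file_download("results.csv")  # hypothetical file within the resource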
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The U.S. Geological Survey (USGS), Woods Hole Science Center (WHSC) has been an active member of the Woods Hole research community for over 40 years. In that time there have been many sediment collection projects conducted by USGS scientists and technicians for the research and study of seabed environments and processes. These samples are collected at sea or near shore and then brought back to the WHSC for study. While at the Center, samples are stored in ambient-temperature, cold, or freezing conditions, depending on the best mode of preparation for the study being conducted or the duration of storage planned for the samples. Recently, storage methods and available storage space have become a major concern at the WHSC. The shapefile sed_archive.shp gives a geographical view of the samples in the WHSC's collections and where they were collected, along with images and hyperlinks to useful resources.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
The Exhibit of Datasets was an experimental project with the aim of providing concise introductions to research datasets in the humanities and social sciences deposited in a trusted repository and thus made accessible for the long term. The Exhibit consists of so-called 'showcases', short webpages summarizing and supplementing the corresponding data papers published in the Research Data Journal for the Humanities and Social Sciences. A showcase is a quick introduction to such a dataset, a bit longer than an abstract, with illustrations, interactive graphs and other multimedia (if available). As a rule it also offers the option to get acquainted with the data itself, through an interactive online spreadsheet, a data sample or a link to the online database of a research project. Usually, access to these datasets requires several time-consuming actions, such as downloading data, installing the appropriate software and correctly uploading the data into these programs. This makes it difficult for interested parties to quickly assess the possibilities for reuse in other projects.
The Exhibit aimed to help visitors of the website get the right information at a glance by:
- Attracting attention to (recently) acquired deposits: showing why data are interesting.
- Providing a concise overview of the dataset's scope and research background; more details are to be found, for example, in the associated data paper in the Research Data Journal (RDJ).
- Bringing together references to the location of the dataset and to more detailed information elsewhere, such as the project website of the data producers.
- Allowing visitors to explore (a sample of) the data without first downloading and installing associated software (see below).
- Publishing related multimedia content, such as videos, animated maps, slideshows etc., which are currently difficult to include in online journals such as RDJ.
- Making it easier to review the dataset. The Exhibit would also have been the right place to publish these reviews in the same way as a webshop publishes consumer reviews of a product, but this could not yet be achieved within the limited duration of the project.
Note (1) The text of the showcase is a summary of the corresponding data paper in RDJ, and as such a compilation made by the Exhibit editor. In some cases a section 'Quick start in Reusing Data' is added, whose text is written entirely by the editor. (2) Various hyperlinks such as those to pages within the Exhibit website will no longer work. The interactive Zoho spreadsheets are also no longer available because this facility has been discontinued.
This data includes two example databases from the paper "Hybrid Sankey diagrams: visual analysis of multidimensional data for understanding resource use": the made-up fruit flows, and real global steel flow data from Cullen et al. (2012). It also includes the Sankey Diagram Definitions to reproduce the diagrams in the paper. The code to reproduce the figures is written in Python in the form of Jupyter notebooks. A conda environment file is included to easily set up the necessary Python packages to run the notebooks. All files are included in the "examples.zip" file. The notebook files are also uploaded standalone so they can be linked to nbviewer.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We designed and organized a one-day workshop where, in the context of FAIR, the following themes were discussed and practiced: scientific transparency and reproducibility; how to write a README; data and code licenses; spatial data; programming code; examples of published datasets; data reuse; and discipline and motivation. The intended audience was researchers at the Environmental Science Group of Wageningen University and Research. All workshop materials were designed with further development and reuse in mind and are shared through this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of a questionnaire, named “Survey on the current state of scientific data sharing in mainland China,” and a data file of valid samples. The questionnaire includes 12 sections and 31 questions. The data file contains 2 sheets. One comprises 370 valid sample data records; the other defines the 44 fields appearing in the former sheet, describing each field's name, data type, length, definition, and value range.
The aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. The respondents were professors of human sciences, social sciences and behavioural sciences in Finnish universities, and representatives of some research institutes. Opinions were also queried on the OECD guidelines and principles on open access to research data from public funding. First, the respondents were asked whether there were any guidelines or regulations concerning the depositing of digital research data in their departments, what happened to research data after the completion of the original research, and to what extent the data were reused. Further questions covered how often the data from completed research projects were reused in secondary research projects or for theses. The respondents also estimated what proportion of the data collected in their departments/institutes were reusable at the time of the survey, and why research data were not being reused in their own field of research. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology formed barriers to data reuse. Opinions on the OECD Open Access guidelines on research data were queried. The respondents were asked whether they had earlier knowledge of the guidelines, and to what extent its principles could be implemented in their own disciplines. Some questions pertained to the advantages and disadvantages of open access to research data. The advantages mentioned included reducing duplicate data collection and more effective use of data resources, whereas the disadvantages mentioned included, for example, risks connected to data protection and misuse of data. The respondents also suggested ways of implementing the Open Access guidelines and gave their opinions on how binding the recommendations should be, to what extent various bodies should be involved in formulating the guidelines, and how the archiving and dissemination of digital research data should be organised. Finally, the respondents estimated how the researchers in their field would react to enhancing open access to research data, and also gave their opinion on open access to the data they themselves have collected. Background variables included the respondent's gender, university, and research field.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In order to analyse specific features of data papers, we established a representative sample of data journals, based on lists from the European FOSTER Plus project, the German wiki forschungsdaten.org hosted by the University of Konstanz, and two French research organizations. The complete list consists of 82 data journals, i.e. journals which publish data papers. They represent less than 0.5% of academic and scholarly journals. For each of these 82 data journals, we gathered information about the discipline, the global business model, the publisher, peer reviewing etc. The analysis is partly based on data from ProQuest's Ulrichsweb database, enriched and completed by information available on the journals' home pages. One part of the data journals are presented as “pure” data journals stricto sensu, i.e. journals which publish exclusively or mainly data papers. We identified 28 journals of this category (34%). For each journal, we assessed through direct search on the journals' homepages (information about the journal, author's guidelines etc.) the use of identifiers and metadata, the mode of selection and the business model, and we assessed different parameters of the data papers themselves, such as length, structure, linking etc. The results of this analysis are compared with other research journals (“mixed” data journals) which publish data papers along with regular research articles, in order to identify possible differences between both journal categories, on the level of data papers as well as on the level of the regular research papers. Moreover, the results are discussed against concepts of knowledge organization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ASpecD is a Python framework for handling spectroscopic data with a focus on reproducibility. In short: each and every processing step applied to your data is recorded and can be traced back. Additionally, for each representation of your data (e.g., figures, tables) you can easily follow how the data shown have been processed and where they originate from.
To provide readers of the publication describing the ASpecD framework with a concrete example of recipe-driven data analysis, this repository contains both a recipe and the data that are analysed, as shown in the publication.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages and the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees were calculated. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data are collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when adding and/or changing the values in the original table at the beginning of the dataset, most of the subsequent tables will be automatically recalculated and the graphs will be updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and following the second wave of the pandemic to check the reliability of pre-made forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of a pandemic and COVID-19 crisis for international entrepreneurship.
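The normal-distribution scenario logic described above can be illustrated with a few lines of Python outside of Excel; the mean, standard deviation, and candidate incidence values below are invented placeholders rather than figures from the dataset.

# Illustrative sketch of the scenario approach (placeholder numbers, not
# values from the dataset): predicted COVID-19 incidence is modelled as
# normally distributed, and each candidate scenario value is assigned the
# probability of observing an outcome at least that severe.

from scipy.stats import norm

mean_cases = 60_000   # hypothetical forecast mean of daily new cases
std_cases = 8_000     # hypothetical forecast standard deviation

for scenario in (50_000, 60_000, 70_000, 80_000):
    prob_at_least = 1 - norm.cdf(scenario, loc=mean_cases, scale=std_cases)
    print(f"scenario {scenario:>6}: P(cases >= scenario) = {prob_at_least:.2f}")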
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Scientific data collected on the Westward, Corwith Cramer, and Robert C. Seamans are invaluable products of SEA’s educational research programs. SEA supports dissemination and sharing of data with educators and researchers to benefit the broader science community and the public. We aim to encourage and ensure fair access to SEA data while also preserving the intellectual property of individual researchers and seeking opportunities for collaboration. Laboratory and deployment equipment on each vessel is described here.
On every SEA voyage, Standard Collections include the following:
On individual cruises, additional samples or data beyond Standard Collections may also be generated during projects led by SEA faculty or collaborating investigators; these data are managed by the project Principal Investigator.
As a service to the scientific community, and to fulfill SEA’s obligations to the U.S. State Department and foreign governments, a formal cruise report is prepared for all international voyages. Cruise reports include: a list of the ship’s company and students, a map of the voyage track, tables of sampling station locations and activities, most raw data in tables or graphics, and abstracts from student oceanographic research projects.
SEA’s decades-long voyage history for the Westward, Corwith Cramer and Robert C. Seamans can be accessed and searched here.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In January 2019, the Asclepias Broker harvested citation links to Zenodo objects from three discovery systems: the NASA Astrophysics Datasystem (ADS), Crossref Event Data and Europe PMC. Each row of our dataset represents one unique link between a citing publication and a Zenodo DOI. Both endpoints are described by basic metadata. The second dataset contains usage metrics for every cited Zenodo DOI of our data sample.
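A hypothetical sketch of combining the two files in Python is shown below; the file and column names (citation_links.csv, usage_metrics.csv, cited_doi, doi) are placeholders, and the actual headers should be taken from the dataset itself.

# Hypothetical sketch: one row per citation link, plus usage metrics keyed
# by the cited Zenodo DOI. Column names are placeholders; inspect the real
# files for the actual headers.

import pandas as pd

links = pd.read_csv("citation_links.csv")    # one row per citing-publication -> Zenodo DOI link
metrics = pd.read_csv("usage_metrics.csv")   # usage metrics per cited Zenodo DOI

merged = links.merge(metrics, left_on="cited_doi", right_on="doi", how="left")
print(merged.groupby("cited_doi").size().sort_values(ascending=False).head())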
Scripts and data acquired at the Mirror Lake Research Site, cited by the article submitted to Water Resources Research: Distributed Acoustic Sensing (DAS) as a Distributed Hydraulic Sensor in Fractured Bedrock M. W. Becker(1), T. I. Coleman(2), and C. C. Ciervo(1) 1 California State University, Long Beach, Geology Department, 1250 Bellflower Boulevard, Long Beach, California, 90840, USA. 2 Silixa LLC, 3102 W Broadway St, Suite A, Missoula MT 59808, USA. Corresponding author: Matthew W. Becker (matt.becker@csulb.edu).
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor, as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those addressed to eradicate poverty
National
Sample survey data [ssd]
The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework of housing and households for the different administrative levels in the country. Jordan is administratively divided into 12 governorates; each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages), and each community was divided into a number of blocks, where the number of houses in each block ranged between 60 and 100. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.
A two-stage stratified cluster sampling technique was used. In the first stage, a cluster sample proportional to size was selected, where the number of households in each cluster was considered the weight of the cluster. At the second stage, a sample of 8 households was selected from each cluster, in addition to another 4 households selected as a backup for the basic sample, using a systematic sampling technique. Those 4 households were sampled to be used during the first visit to the block in case a visit to an originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure the possibility of producing results at the sub-district level. In this respect, the survey adopted the framework provided by the General Census of Population and Housing in dividing the sample strata. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable provided in the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size at the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, with a minimum number of clusters in each sub-district (6 clusters). This is to ensure adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas where poor households are concentrated in major cities. Therefore, these were taken into consideration during the sampling design phase, and a higher number of households was selected from those areas, aiming to cover well all regions where poverty is prevalent.
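As an illustration only (not the official procedure), the sketch below mimics the second sampling stage described above: a systematic sample of 8 main and 4 backup households drawn from a simulated cluster listing.

# Illustrative sketch of the second sampling stage: a systematic sample of
# 8 households per cluster plus 4 backups, drawn from a simulated household
# listing of a single cluster.

import random

def systematic_sample(n_households: int, n_main: int = 8, n_backup: int = 4):
    k = n_households / (n_main + n_backup)   # sampling interval
    start = random.uniform(0, k)             # random start within the first interval
    picks = [int(start + i * k) % n_households for i in range(n_main + n_backup)]
    return picks[:n_main], picks[n_main:]    # (main sample, backup households)

main, backup = systematic_sample(n_households=90)
print("main sample:", main)
print("backup sample:", backup)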
Face-to-face [f2f]
Raw Data:
- Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate the different stages of the process of data checking, coding and entry until forms were returned to the archive system.
- Data office checking: This phase was carried out concurrently with the data collection phase in the field, where questionnaires completed in the field were immediately sent to the data office checking phase.
- Data coding: A team was trained to work on the data coding phase, which in this survey is limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
- Data entry/validation: A team consisting of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules was added to the entry forms to ensure the accuracy of the data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered were free of errors.
- Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the survey's final results. Those results were further checked against similar outputs from SPSS to ensure that the tabulations produced were correct. A check was also run on each table to guarantee consistency of the figures presented, together with the required editing of table titles and report formatting.
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
- The harmonization process started with cleaning all raw data files received from the Statistical Office.
- Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process was run on the data.
- Harmonized data were saved at the household as well as the individual level, in SPSS, and converted to STATA format.
The aim of the Data Rescue & Curation Best Practices Guide is to provide an accessible and hands-on approach to handling data rescue and digital curation of at-risk data for use in secondary research. We provide a set of examples and workflows for addressing common challenges with social science survey data that can be applied to other social and behavioural research data. The goal of this guide and the workflows presented is to improve librarians' and data curators' skills in providing access to high-quality, well-documented, and reusable research data. The aspects of data curation addressed throughout this guide are adopted from long-standing data library and archiving practices, including: documenting data using standard metadata; file and data organization; using open and software-agnostic formats; and curating research data for reuse.
The ESS-DIVE sample identifiers and metadata reporting format primarily follows the System for Earth Sample Registration (SESAR) Global Sample Number (IGSN) guide and template, with modifications to address Environmental Systems Science (ESS) sample needs and practicalities (IGSN-ESS). IGSNs are associated with standardized metadata to characterize a variety of different sample types (e.g. object type, material) and describe sample collection details (e.g. latitude, longitude, environmental context, date, collection method). Globally unique sample identifiers, particularly IGSNs, facilitate sample discovery, tracking, and reuse; they are especially useful when sample data is shared with collaborators, sent to different laboratories or user facilities for analyses, or distributed in different data files, datasets, and/or publications. To develop recommendations for multidisciplinary ecosystem and environmental sciences, we first conducted research on related sample standards and templates. We provide a comparison of existing sample reporting conventions, which includes mapping metadata elements across existing standards and Environment Ontology (ENVO) terms for sample object types and environmental materials. We worked with eight U.S. Department of Energy (DOE) funded projects, including those from Terrestrial Ecosystem Science and Subsurface Biogeochemical Research Scientific Focus Areas. Project scientists tested the process of registering samples for IGSNs and associated metadata in workflows for multidisciplinary ecosystem sciences. We provide modified IGSN metadata guidelines to account for needs of a variety of related biological and environmental samples. While generally following the IGSN core descriptive metadata schema, we provide recommendations for extending sample type terms, and connecting to related templates geared towards biodiversity (Darwin Core) and genomic (Minimum Information about any Sequence, MIxS) samples and specimens. ESS-DIVE recommends registering samples for IGSNs through SESAR, and we include instructions for registration using the IGSN-ESS guidelines. Our resulting sample reporting guidelines, template (IGSN-ESS), and identifier approach can be used by any researcher with sample data for ecosystem sciences.
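As a rough illustration of the kinds of sample metadata described above, the snippet below builds a single record using the elements named in the text (object type, material, location, environmental context, collection details); the keys are descriptive placeholders rather than the exact IGSN-ESS template headers, and the values are invented.

# Illustrative sample metadata record using the kinds of fields mentioned
# above. Keys are descriptive placeholders, not the exact IGSN-ESS template
# column headers; values are invented.

sample_record = {
    "igsn": "IEXXX000A",                  # placeholder identifier
    "sample_name": "WATERSHED-SOIL-001",
    "object_type": "Core",
    "material": "Soil",
    "latitude": 39.9820,
    "longitude": -105.5430,
    "environmental_context": "montane coniferous forest",
    "collection_date": "2021-06-15",
    "collection_method": "hand auger",
}

for field, value in sample_record.items():
    print(f"{field}: {value}")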
The NIST RDaF is a map of the research data space that uses a lifecycle approach with six high-level lifecycle stages to organize key information concerning research data management (RDM) and research data dissemination. Through a community-driven and in-depth process, stakeholders identified topics and subtopics: programmatic and operational activities, concepts, and other important factors relevant to RDM. All elements of the RDaF framework foundation, the lifecycle stages and their associated topics and subtopics, are defined. Most subtopics have several informative references, which are resources such as guidelines, standards, and policies that assist stakeholders in addressing that subtopic. Further, the NIST RDaF team identified 14 Overarching Themes which are pervasive throughout the framework. The framework foundation enables organizations and individual researchers to use the RDaF for self-assessment of their RDM status. The RDaF includes sample "profiles" for various job functions or roles, each containing topics and subtopics that an individual in the given role is encouraged to consider in fulfilling their RDM responsibilities. Individual researchers and organizations involved in the research data lifecycle can tailor these profiles for their specific job function using a tool available on the RDaF website. The methodologies used to generate all features of the RDaF are described in detail in the publication NIST SP 1500-8. This database version of the NIST RDaF is designed so that users can readily navigate the various lifecycle stages, topics, subtopics, and overarching themes from numerous locations. In addition, unlike the published text version, links are included for the definitions of most topics and subtopics and for informative references for most subtopics. For more information on the database, please see the FAQ page.