Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There is a wide variety of archives, databases, and repositories currently available that provide access to research data. However, basic information about these systems is often difficult to gather, such as whether there are limits to the size of data sets that can be published or whether there is any publication fee that applies. In addition to that, there are plenty of research groups publishing their research data sets independently of these infrastructures, making it difficult for scientists to find them since they are not centrally registered. Research data must be easily discoverable and accessible for scientists to use it effectively. The Data Collections Explorer, developed within the national research data infrastructure for the engineering sciences NFDI4Ing, is an easy-to-use information system addressing these needs. It is a low-threshold information system that provides an overview of research data repositories, archives, databases as well as individually published data sets. Similar systems exist in other subject areas, for example the Data Repository Finder focusing on the medical, life and social sciences. Contrary to the Data Collections Explorer, the Data Repository Finder only lists repositories. This is the slide set for the talk as part of the "Engineering Sciences" track at the 1st Conference on Research Data Infrastructures.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and a baseline for future studies of ag research data.
Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data-sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multidisciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analyzed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.
Search Methods
- We first compiled a list of known domain-specific USDA/ARS datasets/databases represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN.
- We then searched using search engines such as Bing and Google for non-USDA/federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal/USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.
- We searched using search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university website to see whether the institution had a repository for its unique, independent research data, if one was not apparent in the initial web search.
- We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota).
- We then split out NCBI (National Center for Biotechnology Information) repositories.
- Next, we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.
- Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals were compi...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the modern era, the near impossibility of true anonymization means we must provide tangible recommendations for researchers who need to share de-identified, person-level data that could potentially be re-identified due to the presence of quasi-identifiers. While various repository aggregators like Re3data and DataCite Repository Finder provide lists of data repositories, navigating these can be cumbersome when trying to locate options for depositing restricted data. These listings rarely include certain necessary details, which makes recommending third-party repositories to researchers time-consuming or even limited in scope, and we often end up relying on a short list of well-known repositories. An additional challenge is the difficulty of identifying repositories that mediate access via data usage agreements, where the repository handles access requests to ensure potential users meet established security and privacy requirements and have taken the necessary steps to protect confidentiality and commit to appropriate data use. As part of a capstone project for the Data Services Continuing Education Program, we identified and created a spreadsheet of restricted data repositories with mediated access processes for researchers. While our project scope was limited to the social sciences and US-based repositories, in sharing this work, we hope others will continue to contribute to it and expand on it.
This dataset tracks the updates made on the dataset "Open Reading Frame Finder (ORF Finder)" as a repository for previous versions of the data and metadata.
This document describes data collected from the Main Collection of the Web of Science database. Records of published studies addressing the intersection of Open Science and data repositories were searched up to January 15th, 2024, and the final dataset comprised 545 records for bibliometric analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of Project MILDRED, the Development Project of Research Data Infrastructure at the University of Helsinki. The project started on April 29, 2016. Its aim is to provide the University of Helsinki with a state-of-the-art research data management service infrastructure. To gain knowledge about researchers' data storage and preservation practices in 2016, an e-survey was sent to the UH research staff asking 1) what data repositories they use for depositing their research data; 2) what reasons they had for not depositing data; and 3) what alternative storage devices and repository services they used for their data. The dataset consists of the e-survey report master file and an analysis of the original master file. The files have been anonymized. A readme.rtf file is included to provide full project and data level documentation.
The Bear Lake Data Repository (BLDR) is an active archive, containing a growing compilation of biological, chemical, and physical datasets collected from Bear Lake and its surrounding watershed. The datasets herein have been digitized from historical records and reports, extracted from papers and theses, and obtained from public and private entities, including the United States Geological Survey, PacifiCorp, and, inter alia, Ecosystems Research Institute.
Contributions are welcome. The BLDR accepts biological, chemical, or physical datasets obtained at Bear Lake, irrespective of funding source. There is no submission size limit at present—workarounds will be found if submissions exceed Hydroshare limits (20 GB). Contributions are published with an open access license and will serve many use cases. The current repository steward, Bear Lake Watch, will advise on submissions and make accepted contributions available promptly.
Metadata files are provided for each dataset; however, contacting the original contributor(s) is encouraged for questions and additional details prior to data usage. The BLDR and its contributors shall not be liable for any damages resulting from misinterpretation or misuse of the data or metadata.
These research datasets are the updated version of the conference poster "Research data repositories and their metadata: A comparative study," presented by Ms. Kavya Asok and Ms. Snigdha Dandpat at the Conference on Open and FAIR Data Ecosystem: Principles, Policies, and Platforms, scheduled from 11th to 13th September 2023 at IIC, New Delhi. The study describes the features of a select number of research data repositories (RDRs) and analyzes their metadata practices.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The numbers in brackets denote the number of datasets used for the analysis. All datasets were harvested and parsed in May 2019.
Dataset links to the Digital Collections of Colorado, DSpace Repository. From the homepage, you can search the 1240 datasets hosted there, or browse using a list of filters on the right. DSpace is a digital service that collects, preserves, and distributes digital material. Resources in this dataset: Resource Title: GeoData catalog record. File Name: Web Page, url: https://geodata.nal.usda.gov/geonetwork/srv/eng/catalog.search#/metadata/ShortgrassSteppe_eaa_2015_March_19_1220
Covid-19 has had a big impact on many aspects of our life, including mental health. Since the start of the pandemic, a whole body of Belgian research has been performed on the relation between covid-19 and mental health (care). The mental health & covid-19 working group of the superior health council lists these studies in order to provide advice to policy makers and the general public. The first advisory report focused on international literature on contagious outbreaks, since not many studies on covid-19 and, especially, not many Belgian studies were published yet. This advice can be found here: https://www.health.belgium.be/en/report-9589-mental-health-and-covid-19 As part of the work performed in the first advisory report, the Policy Coordination Working group has asked the Superior Health Council to list all Belgian studies investigating the relation between covid-19 and mental health and/or mental health care and to provide regular updates. The superior Health Council, therefore, started the project of the Belgian mental health data repository. This repository will consist of ongoing studies, preliminary results, accepted and published articles with a Belgian population. For each study, an overview will be given of the authors (including contact details), level of evidence and a short description of the study. The Belgian Mental Health Data Repository will allow for other researchers, policy makers, health care providers and the general public to have a better idea of and easier access to the mental health studies in Belgium. Additionally, more in-depth analyses across studies can be facilitated leading to better insights into the impact of covid-19 on mental health. An update of the living document will be published weekly.
This dataset tracks the updates made on the dataset "ALW Assisted Living Facility Finder App" as a repository for previous versions of the data and metadata.
Data Repository for "A Matheuristic for Complex Pricing Problems: An Application to Rentable Resources"
Since the launch of ODESI in 2008, academic libraries have supported the development of shared infrastructure for open discovery and access to important collections of Canadian social science survey data. With the current migration of all metadata and data collections to the new national Borealis data repository, collaborative curation and best practices are migrating as well, leading to the development of new approaches, training, policies, and documentation, to support the ongoing deposit, curation, and preservation of data in the repository. This presentation will provide an overview of the repository migration project, updates to the search interface to support further integration with Borealis, and steps taken by the technical team and community-led committee to ensure a smooth transition for all ODESI end-users and library data stewards.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports the working paper, "Repository optimisation & techniques to improve discoverability and web impact : an evaluation", currently under review for publication and available as a preprint at: https://doi.org/10.17868/65389/.
The dataset comprises a single OpenDocument Spreadsheet (.ods) format file containing seven data sheets of data pertaining to COUNTER compliant usage statistics, search query traffic from Google Search Console, web traffic data for Google Analytics and Google Scholar, and usage statistics from IRStats2. All data relate to the EPrints repository, Strathprints, based at the University of Strathclyde.
https://spdx.org/licenses/CC0-1.0.html
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories (IR). The data are a subset of data from RAMP (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2021. For a description of the data collection, processing, and output methods, please see the "Methods" section below.
The record will be revised periodically to make new data available through the remainder of 2021.
Methods
Data Collection
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Following the data processing described below, an additional field, citableContent, is added to the page level data on ingest into RAMP.
The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
Data Processing
Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
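The flagging step can be sketched as follows. This is a minimal illustration, assuming that a URL's file extension is enough to tell content files from HTML pages; the source does not describe RAMP's actual detection logic, and the function name is_citable_content and the extension list are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension list: the source only names "PDF, CSV, etc."
CONTENT_EXTENSIONS = {".pdf", ".csv", ".doc", ".docx", ".xls", ".xlsx", ".zip"}


def is_citable_content(url: str) -> str:
    """Return "Yes" if the URL path ends in a known content-file extension, else "No"."""
    # Only the path is inspected, so query strings and fragments are ignored.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return "Yes" if extension in CONTENT_EXTENSIONS else "No"
```

The return values mirror the two possible values of the citableContent field, so the result could be written straight into that column.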
The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.
Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data; the second includes the country of origin and device type data.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are:
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
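The two steps above can be sketched in a few lines. This is an illustrative sketch, not RAMP's own code: it assumes each row is a dictionary keyed by the field names listed earlier, that dates are stored as ISO strings (YYYY-MM-DD), and that clicks are whole numbers. The function name citable_content_downloads is hypothetical.

```python
from datetime import date


def citable_content_downloads(rows, start: date, end: date) -> int:
    """Sum clicks over rows dated within [start, end] whose citableContent is "Yes"."""
    total = 0
    for row in rows:
        # Assumption: the date field is an ISO string such as "2021-01-15".
        row_date = date.fromisoformat(row["date"])
        if start <= row_date <= end and row["citableContent"] == "Yes":
            total += int(row["clicks"])
    return total
```

Rows for HTML pages (citableContent set to "No") and rows outside the date range contribute nothing to the total.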
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.
As a result, two CSV datasets are provided for each month of published data:
page-clicks:
The data in these CSV files correspond to the page-level data, and include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “page-clicks”. For example, the file named 2021-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2021.
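As a hedged sketch of the recommended per-repository aggregation, the following groups page-clicks rows by repository_id and recomputes the click-through rate from the summed clicks and impressions. It assumes the CSV header row uses the field names listed above and that clicks and impressions are stored as whole numbers; the function name summarize_by_repository is hypothetical.

```python
import csv
import io
from collections import defaultdict


def summarize_by_repository(csv_text: str) -> dict:
    """Total clicks and impressions per repository_id, with a recomputed clickThrough."""
    totals = defaultdict(lambda: {"clicks": 0, "impressions": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = totals[row["repository_id"]]
        counts["clicks"] += int(row["clicks"])
        counts["impressions"] += int(row["impressions"])

    summary = {}
    for repository_id, counts in totals.items():
        impressions = counts["impressions"]
        # Clicks divided by impressions; guard against a zero denominator.
        rate = counts["clicks"] / impressions if impressions else 0.0
        summary[repository_id] = {**counts, "clickThrough": rate}
    return summary
```

Recomputing the rate from the totals, rather than averaging the per-row clickThrough values, keeps the result consistent with the definition of clicks divided by impressions.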
country-device-info:
The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
index: The Elasticsearch index corresponding to country and device access information data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “country-device-info”. For example, the file named 2021-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2021.
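The naming convention can be used to sort downloaded files by month and dataset type. The sketch below assumes the pattern inferred from the two example filenames (year-month, then "_RAMP_all_", then the dataset type); files that deviate from that pattern are simply not recognized. The names FILENAME_PATTERN and parse_ramp_filename are hypothetical.

```python
import re

# Pattern inferred from the two example filenames only; treat it as an assumption.
FILENAME_PATTERN = re.compile(
    r"^(?P<year>\d{4})-(?P<month>\d{2})_RAMP_all_"
    r"(?P<kind>page-clicks|country-device-info)\.csv$"
)


def parse_ramp_filename(filename: str):
    """Return (year, month, kind) for a recognized filename, or None otherwise."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    return int(match["year"]), int(match["month"]), match["kind"]
```

For instance, parse_ramp_filename("2021-01_RAMP_all_page-clicks.csv") yields the year, the month, and the dataset type, which is enough to route each file to the matching processing step.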
References
Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As part of the “Geospatial Assessment of Women Employment and Business Opportunities in the Energy Sector” project, open-source gender-related spatial data was collected for 31 Small Island Developing States (SIDS) across the globe, resulting in curated and thoroughly documented geodatabases (GDBs) that are now ready to be explored! Fifty-nine spatial layers were identified and then researched for each country, covering the following categories: Demographics and Population | Renewable Energy | Energy Access | Education | Jobs and Finance | Digital Inclusion | Transportation | Safety | Amenities | Climate/Earth | Law/Policy/Government. However, not every country GDB contains all 59 data layers, as this was dependent on the availability of open-source data in each SIDS. Users are encouraged to check the accompanying metadata Excel file for more information on the datasets in each GDB, the vintage, and the source utilized. For comprehensive details, documentation, and inquiries, please contact data@worldbank.org.
This dataset tracks the updates made on the dataset "Facility Finder Detail Map" as a repository for previous versions of the data and metadata.
Data repository for the data used in the thesis of A. Perttu from the experimental PDCs at the PELE facility.
ESS-DIVE’s (Environmental Systems Science Data Infrastructure for a Virtual Ecosystem) dataset metadata reporting format is intended to compile information about a dataset (e.g., title, description, funding sources) that can enable reuse of data submitted to the ESS-DIVE data repository. The files contained in this dataset include instructions (dataset_metadata_guide.md and README.md) that can be used to understand the types of metadata ESS-DIVE collects. The data dictionary (dd.csv) follows ESS-DIVE’s file-level metadata reporting format and includes brief descriptions of each element of the dataset metadata reporting format. This dataset also includes a terminology crosswalk (dataset_metadata_crosswalk.csv) that shows how ESS-DIVE’s metadata reporting format maps onto other existing metadata standards and reporting formats. Data contributors to ESS-DIVE can provide this metadata by manual entry using a web form or programmatically via ESS-DIVE’s API (Application Programming Interface). A metadata template (dataset_metadata_template.docx or dataset_metadata_template.pdf) can be used to collaboratively compile metadata before providing it to ESS-DIVE. Since being incorporated into ESS-DIVE’s data submission user interface, the dataset metadata reporting format has enabled features like automated metadata quality checks and dissemination of ESS-DIVE datasets onto other data platforms, including Google Dataset Search and DataCite.