The journals’ author guidelines and/or editorial policies were examined to determine whether they take a stance on the availability of the data underlying a submitted article. The mere stated possibility of providing supplementary material along with the submitted article was not considered a research data policy in the present study. Furthermore, source code and algorithms were excluded from the scope of the paper, and thus policies related to them are not included in the analysis.
To select journals within the field of neurosciences, Clarivate Analytics’ InCites Journal Citation Reports database was searched using the categories of neurosciences and neuroimaging. From the results, the 40 journals with the highest Impact Factors (for the year 2017) were extracted for scrutiny of research data policies. The selection of journals within the field of physics was created by performing a similar search with the categories of physics, applied; physics, atomic, molecular & chemical; physics, condensed matter; physics, fluids & plasmas; physics, mathematical; physics, multidisciplinary; physics, nuclear; and physics, particles & fields. From the results, the 40 journals with the highest Impact Factors were again extracted for scrutiny. Similarly, the 40 journals representing the field of operations research were extracted using the search category of operations research and management.
Journal-specific data policies were sought from the journals’ websites providing author guidelines or editorial policies. The examination of journal data policies was carried out in May 2019. The primary data source was journal-specific author guidelines. If journal guidelines explicitly linked to the publisher’s general policy on research data, the latter was used in the analyses of the present article. If a journal-specific research data policy, or the lack of one, was inconsistent with the publisher’s general policies, the journal-specific policies and guidelines were prioritized. If a journal’s author guidelines were not openly available online, e.g., because the journal accepts submissions on an invite-only basis, the journal was not included in the data of the present article. Journals that exclusively publish review articles were also excluded and replaced with the journal having the next highest Impact Factor, so that each set representing the three fields of science consisted of 40 journals. The final data thus consisted of 120 journals in total.
‘Public deposition’ refers to a scenario where the researcher deposits data in a public repository and thus transfers the administrative role over the data to the receiving repository. ‘Scientific sharing’ refers to a scenario where the researcher administers the data locally and provides it to interested readers on request. Note that none of the journals examined in the present article required that all data types underlying a submitted work be deposited in a public data repository. However, some journals required public deposition of specific data types. Within the journal research data policies examined in the present article, these data types are well represented by the Springer Nature policy on “Availability of data, materials, code and protocols” (Springer Nature, 2018), that is: DNA and RNA data; protein sequences and DNA and RNA sequencing data; genetic polymorphisms data; linked phenotype and genotype data; gene expression microarray data; proteomics data; macromolecular structures; and crystallographic data for small molecules. Furthermore, the registration of clinical trials in a public repository was also considered a data type in this study. The term ‘specific data types’ used in the custom coding framework of the present study thus refers to both life sciences data and the public registration of clinical trials. These data types have community-endorsed public repositories, where deposition was most often mandated within the journals’ research data policies.
The term ‘location’ refers to whether the journal’s data policy provides suggestions or requirements for the repositories or services used to share the data underlying submitted works. A general reference to ‘public repositories’ was not considered a location suggestion; only references to individual repositories and services were. The category of ‘immediate release of data’ examines whether the journal’s research data policy addresses the timing of publication of the data underlying submitted works. Note that even if a journal only encourages public deposition of data, its editorial processes may be set up so that either the research data or its metadata is published in conjunction with publication of the submitted work.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
The data life cycle from experiments to scientific publications generally follows the schema: experiments, data analysis, interpretation, and publication of the scientific paper. Beyond the publication of scientific findings, it is important to preserve the data investment and ensure its future processing. This implies guaranteeing long-term preservation and preventing data loss. Condensed and enriched with metadata, primary data is a more valuable resource than what can be re-extracted from articles. In this context it becomes essential to change the handling and acceptance of primary data within the scientific community: data publications should be honored, bringing attention and reputation to data publishers. Here, we present new features of the e!DAL Java API (http://edal.ipk-gatersleben.de), a lightweight software framework for publishing and sharing research data. Its main features are version tracking, management of metadata, information retrieval, registration of persistent identifiers, an embedded HTTP(S) server for public data access, access as a network file system, and a scalable storage backend. e!DAL is available as an open-source API for local non-shared storage as well as remote usage to support distributed applications. IPK is an approved data center in the international DataCite consortium (http://www.datacite.org/) and applies e!DAL as its data submission and registration system. In the latest version, the focus was on extending the features for the registration of Digital Object Identifiers (DOIs) and on developing a simple but sufficient approval process to regulate the assignment of persistent identifiers. In addition, we implemented some new graphical components, such as an easy installation/demo wizard, to simplify setting up repositories using e!DAL.
An intuitive publication tool (Figure 1) allows you to upload your data into your own private repository over the web and obtain a DOI to permanently reference the datasets and increase your “data citation” index.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study identifies citation predictors of Russian scientific publications on psychology in the Web of Science Core Collection (WoS) and compares them with citation predictors in the Russian Science Citation Index (RSCI). Four groups of indicators are considered: formal attributes of the article (12 parameters); article visibility parameters on the eLibrary (3 parameters) and PsyJournals (2 parameters) internet portals, which reflect the availability of the article text to potential readers; and attributes of the author’s method of scientific citation (3 parameters). Special attention is paid to citation attributes as qualitative characteristics of the author’s way of working on their own scientific text and of constructing a dialogue (in the form of citation) with other researchers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose. Open access scientific journals play an important role in scientific communication. They make it possible to promptly study leading publications and their scientific substantiation in the form of references to the works of the authors used in the research. Achieving a high citation index is an important component of scientific activity for the modern scientist. The purpose of the article is to identify tools for increasing the citation index of scientists. Methodology. The methods of generalization, analysis and synthesis, as well as a systematic approach, were used in the research. Findings. The university library has an important mission: to disseminate the results of the research activity of the university's scientists. As part of this activity, the scientific and technical library of DNURT has organized a repository of scientific papers, an open access system for scientific journals, and online versions of proceedings. These resources give a wide range of scientists the opportunity to study the results of research carried out by their colleagues at DNURT and to cite them in their own articles. In their scientometric research, the library staff use the following information platforms: Google Scholar, SciVerse Scopus, DOAJ, Russian Science Citation Index, and SCImago Journal & Country Rank. Originality. The originality of the work lies in determining ways to influence the formation of a high citation index for a scientist. Practical value. The article demonstrates the feasibility of using open access resources (electronic journals, proceedings and institutional repositories) to gain popularity for a scientist in the professional scientific community.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seatizen Atlas image dataset
This repository contains the resources and tools for accessing and utilizing the annotated images within the Seatizen Atlas dataset, as described in the paper Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery.
Download the Dataset
This annotated dataset is part of a bigger dataset composed of labeled and unlabeled images. To access information about the whole dataset, please visit the Zenodo repository and follow the download instructions provided.
Scientific Publication
If you use this dataset in your research, please consider citing the associated paper:
@article{Contini2025,
  author = {Matteo Contini and Victor Illien and Mohan Julien and Mervyn Ravitchandirane and Victor Russias and Arthur Lazennec and Thomas Chevrier and Cam Ly Rintz and Léanne Carpentier and Pierre Gogendeau and César Leblanc and Serge Bernard and Alexandre Boyer and Justine Talpaert Daudon and Sylvain Poulain and Julien Barde and Alexis Joly and Sylvain Bonhommeau},
  doi = {10.1038/s41597-024-04267-z},
  issn = {2052-4463},
  issue = {1},
  journal = {Scientific Data},
  pages = {67},
  title = {Seatizen Atlas: a collaborative dataset of underwater and aerial marine imagery},
  volume = {12},
  url = {https://doi.org/10.1038/s41597-024-04267-z},
  year = {2025},
}
For detailed information about the dataset and experimental results, please refer to the paper cited above.
Overview
The Seatizen Atlas dataset includes 14,492 multilabel and 1,200 instance segmentation annotated images. These images are useful for training and evaluating AI models for marine biodiversity research. The annotations follow standards from the Global Coral Reef Monitoring Network (GCRMN).
Annotation Details
Annotation Types:
Multilabel Convention: Identifies all observed classes in an image.
Instance Segmentation: Highlights contours of each instance for each class.
List of Classes
Algae
Algal Assemblage
Algae Halimeda
Algae Coralline
Algae Turf
Coral
Acropora Branching
Acropora Digitate
Acropora Submassive
Acropora Tabular
Bleached Coral
Dead Coral
Gorgonian
Living Coral
Non-acropora Millepora
Non-acropora Branching
Non-acropora Encrusting
Non-acropora Foliose
Non-acropora Massive
Non-acropora Coral Free
Non-acropora Submassive
Seagrass
Syringodium Isoetifolium
Thalassodendron Ciliatum
Habitat
Rock
Rubble
Sand
Other Organisms
Thorny Starfish
Sea Anemone
Ascidians
Giant Clam
Fish
Other Starfish
Sea Cucumber
Sea Urchin
Sponges
Turtle
Custom Classes
Blurred
Homo Sapiens
Human Object
Trample
Useless
Waste
These classes reflect the biodiversity and variety of habitats captured in the Seatizen Atlas dataset, providing valuable resources for training AI models in marine biodiversity research.
Usage Notes
The annotated images are available for non-commercial use. Users are requested to cite the related publication in any resulting works. A GitHub repository has been set up to facilitate data reuse and sharing: GitHub Repository.
Code Availability
All related codes for data processing, downloading, and AI model training can be found in the following GitHub repositories:
Plancha Workflow
Zenodo Tools
DinoVdeau Model
Acknowledgements
This dataset and associated research have been supported by several organizations, including the Seychelles Islands Foundation, Réserve Naturelle Marine de la Réunion, and Monaco Explorations, among others.
For any questions or collaboration inquiries, please contact seatizen.ifremer@gmail.com.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data release contains the ten thousand neuron recordings used in Stringer, Pachitariu et al., 2018a. The code to make the figures in the paper will be available at https://github.com/MouseLand/stringer-pachitariu-et-al-2018a. We encourage data users to fork this repository, or create their own repository inside MouseLand, where we will also be adding our future data and analyses. "Watching" the repository might be a good idea, since any new information about the data, analyses of the data, or publications using it, will appear there. Some potential projects to do with this data:
1) peer prediction: how well can you predict a neuron from the other 10,000? can you beat our score?
2) face prediction: how well can you predict a neuron from the behavioral patterns on the face videos?
3) manifold discovery: can you find a nonlinear low-dimensional embedding? how low can it go?
If you use these data in a paper, please cite the original research paper, as well as this dataset using the figshare doi.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains one of the main outputs of a study on international migration among published researchers from and to the United Kingdom. The migration flows are inferred from the changes of affiliation addresses in Scopus publications from 1996-2020. Scopus data is owned and maintained by Elsevier.

This dataset is provided under a CC BY-NC-SA Creative Commons v4.0 license (Attribution-NonCommercial-ShareAlike). This means that other individuals may remix, tweak, and build upon these data non-commercially, as long as they provide citations to this data repository (10.6084/m9.figshare.14207369) and the reference article listed below, and license the new creations under identical terms. For more details about the study, please refer to Sanliturk et al. (2021).

The dataset is provided as a comma-separated values file (.csv), and each row represents the migration flow of research-active scholars from one country to another in a specific year. Either the origin country or the destination country is the United Kingdom.

The data can be used to produce migration models or possibly other measures and estimates. They can also be used as an edge list for creating a network model of migration flows (directed weighted edges) between the UK and other countries (nodes).
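As a sketch of the edge-list use described above, the snippet below aggregates per-year flows into directed weighted edges. The column names (`origin`, `destination`, `year`, `n_migrations`) are assumptions for illustration only; check the actual CSV header before reuse.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows mirroring the described structure; the real file's
# column names may differ.
sample = """origin,destination,year,n_migrations
United Kingdom,Germany,2018,120
Germany,United Kingdom,2018,95
United Kingdom,Germany,2019,110
"""

def build_edge_list(csv_text):
    """Aggregate yearly flows into directed weighted edges (origin -> destination)."""
    edges = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        edges[(row["origin"], row["destination"])] += int(row["n_migrations"])
    return dict(edges)

edges = build_edge_list(sample)
print(edges[("United Kingdom", "Germany")])  # 230 (120 + 110 summed over years)
```

The resulting dictionary maps each directed country pair to a total flow, which is exactly the weighted edge list a network library would consume.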
https://www.gnu.org/licenses/gpl-3.0-standalone.html
🌟 Introduction
This repository provides the data used in the research by Puliti and Astrup (2022), "Automatic detection of snow breakage at single tree level using YOLOv5 applied to UAV imagery", International Journal of Applied Earth Observation and Geoinformation, 112, p. 102946.
🌲 Scope of the Data
This dataset is intended for:
🔍 Development and benchmarking of object detection models for individual trees and classification of trees based on their health.
Data is provided in the YOLO format with bounding box labels 📦🌲
🖥️ Existing Code and Model
The code for model inference, as described in the paper by Puliti and Astrup (2022), is available in the following GitHub repository:
🔗 GitHub Repository for Model Inference
This repository includes:
Inference Scripts: Scripts to apply the trained YOLOv5 model for detecting snow breakage at the single-tree level. 🌲
Pre-trained Models: Downloadable weights for reproducing results from the publication.
Example Workflows: Step-by-step guidance for running the model on your own UAV imagery. 🚁
Make sure to follow the repository’s documentation for setup instructions, dependencies, and usage examples. 💻
📜 Citation
If you use this dataset, please give credit by citing the original paper:
@article{PULITI2022102946,
  title = {Automatic detection of snow breakage at single tree level using YOLOv5 applied to UAV imagery},
  journal = {International Journal of Applied Earth Observation and Geoinformation},
  volume = {112},
  pages = {102946},
  year = {2022},
  issn = {1569-8432},
  doi = {https://doi.org/10.1016/j.jag.2022.102946},
  url = {https://www.sciencedirect.com/science/article/pii/S1569843222001431},
  author = {Stefano Puliti and Rasmus Astrup},
  keywords = {Forest damage, Convolutional neural network, Deep-learning, Drones, Object detection}
}
⚖️ Licensing
📄 Please refer to the specific licenses below for details on how the data can be used.
🔑 Key Licensing Principles:
✅ You may access, use, and share the dataset and models freely.
🔄 Any derivative works (e.g., trained models, code for training, or prediction tools) must also be made publicly available under the same licensing terms.
🌍 These licenses promote collaboration and transparency, ensuring that research using this dataset benefits the broader scientific and open-source community 🙌
This repository contains data on 17,420 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).

References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified from RIS files by pattern matching and saved as a CSV file. The list of DOIs for each chapter and cross chapter paper was processed using a custom Python script to generate a pandas DataFrame which was saved as a CSV file and uploaded to Google Big Query.

We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames, to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data and the resulting table was visualised in a Google DataStudio dashboard. A brief descriptive analysis was provided as a blogpost on the COKI website.

The repository contains the following content:

Data:
data/scholarcy/RIS/ - extracted references as RIS files
data/scholarcy/BibTeX/ - extracted references as BibTeX files
IPCC_AR6_WGII_dois.csv - list of DOIs

Processing:
preprocessing.txt - preprocessing steps for identifying and cleaning DOIs
process.py - Python script for transforming data and linking to COKI data through Google Big Query

Outcomes:
Dataset on BigQuery - requires a google account for access and bigquery account for querying
Data Studio Dashboard - interactive analysis of the generated data
Zotero library of references extracted via Scholarcy
PDF version of blogpost

Note on licenses: Data are made available under CC0. Code is made available under Apache License 2.0.

Archived version of Release 2022-03-04 of GitHub repository: https://github.com/Curtin-Open-Knowledge-Initiative/ipcc-ar6
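The DOI pattern-matching step described above can be sketched as follows. The regex is a commonly used DOI pattern (based on Crossref's guidance), not necessarily the exact expression used in the repository's preprocessing scripts, and the RIS snippet is illustrative.

```python
import re

# Common DOI pattern per Crossref's guidance; the actual expression used in
# the repository's preprocessing may differ.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

ris_sample = """TY  - JOUR
TI  - Example reference
DO  - 10.1038/s41597-024-04267-z
ER  -
TY  - JOUR
UR  - https://doi.org/10.5281/zenodo.5996890
ER  -
"""

def extract_dois(text):
    """Find DOI strings in free text, deduplicated in first-seen order."""
    seen, out = set(), []
    for doi in DOI_RE.findall(text):
        if doi not in seen:
            seen.add(doi)
            out.append(doi)
    return out

print(extract_dois(ris_sample))  # ['10.1038/s41597-024-04267-z', '10.5281/zenodo.5996890']
```

The deduplicated list can then be written to CSV and loaded into a pandas DataFrame, as the description outlines.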
#############
#############
Authors: Valentin Gabeff, Marc Russwurm, Devis Tuia & Alexander Mathis
Affiliation: EPFL
Date: January, 2024
Link to the article: https://link.springer.com/article/10.1007/s11263-024-02026-6
--------------------------------
WildCLIP is a fine-tuned CLIP model that retrieves camera-trap events from the Snapshot Serengeti dataset using natural language. This project intends to demonstrate how vision-language models may assist the annotation process of camera-trap datasets.
Here we provide the processed Snapshot Serengeti data used to train and evaluate WildCLIP, along with two versions of WildCLIP (model weights).
Details on how to run these models can be found in the project github repository.
The data consists of 380 x 380 image crops corresponding to the MegaDetector output of Snapshot Serengeti with a confidence threshold above 0.7. We considered only camera trap images containing single individuals.
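As an illustration of how such crops relate to detector output, the sketch below filters MegaDetector-style detections (normalized [x_min, y_min, width, height] boxes with a confidence score) at the 0.7 threshold and converts them to pixel coordinates. The field names follow the published MegaDetector JSON format; this is a generic sketch, not the project's actual preprocessing code.

```python
def boxes_to_crop(detections, img_w, img_h, conf_threshold=0.7):
    """Return pixel-coordinate crop boxes for detections above the threshold."""
    crops = []
    for det in detections:
        if det["conf"] < conf_threshold:
            continue
        x, y, w, h = det["bbox"]  # normalized to [0, 1], MegaDetector convention
        left, top = int(x * img_w), int(y * img_h)
        right, bottom = int((x + w) * img_w), int((y + h) * img_h)
        crops.append((left, top, right, bottom))
    return crops

# Hypothetical detections for a 2048x1536 camera-trap frame.
dets = [
    {"category": "1", "conf": 0.92, "bbox": [0.25, 0.10, 0.20, 0.30]},
    {"category": "1", "conf": 0.40, "bbox": [0.60, 0.50, 0.10, 0.10]},  # below threshold
]
print(boxes_to_crop(dets, img_w=2048, img_h=1536))  # [(512, 153, 921, 614)]
```

The returned boxes could then be passed to an image library's crop routine and resized to 380 x 380.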
A description of the original data can be found on LILA here, released under the Community Data License Agreement (permissive variant).
We warmly thank the authors of LILA for making the MegaDetector outputs publicly available, as well as for structuring the dataset and facilitating its access.
WildCLIP models provided:
We also provide the CSV files containing the train / val / test splits. The train / test splits follow camera split from LILA (https://lila.science/datasets/snapshot-serengeti). The validation split is custom, and also at the camera level.
Details on how the models were trained can be found in the associated publication.
If you find our code or model weights useful, please cite:
@article{gabeff2024wildclip,
  title = {WildCLIP: Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models},
  author = {Gabeff, Valentin and Ru{\ss}wurm, Marc and Tuia, Devis and Mathis, Alexander},
  journal = {International Journal of Computer Vision},
  pages = {1--17},
  year = {2024},
  publisher = {Springer}
}
If you use the adapted Snapshot Serengeti data please also cite their article:
@article{swanson2015snapshot,
  title = {Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna},
  author = {Swanson, Alexandra and Kosmala, Margaret and Lintott, Chris and Simpson, Robert and Smith, Arfon and Packer, Craig},
  journal = {Scientific data},
  volume = {2},
  number = {1},
  pages = {1--14},
  year = {2015},
  publisher = {Nature Publishing Group}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This directory contains the data used in the article **Zwaan et al. (2024) Widespread forest-savanna coexistence but limited bistability at a landscape scale in Central Africa** (DOI: https://doi.org/10.1088/1748-9326/ad8cef).
This repository provides the datasets used for the figures and statistics in our research article. We hope that this data can be useful for other researchers. Feel free to explore and analyze the data as needed. If you use this data in your own research, please cite our article.
**Contact**: For any questions or further information, please contact:
* Aart Zwaan
* Email: a.zwaan@uu.nl
https://creativecommons.org/licenses/publicdomain/
This repository contains data on 17,419 DOIs cited in the IPCC Working Group 2 contribution to the Sixth Assessment Report, and the code to link them to the dataset built at the Curtin Open Knowledge Initiative (COKI).
References were extracted from the report's PDFs (downloaded 2022-03-01) via Scholarcy and exported as RIS and BibTeX files. DOI strings were identified from RIS files by pattern matching and saved as CSV file. The list of DOIs for each chapter and cross chapter paper was processed using a custom Python script to generate a pandas DataFrame which was saved as CSV file and uploaded to Google Big Query.
We used the main object table of the Academic Observatory, which combines information from Crossref, Unpaywall, Microsoft Academic, Open Citations, the Research Organization Registry and Geonames to enrich the DOIs with bibliographic information, affiliations, and open access status. A custom query was used to join and format the data and the resulting table was visualised in a Google DataStudio dashboard.
This version of the repository also includes the set of DOIs from references in the IPCC Working Group 1 contribution to the Sixth Assessment Report as extracted by Alexis-Michel Mugabushaka and shared on Zenodo: https://doi.org/10.5281/zenodo.5475442 (CC-BY)
A brief descriptive analysis was provided as a blogpost on the COKI website.
The repository contains the following content:
Data:
data/scholarcy/RIS/ - extracted references as RIS files
data/scholarcy/BibTeX/ - extracted references as BibTeX files
IPCC_AR6_WGII_dois.csv - list of DOIs
data/10.5281_zenodo.5475442/ - references from IPCC AR6 WG1 report
Processing:
preprocessing.R - preprocessing steps for identifying and cleaning DOIs
process.py - Python script for transforming data and linking to COKI data through Google Big Query
Outcomes:
Dataset on BigQuery - requires a google account for access and bigquery account for querying
Data Studio Dashboard - interactive analysis of the generated data
Zotero library of references extracted via Scholarcy
PDF version of blogpost
Note on licenses: Data are made available under CC0 (with the exception of WG1 reference data, which have been shared under CC-BY 4.0). Code is made available under Apache License 2.0.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood
annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted by the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits:
Test set: All the images whose filename starts with "DJI_0804" (total: 37,604 images)
Training set: All the images whose filename starts with "DJI_0915" (total: 88,568 images)
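The filename-prefix split rule above can be expressed as a small helper. This is an illustrative sketch, not code shipped with the dataset.

```python
def assign_split(filename):
    """Map an image filename to its MOBDrone split by prefix."""
    if filename.startswith("DJI_0804"):
        return "test"
    if filename.startswith("DJI_0915"):
        return "train"
    return None  # not part of either split

# Hypothetical filenames following the stated prefixes.
files = ["DJI_0804_00001.jpg", "DJI_0915_00042.jpg", "DJI_0804_00002.jpg"]
print({f: assign_split(f) for f in files})
```

Applied to the full image list, this yields the 37,604-image test set and 88,568-image training set described above.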
More details about data generation and the evaluation protocol can be found in our MOBDrone paper: https://arxiv.org/abs/2203.07973. The code to reproduce our results is available at this GitHub repository: https://github.com/ciampluca/MOBDrone_eval. See also http://aimh.isti.cnr.it/dataset/MOBDrone.
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form. Published academic papers should use the academic citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors focusing on the detection of people overboard:
@inproceedings{MOBDrone2021,
  title = {MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
  author = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  booktitle = {ICIAP2021: 21th International Conference on Image Analysis and Processing},
  year = {2021}
}
and this Zenodo Dataset
@dataset{donato_cafarelli_2022_5996890,
  author = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  title = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
  month = feb,
  year = 2022,
  publisher = {Zenodo},
  version = {1.0.0},
  doi = {10.5281/zenodo.5996890},
  url = {https://doi.org/10.5281/zenodo.5996890}
}
Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
If you use this dataset, please cite the current Zenodo repository and the following paper, which describes the dataset in detail:
[1] Yao, Y., Stebner, A., Tuytelaars, T., Geirnaert, S., & Bertrand, A. (2024). Identifying temporal correlations between natural single-shot videos and EEG signals. Journal of Neural Engineering, 21(1), 016018. doi:10.1088/1741-2552/ad2333
The associated code is available at: https://github.com/YYao-42/Identifying-Temporal-Correlations-Between-Natural-Single-shot-Videos-and-EEG-Signals?tab=readme-ov-file
Introduction
The research work leading to this dataset was conducted at the Department of Electrical Engineering (ESAT), KU Leuven.
This dataset contains electroencephalogram (EEG) data collected from 19 young participants with normal or corrected-to-normal eyesight when they were watching a series of carefully selected YouTube videos. The videos were muted to avoid the confounds introduced by audio. For synchronization, a square box was encoded outside of the original frames and flashed every 30 seconds in the top right corner of the screen. A photosensor, detecting the light changes from this flashing box, was affixed to that region using black tape to ensure that the box did not distract participants. The EEG data was recorded using a BioSemi ActiveTwo system at a sample rate of 2048 Hz. Participants wore a 64-channel EEG cap, and 4 electrooculogram (EOG) sensors were positioned around the eyes to track eye movements.
In total, the dataset contains 19 subjects × 63 min of single-shot data plus 9 subjects × 24 min of MrBean data. Further details can be found in the following sections.
Content
YouTube Videos: Due to copyright constraints, the dataset includes links to the original YouTube videos along with precise timestamps for the segments used in the experiments. The features proposed in [1] have been extracted and can be downloaded here: https://drive.google.com/file/d/1J1tYrxVizrl1xP-W1imvlA_v-DPzZ2Qh/view?usp=sharing.
Raw EEG Data: Organized by subject ID, the dataset contains EEG segments corresponding to the presented videos. Both EEGLAB .set files (containing metadata) and .fdt files (containing raw data) are provided, which can also be read by popular EEG analysis Python packages such as MNE.
The naming convention links each EEG segment to its corresponding video. E.g., the EEG segment 01_eeg corresponds to video 01_Dance_1, 03_eeg corresponds to video 03_Acrob_1, Mr_eeg corresponds to video Mr_Bean, etc.
The raw data have 68 channels. The first 64 channels are EEG data, and the last 4 channels are EOG data. The position coordinates of the standard BioSemi headcaps can be downloaded here: https://www.biosemi.com/download/Cap_coords_all.xls.
Due to minor synchronization ambiguities, clock differences between the PC and the EEG recorder, and (rarely) missing or extra video frames during playback, the length of the EEG data may not perfectly match that of the corresponding video. The difference, typically within a few milliseconds, can be resolved by truncating the modality with the excess samples.
Signal Quality Information: A supplementary .txt file details potential bad channels. Users can define their own criteria for identifying and handling bad channels.
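As a rough illustration of the channel layout and length-matching points above, the following sketch (pure NumPy; the function names are illustrative) splits a loaded recording into its EEG and EOG channels and truncates two modalities to a common length. Loading the actual .set/.fdt pairs would typically go through MNE's EEGLAB reader, shown here only as a comment:

```python
import numpy as np

N_EEG, N_EOG = 64, 4  # channel layout described above: 64 EEG + 4 EOG

def split_channels(data):
    """Split a (68, n_samples) array into EEG (first 64 channels)
    and EOG (last 4 channels) parts."""
    assert data.shape[0] == N_EEG + N_EOG
    return data[:N_EEG], data[N_EEG:]

def align_lengths(a, b, axis=-1):
    """Truncate whichever of two arrays has excess samples along `axis`,
    so that EEG and video features cover the same time span."""
    n = min(a.shape[axis], b.shape[axis])
    return np.take(a, range(n), axis=axis), np.take(b, range(n), axis=axis)

# In practice the EEG array would come from the .set/.fdt pair, e.g.:
#   raw = mne.io.read_raw_eeglab("01_eeg.set", preload=True)
#   data = raw.get_data()   # shape (68, n_samples)
```

Note that `align_lengths` assumes both modalities have already been resampled to a common rate, so that samples correspond one-to-one.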
The dataset is divided into two subsets: Single-shot and MrBean, based on the characteristics of the video stimuli.
Single-shot Dataset
The stimuli of this dataset consist of 13 single-shot videos (63 min in total), each depicting a single individual engaging in various activities such as dancing, mime, acrobatics, and magic shows. All the participants watched this video collection.
Video ID Link Start time (s) End time (s)
01_Dance_1 https://youtu.be/uOUVE5rGmhM 8.54 231.20
03_Acrob_1 https://youtu.be/DjihbYg6F2Y 4.24 231.91
04_Magic_1 https://youtu.be/CvzMqIQLiXE 3.68 348.17
05_Dance_2 https://youtu.be/f4DZp0OEkK4 5.05 227.99
06_Mime_2 https://youtu.be/u9wJUTnBdrs 5.79 347.05
07_Acrob_2 https://youtu.be/kRqdxGPLajs 183.61 519.27
08_Magic_2 https://youtu.be/FUv-Q6EgEFI 3.36 270.62
09_Dance_3 https://youtu.be/LXO-jKksQkM 5.61 294.17
12_Magic_3 https://youtu.be/S84AoWdTq3E 1.76 426.36
13_Dance_4 https://youtu.be/0wc60tA1klw 14.28 217.18
14_Mime_3 https://youtu.be/0Ala3ypPM3M 21.87 386.84
15_Dance_5 https://youtu.be/mg6-SnUl0A0 15.14 233.85
16_Mime_6 https://youtu.be/8V7rhAJF6Gc 31.64 388.61
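As a sanity check, the total stimulus duration stated above (63 min) can be recomputed from the table; a short sketch:

```python
# (start, end) times in seconds, copied from the table above
segments = {
    "01_Dance_1": (8.54, 231.20),   "03_Acrob_1": (4.24, 231.91),
    "04_Magic_1": (3.68, 348.17),   "05_Dance_2": (5.05, 227.99),
    "06_Mime_2":  (5.79, 347.05),   "07_Acrob_2": (183.61, 519.27),
    "08_Magic_2": (3.36, 270.62),   "09_Dance_3": (5.61, 294.17),
    "12_Magic_3": (1.76, 426.36),   "13_Dance_4": (14.28, 217.18),
    "14_Mime_3":  (21.87, 386.84),  "15_Dance_5": (15.14, 233.85),
    "16_Mime_6":  (31.64, 388.61),
}

total_min = sum(end - start for start, end in segments.values()) / 60
print(f"total stimulus duration: {total_min:.1f} min")  # ~63.6 min
```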
MrBean Dataset
Additionally, 9 participants watched an extra 24-minute clip from the first episode of Mr. Bean, where multiple (moving) objects may exist and interact, and the camera viewpoint may change. The subject IDs and the signal quality files are inherited from the single-shot dataset.
Video ID Link Start time (s) End time (s)
Mr_Bean https://www.youtube.com/watch?v=7Im2I6STbms 39.77 1495.00
Acknowledgement
This research is funded by the Research Foundation - Flanders (FWO) project No G081722N, junior postdoctoral fellowship fundamental research of the FWO (for S. Geirnaert, No. 1242524N), the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 802895), the Flemish Government (AI Research Program), and the PDM mandate from KU Leuven (for S. Geirnaert, No PDMT1/22/009).
We also thank the participants for their time and effort in the experiments.
Contact Information
Executive researcher: Yuanyuan Yao, yuanyuan.yao@kuleuven.be
Led by: Prof. Alexander Bertrand, alexander.bertrand@kuleuven.be
‘Public deposition’ refers to a scenario where the researcher deposits data in a public repository and thus transfers the administrative role over the data to the receiving repository. ‘Scientific sharing’ refers to a scenario where the researcher administers the data locally and provides it on request to interested readers. Note that none of the journals examined in the present article required that all data types underlying a submitted work be deposited in a public data repository. However, some journals required public deposition of data of specific types. Within the journal research data policies examined in the present article, these data types are well represented by the Springer Nature policy on “Availability of data, materials, code and protocols” (Springer Nature, 2018), that is: DNA and RNA data; protein sequences and DNA and RNA sequencing data; genetic polymorphism data; linked phenotype and genotype data; gene expression microarray data; proteomics data; macromolecular structures; and crystallographic data for small molecules. Furthermore, the registration of clinical trials in a public repository was also considered a data type in this study. The term ‘specific data types’ used in the custom coding framework of the present study thus refers to both life sciences data and the public registration of clinical trials. These data types have community-endorsed public repositories, where deposition was most often mandated within the journals’ research data policies.
The term ‘location’ refers to whether the journal’s data policy provides suggestions or requirements for the repositories or services to be used to share the underlying data of submitted works. A general reference to ‘public repositories’ was not considered a location suggestion; only references to individual repositories and services were. The category of ‘immediate release of data’ examines whether the journal’s research data policy addresses the timing of publication of the underlying data of submitted works. Note that even though a journal may only encourage public deposition of the data, its editorial processes could be set up so that they lead to publication of either the research data or the research data metadata in conjunction with publication of the submitted work.