CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We evaluated the reproducibility of Jupyter notebooks found in GitHub repositories linked to publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks contained in those repositories.
Data Collection and Analysis
We reuse the code for assessing the reproducibility of Jupyter notebooks from the study by Pimentel et al., 2019 and adapt code from ReproduceMeGit. We also provide code for collecting publication metadata from PubMed Central using the NCBI Entrez utilities via Biopython.
Our approach involves searching PMC with the esearch function for Jupyter notebooks using the query "(ipynb OR jupyter OR ipython) AND github". We retrieve the results in XML format, capturing essential details about the journals and articles. By scanning the entire article, including the abstract, body, data availability statement, and supplementary materials, we extract GitHub links. Additionally, we mine the repositories for dependency declarations found in files such as requirements.txt, setup.py, and Pipfile. Using the GitHub API, we enrich the data with repository creation dates, update and push histories, and programming languages.
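As a minimal sketch of this search step (assuming Biopython is installed; the e-mail address and retmax value are placeholders, and the actual collection scripts are provided in the repository):

from Bio import Entrez

# NCBI asks for a contact e-mail with every Entrez request (placeholder here).
Entrez.email = "your.name@example.org"

# Search PubMed Central for articles mentioning Jupyter notebooks and GitHub.
query = "(ipynb OR jupyter OR ipython) AND github"
handle = Entrez.esearch(db="pmc", term=query, retmax=10000)
record = Entrez.read(handle)
handle.close()

pmc_ids = record["IdList"]
print(f"Matched {record['Count']} articles; retrieved {len(pmc_ids)} IDs")

# Fetch article metadata in XML for downstream extraction of GitHub links.
fetch = Entrez.efetch(db="pmc", id=",".join(pmc_ids[:5]), retmode="xml")
xml_records = fetch.read()
fetch.close()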
All the extracted information is stored in a SQLite database. After creating the database tables, we ran a pipeline, based on the code from Pimentel et al., 2019, to collect the Jupyter notebooks contained in the GitHub repositories.
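As an illustration of this storage step, a minimal sqlite3 sketch; the table and column names below are simplified assumptions, not the actual schema shipped with the dataset:

import sqlite3

conn = sqlite3.connect("reproducibility.sqlite")
# Hypothetical, simplified schema: journals, publications, repositories, notebooks.
conn.executescript("""
CREATE TABLE IF NOT EXISTS journals (
    id INTEGER PRIMARY KEY,
    issn TEXT,
    title TEXT
);
CREATE TABLE IF NOT EXISTS publications (
    id INTEGER PRIMARY KEY,
    pmc_id TEXT UNIQUE,
    journal_id INTEGER REFERENCES journals(id),
    title TEXT,
    year INTEGER
);
CREATE TABLE IF NOT EXISTS repositories (
    id INTEGER PRIMARY KEY,
    publication_id INTEGER REFERENCES publications(id),
    url TEXT,
    created_at TEXT,
    pushed_at TEXT,
    primary_language TEXT
);
CREATE TABLE IF NOT EXISTS notebooks (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER REFERENCES repositories(id),
    path TEXT,
    kernel TEXT,
    python_version TEXT
);
""")
conn.commit()
conn.close()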
Our reproducibility pipeline was started on 27 March 2023.
Repository Structure
Our repository is organized into two main folders:
Accessing Data and Resources:
System Requirements:
Running the pipeline:
Running the analysis:
References:
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.

Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular "bricks." Each brick is a Data Version Control (DVC) Git repository containing an extract-transform-load (ETL) pipeline. A package-manager-like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).

Results: The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use cases show that assembling multi-dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.

Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version-controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life-science community.
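Because each brick is a DVC Git repository, one can in principle fetch a single brick by hand; the sketch below uses only git and DVC with a hypothetical repository URL, whereas the package-manager interface described above is the intended route:

import subprocess

# Hypothetical brick repository; real brick names and URLs may differ.
brick_url = "https://github.com/biobricks-ai/example-brick"

# Clone the brick (ETL code plus DVC pointer files).
subprocess.run(["git", "clone", brick_url, "example-brick"], check=True)

# Pull the version-controlled data files referenced by the DVC pointers.
subprocess.run(["dvc", "pull"], cwd="example-brick", check=True)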
As per our latest research, the global veterinary master data management market size reached USD 1.24 billion in 2024, reflecting robust demand for digital solutions in animal healthcare. The market is registering a compound annual growth rate (CAGR) of 12.1% and is projected to attain USD 3.48 billion by 2033. This remarkable expansion is fueled by the accelerating adoption of digital records, regulatory mandates for traceability, and the rising complexity of veterinary practices worldwide. The surge in pet ownership, coupled with advancements in veterinary diagnostics and treatments, is driving the need for centralized and accurate data management systems, thus underpinning the market’s strong growth trajectory.
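A quick back-of-the-envelope check of these figures, assuming nine full years of compounding from the 2024 base:

# Compound the 2024 market size at the stated CAGR through 2033.
base_2024 = 1.24            # USD billion
cagr = 0.121
years = 2033 - 2024         # 9 years

projected_2033 = base_2024 * (1 + cagr) ** years
print(f"Projected 2033 size: USD {projected_2033:.2f} billion")
# ~USD 3.47 billion, in line with the reported USD 3.48 billion.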
One of the primary growth factors of the veterinary master data management market is the increasing digitization of veterinary healthcare processes. Veterinary practices are increasingly transitioning from manual record-keeping to sophisticated digital platforms that offer real-time access, error reduction, and improved data accuracy. The integration of electronic health records (EHRs) and practice management software has become a standard, enabling seamless sharing of patient information across clinics, laboratories, and pharmacies. With the growing emphasis on evidence-based veterinary medicine, data-driven decision-making is emerging as a crucial aspect, pushing clinics and hospitals to invest in master data management solutions that can harmonize disparate datasets, streamline workflows, and ensure compliance with industry standards.
Another significant driver is the growing regulatory scrutiny and the need for compliance management in the animal health sector. Regulatory bodies across North America, Europe, and Asia Pacific are imposing stringent requirements for the traceability of pharmaceuticals, vaccines, and medical devices used in veterinary care. These regulations necessitate the maintenance of precise and up-to-date data records, compelling veterinary hospitals, research institutes, and pharmacies to adopt robust master data management systems. Furthermore, the increasing threat of zoonotic diseases and the global focus on One Health initiatives are prompting stakeholders to prioritize accurate data capture and reporting, which further accelerates the adoption of advanced data management technologies.
The proliferation of advanced technologies such as artificial intelligence, machine learning, and cloud computing is also transforming the veterinary master data management landscape. Cloud-based solutions are gaining traction due to their scalability, cost-effectiveness, and ability to facilitate remote access to critical data. This is particularly important in the context of multi-site veterinary practices and research collaborations that span geographies. AI-powered analytics are enabling veterinary professionals to derive actionable insights from large datasets, enhancing diagnostic accuracy, treatment outcomes, and operational efficiency. These technological advancements are expanding the functionality and appeal of master data management platforms, making them indispensable tools for modern veterinary institutions.
From a regional perspective, North America continues to dominate the veterinary master data management market, accounting for the largest revenue share in 2024. The region's leadership is underpinned by the presence of a well-developed veterinary infrastructure, high adoption rates of digital technologies, and favorable regulatory frameworks. Europe is also witnessing substantial growth, driven by the increasing focus on animal welfare and the harmonization of veterinary regulations across the European Union. Meanwhile, Asia Pacific is emerging as a high-growth market, fueled by rising pet ownership, expanding veterinary services, and significant investments in digital healthcare infrastructure. Latin America and the Middle East & Africa are gradually catching up, with growing awareness and adoption of data management solutions in animal healthcare settings.
The veterinary master data management market, segmented by component, comprises software and services, each playing a pivotal role in shaping the industry’s evolution. The software segment dominates the market, driven by the increasing need for centralized data repositories and automated workflows within veterinary practices.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is associated with the report "Project Vernieuwing Open Access Monitoring - Report Phase 1 - Peer-Reviewed Articles" [in Dutch] (https://doi.org/10.5281/zenodo.15061685). The project's objectives were to establish a transparent and reproducible workflow for centralized open access monitoring of peer-reviewed articles of the Dutch universities, utilizing open data, with code and data that can be fully shared openly.
The dataset contains record-level information on peer-reviewed articles from Dutch universities for publication year 2023, as provided by the institutions from their CRIS systems. The data has been supplemented with bibliographic information from Crossref, DOAJ, the ISSN registry, and Unpaywall.
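A minimal sketch of the Unpaywall part of this enrichment, using its public REST API (the contact e-mail and DOI are placeholders; the project's actual pipeline and field selection may differ):

import requests

def unpaywall_oa_status(doi: str, email: str) -> dict:
    """Look up a DOI in Unpaywall and return a small OA summary."""
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}",
                        params={"email": email}, timeout=30)
    resp.raise_for_status()
    record = resp.json()
    return {
        "doi": record.get("doi"),
        "is_oa": record.get("is_oa"),
        "oa_status": record.get("oa_status"),  # e.g. gold, hybrid, green, bronze, closed
    }

# Placeholder DOI and contact address, for illustration only.
print(unpaywall_oa_status("10.1234/example.doi", "your.name@example.org"))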
In total, 50,115 unique DOIs were included in the analysis (including publications from the University of Humanistic Studies). The OA status of 49,815 publications was determined.
In addition to information on Open Access types, the dataset also includes details on:
Note: The results of this central monitoring show differences compared to the existing decentralized monitoring. In particular, the share of OA via repositories is lower. Some of the differences can be explained by the set of articles used and the way in which OA status was determined. A detailed discussion of the differences between the existing decentralized monitoring and this central monitoring can be found in section 4.1 of the project report.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Data composition and characteristics of included studies.
Land cover is a key variable in the context of climate change. In particular, crop type information is essential to understand the spatial distribution of water usage and to anticipate the risk of water scarcity and the consequent danger of food insecurity. This applies to arid regions such as the Aral Sea Basin (ASB), Central Asia, where agriculture relies heavily on irrigation. Here, remote sensing is valuable for mapping crop types, but its quality depends on consistent ground-truth data, which the ASB lacks. Addressing this issue, we collected thousands of polygons on crop types, 97.7% of them in Uzbekistan and the rest in Tajikistan. We collected 8,196 samples between 2015 and 2018, 213 in 2011, and 26 in 2008. Our data compile samples for 40 crop types and are dominated by “cotton” (40%) and “wheat” (25%). These data were validated using expert knowledge and remote sensing data and rely on transferable, open-source workflows that will ensure the consistency of future sampling campaigns.
The file name homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv follows the convention:
[project name]_[study area]_[processing algorithm]_[methodology]_[resolution]_[sensor or satellite mission]_[track/orbit descriptor]_[first acquisition]_[last acquisition]_ref[reference acquisition date].[file extension]
Notes:
- All dates use the format YYYYMM for the first and last acquisitions and YYYYMMDD for the reference date.
- Underscores are used consistently to separate fields.
- The naming convention is consistent across all data formats.
- Decomposed 3D components additionally include a descriptor of the resampling method used (grid: grid-based; nnv: nearest-neighbor vector approach, Foumelis, 2016) followed by the grid resolution in meters.
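For the line-of-sight products, a minimal sketch of splitting a file name into these fields (the Python field names are informal shorthand, not an official vocabulary):

# Split a LOS product file name into the fields defined by the naming convention.
FIELDS = [
    "project", "study_area", "algorithm", "methodology", "resolution",
    "sensor", "track", "first_acquisition", "last_acquisition",
]

def parse_product_name(filename: str) -> dict:
    stem, _, extension = filename.rpartition(".")
    parts = stem.split("_")
    # The last part carries the reference date, e.g. "ref20220108".
    info = dict(zip(FIELDS, parts[:-1]))
    info["reference_date"] = parts[-1].removeprefix("ref")
    info["extension"] = extension
    return info

print(parse_product_name(
    "homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv"
))
# {'project': 'homeros', 'study_area': 'ionian', 'algorithm': 'snapping', ...}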
The full dataset contains the following files:
metadata_dictionary.csv: Descriptions of metadata fields.
Ascending and Descending Line-of-Sight (LOS) products: Each geometry folder contains CSV files with displacement time series, GeoTIFF rasters representing mean velocity and coherence, shapefile layers for geospatial visualization, and the corresponding metadata JSON files.
3D Decomposed Components (E-W and Up): Outputs from the grid-based and nearest-neighbor vector decomposition methods, provided at 50 m and 100 m spatial resolutions. Each subfolder contains displacement rasters and CSV tables for the respective 3D components, together with detailed metadata.
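A minimal sketch of loading one ascending LOS product, assuming pandas and rasterio are installed (file paths follow the dataset structure shown in the next section):

import pandas as pd
import rasterio

base = ("Central_Ionian_dataset/ascending/"
        "homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108")

# Per-point displacement time series.
timeseries = pd.read_csv(f"{base}.csv")
print(timeseries.shape)

# Mean LOS velocity raster (first band) and the accompanying coherence raster.
with rasterio.open(f"{base}.tiff") as src:
    mean_velocity = src.read(1)
    print(src.crs, mean_velocity.shape)

with rasterio.open(f"{base}_coh.tiff") as src:
    coherence = src.read(1)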
Total size: ~240 MB (compressed) / ~933 MB (uncompressed)

3. Dataset structure

Central_Ionian_dataset/
├── README.md
├── LICENSE.txt
├── ascending/
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.tiff
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108_coh.tiff
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.shp
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.dbf
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.prj
│   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.shx
│   └── metadata_ascending.json
├── descending/
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.csv
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.tiff
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102_coh.tiff
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.shp
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.dbf
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.prj
│   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.shx
│   └── metadata_descending.json
└── 3d_components/
    ├── GRID/
    │   ├── 50/
    │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_ew.csv
    │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_up.csv
    │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_ew.tiff
    │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_up.tiff
    │   │   └── metadata_grid_50.json
    │   └── 100/
    │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_ew.csv
    │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_up.csv
    │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_ew.tiff
    │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_up.tiff
    │       └── metadata_grid_100.json
    └── NNV/
        ├── 50/
        │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_ew.csv
        │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_up.csv
        │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_ew.tiff
        │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_up.tiff
        │   └── metadata_nnv_50.json
        └── 100/
            ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_ew.csv
            ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_up.csv
            ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_ew.tiff
            ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_up.tiff
            └── metadata_nnv_100.json

4. How to cite
Foumelis, M., Papageorgiou, E., Bonatis, P. (2025). Ground Deformation Dataset for Geohazard Monitoring in the Central Ionian Islands (Greece) within the HOMEROS Project [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17253643
Foumelis, M. (2016). Vector-based approach for combining ascending and descending persistent scatterers interferometric point measurements. Geocarto International, 33(1), 38–52. https://doi.org/10.1080/10106049.2016.1222636
Foumelis, M., Delgado Blasco, J.M., Brito, F., Pacini, F., Papageorgiou, E., Pishehvar, P., Bally, P. (2022). SNAPPING Services on the Geohazards Exploitation Platform for Copernicus Sentinel-1 Surface Motion Mapping. Remote Sensing, 14(23), 6075. https://doi.org/10.3390/rs14236075
This dataset contains data obtained from the HDX Humanitarian API (HDX HAPI), which provides standardized humanitarian indicators designed for seamless interoperability from multiple sources. The data facilitates automated workflows and visualizations to support humanitarian decision making. For more information, please see the HDX HAPI landing page and documentation.
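A minimal sketch of pulling records from such an API with the requests library; the endpoint path and query parameters below are illustrative assumptions rather than documented HDX HAPI routes, so consult the official documentation for the actual interface:

import requests

BASE_URL = "https://hapi.humdata.org/api/v1"   # assumed base URL

def fetch_records(path: str, **params) -> list:
    # Hypothetical endpoint call; real routes, parameters, and auth may differ.
    resp = requests.get(f"{BASE_URL}/{path}",
                        params={"output_format": "json", **params},
                        timeout=30)
    resp.raise_for_status()
    return resp.json().get("data", [])

rows = fetch_records("population", location_code="AFG", limit=100)  # assumed path/params
print(len(rows))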
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Zebrafish have emerged as a central model organism in toxicological research. Zebrafish embryos are exempt from certain animal testing regulations, which facilitates their use in toxicological testing. In addition to the zebrafish embryo acute toxicity test (ZFET) according to OECD TG 236, fish embryos are used in mechanistic investigations, chemical screenings, ecotoxicology, and drug development. However, inconsistencies in the applied test protocols and monitored endpoints, together with a lack of standardized data formats, impede comprehensive meta-analyses and cross-study comparisons. To address these challenges, we developed the Integrated Effect Database for Toxicological Observations (INTOB), a comprehensive data management tool that standardizes the collection of metadata and phenotypic observations using a controlled vocabulary. By incorporating data from more than 600 experiments into the database and performing comprehensive data analyses, we demonstrate its utility in improving the comparability and interoperability of toxicity data. Our results show that the ZFET can detect toxicity spanning seven orders of magnitude in effect concentrations. We also highlight the potential of read-across analyses based on morphological fingerprints and their connection to chemical modes of action, provide information on control variability of the ZFET, and highlight the importance of time for mechanistic understanding in chemical exposure-effect assessments. We provide the full Findable, Accessible, Interoperable, and Reusable (FAIR) data set as well as the analysis workflow and demonstrate how professional data management, as enabled with INTOB, marks a significant advancement by offering a comprehensive framework for the systematic use of zebrafish embryo toxicity data, thus paving the way for more reliable, data-driven chemical risk assessment.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The NOMAD poster provides a comprehensive overview of two powerful tools, NOMAD and NOMAD Oasis, designed to revolutionize the landscape of scientific data research. Developed by the FAIRmat consortium, NOMAD and NOMAD Oasis promote the principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data, enhancing transparency and collaboration in materials science. NOMAD is a centralized materials science data repository addressing the needs of researchers from various domains and fields. It seamlessly manages diverse data formats, including raw files, facilitating efficient data analysis and exploration. The poster highlights the key features and use cases of NOMAD and NOMAD Oasis while thoroughly exploring their capabilities.

The project outlines the data workflow within the NOMAD ecosystem, illustrating how uploaded data is transformed into processed and modeled data. Subsequently, it offers three options for utilizing the parsed data: publish, analyze, and explore. Researchers can leverage NOMAD's data publishing capabilities to share their findings with the scientific community, obtain a DOI, and support open collaboration. By promoting FAIR data research, the NOMAD poster underscores the importance of data integrity, accessibility, and interoperability. It is a valuable resource for scientists, data centers, and scientific organizations, showcasing the potential of NOMAD and NOMAD Oasis in everyday workflows.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
GitHub commits used when running Nextflow workflows.