11 datasets found
  1. Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications

    • zenodo.org
    pdf, zip
    Updated Jul 11, 2024
    Cite
    Sheeba Samuel; Daniel Mietchen (2024). Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications [Dataset]. http://doi.org/10.5281/zenodo.8226725
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sheeba Samuel; Daniel Mietchen
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This repository contains the dataset for the study of computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

    Data Collection and Analysis

    We reuse the Jupyter notebook reproducibility code from the study by Pimentel et al. (2019) and adapted code from ReproduceMeGit. We provide code for collecting the publication metadata from PubMed Central using the NCBI Entrez utilities via Biopython.

    Our approach involves searching PMC with the esearch function for Jupyter notebooks using the query: "(ipynb OR jupyter OR ipython) AND github". We retrieve data in XML format, capturing essential details about journals and articles. By systematically scanning the entire article, encompassing the abstract, body, data availability statement, and supplementary materials, we extract GitHub links. Additionally, we mine repositories for key information such as dependency declarations found in files like requirements.txt, setup.py, and Pipfile. Leveraging the GitHub API, we enrich our data with repository creation dates, update histories, pushes, and programming languages.
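    To make this step concrete, here is a minimal sketch (not the project's actual collection code, which lives in the archaeology scripts): it runs the query above against PMC via Biopython's Entrez module and then queries the GitHub API for repository metadata. The e-mail address, token handling, and the example repository slug are placeholders.

      # Sketch of the collection step: PMC search via Biopython Entrez + GitHub API enrichment.
      # Assumes: pip install biopython requests; optionally a GitHub token in GITHUB_TOKEN.
      import os
      import requests
      from Bio import Entrez

      Entrez.email = "you@example.org"  # placeholder; NCBI asks for a contact address

      # 1. Search PubMed Central for articles mentioning Jupyter notebooks and GitHub.
      query = "(ipynb OR jupyter OR ipython) AND github"
      handle = Entrez.esearch(db="pmc", term=query, retmax=100)
      record = Entrez.read(handle)
      handle.close()
      pmc_ids = record["IdList"]

      # 2. Fetch the full article XML for the first hit; journal/article details and
      #    GitHub links would be parsed out of this XML.
      if pmc_ids:
          handle = Entrez.efetch(db="pmc", id=pmc_ids[0], retmode="xml")
          article_xml = handle.read()
          handle.close()

      # 3. Enrich a GitHub repository found in an article with metadata from the GitHub API.
      def repo_metadata(owner, name, token=os.environ.get("GITHUB_TOKEN")):
          headers = {"Authorization": f"token {token}"} if token else {}
          resp = requests.get(f"https://api.github.com/repos/{owner}/{name}", headers=headers)
          resp.raise_for_status()
          data = resp.json()
          return {"created_at": data["created_at"],
                  "pushed_at": data["pushed_at"],
                  "language": data["language"]}

      # Example call (hypothetical repository slug):
      # print(repo_metadata("some-owner", "some-repo"))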

    All the extracted information is stored in a SQLite database. After collecting and creating the database tables, we ran a pipeline to collect the Jupyter notebooks contained in the GitHub repositories based on the code from Pimentel et al., 2019.
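    Once the database exists, it can be inspected with Python's built-in sqlite3 module. The following is a generic sketch (not part of the published pipeline); only the db.sqlite file name is taken from this description, and the table names are discovered from the schema rather than assumed.

      # Sketch: list the tables in db.sqlite and their row counts.
      import sqlite3

      conn = sqlite3.connect("db.sqlite")
      cur = conn.cursor()

      # Table names (articles, journals, repositories, notebooks, cells, ...) are read
      # from sqlite_master rather than hard-coded.
      tables = [row[0] for row in cur.execute(
          "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]

      for table in tables:
          (count,) = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
          print(f"{table}: {count} rows")

      conn.close()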

    Our reproducibility pipeline was started on 27 March 2023.

    Repository Structure

    Our repository is organized into two main folders:

    • archaeology: This directory hosts scripts designed to download, parse, and extract metadata from PubMed Central publications and associated repositories. Twenty-four database tables are created to store information on articles, journals, authors, repositories, notebooks, cells, modules, executions, etc., in the db.sqlite database file.
    • analyses: Here, you will find the notebooks used for the in-depth analysis of the data related to our study. The db.sqlite file generated by running the archaeology folder is stored in the analyses folder for further analysis; the path can, however, be configured in the config.py file. There are two sets of notebooks: one set (naming pattern N[0-9]*.ipynb) examines data pertaining to repositories and notebooks, while the other set (PMC[0-9]*.ipynb) analyzes data associated with publications in PubMed Central, i.e., plots involving data about articles, journals, publication dates, or research fields. The resulting figures from these notebooks are stored in the 'outputs' folder.
    • MethodsWorkflow: The MethodsWorkflow file provides a conceptual overview of the workflow used in this study.

    Accessing Data and Resources:

    • All the data generated during the initial study can be accessed at https://doi.org/10.5281/zenodo.6802158
    • For the latest results and re-run data, refer to this link.
    • The comprehensive SQLite database that encapsulates all the study's extracted data is stored in the db.sqlite file.
    • The metadata extracted from PubMed Central in XML format, containing the information about the articles and journals, can be accessed in the pmc.xml file (see the parsing sketch below).
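    A small sketch for reading pmc.xml is shown below. It assumes the file aggregates JATS-style <article> records as returned by PMC; the exact element paths in the published file may differ.

      # Sketch: list journal and article titles from pmc.xml (assumes JATS-style markup).
      import xml.etree.ElementTree as ET

      root = ET.parse("pmc.xml").getroot()

      for article in root.iter("article"):
          title = article.findtext(".//article-title", default="(no title)")
          journal = article.findtext(".//journal-title", default="(no journal)")
          print(f"{journal}: {title}")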

    System Requirements:

    Running the pipeline:

    • Clone the computational-reproducibility-pmc repository using Git:
      git clone https://github.com/fusion-jena/computational-reproducibility-pmc.git
    • Navigate to the computational-reproducibility-pmc directory:
      cd computational-reproducibility-pmc/computational-reproducibility-pmc
    • Configure environment variables in the config.py file:
      GITHUB_USERNAME = os.environ.get("JUP_GITHUB_USERNAME", "add your github username here")
      GITHUB_TOKEN = os.environ.get("JUP_GITHUB_PASSWORD", "add your github token here")
    • Other environment variables can also be set in the config.py file.
      BASE_DIR = Path(os.environ.get("JUP_BASE_DIR", "./")).expanduser() # Add the path of directory where the GitHub repositories will be saved
      DB_CONNECTION = os.environ.get("JUP_DB_CONNECTION", "sqlite:///db.sqlite") # Add the path where the database is stored.
    • To set up conda environments for each Python version, upgrade pip, install pipenv, and install the archaeology package in each environment, execute:
      source conda-setup.sh
    • Change to the archaeology directory
      cd archaeology
    • Activate the conda environment. We used py36 to run the pipeline.
      conda activate py36
    • Execute the main pipeline script (r0_main.py):
      python r0_main.py

    Running the analysis:

    • Navigate to the analyses directory:
      cd analyses
    • Activate the conda environment. We use raw38 for the analysis of the metadata collected in the study.
      conda activate raw38
    • Install the required packages using the requirements.txt file.
      pip install -r requirements.txt
    • Launch JupyterLab:
      jupyter lab
    • Refer to the Index.ipynb notebook for the execution order and guidance.

    References:

  2. Data Sheet 1_BioBricks.ai: a versioned data registry for life sciences data assets

    • frontiersin.figshare.com
    pdf
    Updated Aug 13, 2025
    Cite
    Yifan Gao; Zakariyya Mughal; Jose A. Jaramillo-Villegas; Marie Corradi; Alexandre Borrel; Ben Lieberman; Suliman Sharif; John Shaffer; Karamarie Fecho; Ajay Chatrath; Alexandra Maertens; Marc A. T. Teunis; Nicole Kleinstreuer; Thomas Hartung; Thomas Luechtefeld (2025). Data Sheet 1_BioBricks.ai: a versioned data registry for life sciences data assets.pdf [Dataset]. http://doi.org/10.3389/frai.2025.1599412.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Aug 13, 2025
    Dataset provided by
    Frontiers
    Authors
    Yifan Gao; Zakariyya Mughal; Jose A. Jaramillo-Villegas; Marie Corradi; Alexandre Borrel; Ben Lieberman; Suliman Sharif; John Shaffer; Karamarie Fecho; Ajay Chatrath; Alexandra Maertens; Marc A. T. Teunis; Nicole Kleinstreuer; Thomas Hartung; Thomas Luechtefeld
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.

    Methods: We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular "bricks." Each brick is a Data Version Control (DVC) Git repository containing an extract-transform-load (ETL) pipeline. A package-manager-like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).

    Results: The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use cases show that assembling multi-dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.

    Discussion: BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version-controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life-science community.
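    To make the brick mechanism concrete, the sketch below pulls a single brick directly with Git and DVC from Python. The repository URL and brick name are hypothetical placeholders; the platform's own client at https://biobricks.ai provides the package-manager-style interface described above.

      # Sketch: fetch a hypothetical brick, i.e. a DVC-backed Git repository.
      # Assumes the git and dvc command-line tools are installed and on PATH.
      import subprocess

      BRICK_URL = "https://github.com/example-org/example-brick"  # hypothetical URL
      TARGET = "example-brick"

      # Clone the brick's Git repository (code, ETL pipeline, DVC pointers).
      subprocess.run(["git", "clone", BRICK_URL, TARGET], check=True)

      # 'dvc pull' downloads the versioned data assets referenced by the brick.
      subprocess.run(["dvc", "pull"], cwd=TARGET, check=True)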

  3. Veterinary Master Data Management Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Veterinary Master Data Management Market Research Report 2033 [Dataset]. https://dataintelo.com/report/veterinary-master-data-management-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2025 - 2034
    Area covered
    Global
    Description

    Veterinary Master Data Management Market Outlook

    As per our latest research, the global veterinary master data management market size reached USD 1.24 billion in 2024, reflecting robust demand for digital solutions in animal healthcare. The market is registering a compound annual growth rate (CAGR) of 12.1% and is projected to attain USD 3.48 billion by 2033. This remarkable expansion is fueled by the accelerating adoption of digital records, regulatory mandates for traceability, and the rising complexity of veterinary practices worldwide. The surge in pet ownership, coupled with advancements in veterinary diagnostics and treatments, is driving the need for centralized and accurate data management systems, thus underpinning the market’s strong growth trajectory.

    One of the primary growth factors of the veterinary master data management market is the increasing digitization of veterinary healthcare processes. Veterinary practices are increasingly transitioning from manual record-keeping to sophisticated digital platforms that offer real-time access, error reduction, and improved data accuracy. The integration of electronic health records (EHRs) and practice management software has become a standard, enabling seamless sharing of patient information across clinics, laboratories, and pharmacies. With the growing emphasis on evidence-based veterinary medicine, data-driven decision-making is emerging as a crucial aspect, pushing clinics and hospitals to invest in master data management solutions that can harmonize disparate datasets, streamline workflows, and ensure compliance with industry standards.

    Another significant driver is the growing regulatory scrutiny and the need for compliance management in the animal health sector. Regulatory bodies across North America, Europe, and Asia Pacific are imposing stringent requirements for the traceability of pharmaceuticals, vaccines, and medical devices used in veterinary care. These regulations necessitate the maintenance of precise and up-to-date data records, compelling veterinary hospitals, research institutes, and pharmacies to adopt robust master data management systems. Furthermore, the increasing threat of zoonotic diseases and the global focus on One Health initiatives are prompting stakeholders to prioritize accurate data capture and reporting, which further accelerates the adoption of advanced data management technologies.

    The proliferation of advanced technologies such as artificial intelligence, machine learning, and cloud computing is also transforming the veterinary master data management landscape. Cloud-based solutions are gaining traction due to their scalability, cost-effectiveness, and ability to facilitate remote access to critical data. This is particularly important in the context of multi-site veterinary practices and research collaborations that span geographies. AI-powered analytics are enabling veterinary professionals to derive actionable insights from large datasets, enhancing diagnostic accuracy, treatment outcomes, and operational efficiency. These technological advancements are expanding the functionality and appeal of master data management platforms, making them indispensable tools for modern veterinary institutions.

    From a regional perspective, North America continues to dominate the veterinary master data management market, accounting for the largest revenue share in 2024. The region's leadership is underpinned by the presence of a well-developed veterinary infrastructure, high adoption rates of digital technologies, and favorable regulatory frameworks. Europe is also witnessing substantial growth, driven by the increasing focus on animal welfare and the harmonization of veterinary regulations across the European Union. Meanwhile, Asia Pacific is emerging as a high-growth market, fueled by rising pet ownership, expanding veterinary services, and significant investments in digital healthcare infrastructure. Latin America and the Middle East & Africa are gradually catching up, with growing awareness and adoption of data management solutions in animal healthcare settings.

    Component Analysis

    The veterinary master data management market, segmented by component, comprises software and services, each playing a pivotal role in shaping the industry’s evolution. The software segment dominates the market, driven by the increasing need for centralized data repositories and automated workflows within veterinary practices. Modern veterinary master data

  4. Project vernieuwing open access monitoring - peer-reviewed artikelen

    • zenodo.org
    csv, txt
    Updated Apr 21, 2025
    Cite
    Bianca Kramer (2025). Project vernieuwing open access monitoring - peer-reviewed artikelen [Dataset]. http://doi.org/10.5281/zenodo.15164365
    Explore at:
    Available download formats: txt, csv
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bianca Kramer
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset is associated with the report "Project Vernieuwing Open Access Monitoring - Report Phase 1 - Peer-Reviewed Articles" [in Dutch] (https://doi.org/10.5281/zenodo.15061685). The project's objectives were to establish a transparent and reproducible workflow for centralized open access monitoring of peer-reviewed articles of the Dutch universities, utilizing open data, with code and data that can be fully shared openly.

    The dataset contains record-level information on peer-reviewed articles from Dutch universities for publication year 2023, as provided by the institutions from their CRIS systems. The data has been supplemented with bibliographic information from Crossref, DOAJ, the ISSN registry, and Unpaywall.
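    As an illustration of this kind of enrichment (a minimal sketch, not the project's actual workflow; the e-mail address and DOI are placeholders), the open access status of a single DOI can be looked up from the public Unpaywall REST API:

      # Sketch: look up the OA status of one DOI via the Unpaywall API.
      import requests

      def unpaywall_oa_status(doi, email="you@example.org"):
          resp = requests.get(f"https://api.unpaywall.org/v2/{doi}", params={"email": email})
          resp.raise_for_status()
          data = resp.json()
          best = data.get("best_oa_location") or {}
          return {"is_oa": data.get("is_oa"),
                  "oa_status": data.get("oa_status"),   # gold / hybrid / green / bronze / closed
                  "host_type": best.get("host_type")}   # 'publisher' or 'repository'

      # Example call (hypothetical DOI):
      # print(unpaywall_oa_status("10.1234/example-doi"))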

    In total, 50,115 unique DOIs were included in the analysis (including publications from the University of Humanistic Studies). The OA status of 49,815 publications was determined.

    In addition to information on Open Access types, the dataset also includes details on:

    • the extent to which articles have an open license, and if so, which license;
    • when articles became available in green open access relative to the publication date (e.g., due to embargo periods);
    • the repositories where green open access versions are available.

    Note: The results of this central monitoring show differences compared to the existing decentralized monitoring. In particular, the share of OA via repositories is lower. Some of the differences can be explained by the set of articles used and the way in which OA status was determined. A detailed discussion of the differences between the existing decentralized monitoring and this central monitoring can be found in section 4.1 of the project report.

  5. Data composition and characteristics of included studies.

    • figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Matthew G. Crowson; Dana Moukheiber; Aldo Robles Arévalo; Barbara D. Lam; Sreekar Mantena; Aakanksha Rana; Deborah Goss; David W. Bates; Leo Anthony Celi (2023). Data composition and characteristics of included studies. [Dataset]. http://doi.org/10.1371/journal.pdig.0000033.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Matthew G. Crowson; Dana Moukheiber; Aldo Robles Arévalo; Barbara D. Lam; Sreekar Mantena; Aakanksha Rana; Deborah Goss; David W. Bates; Leo Anthony Celi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data composition and characteristics of included studies.

  6. Data from: A crop type dataset for consistent land cover classification in Central Asia

    • access.earthdata.nasa.gov
    Updated Oct 10, 2023
    Cite
    (2023). A crop type dataset for consistent land cover classification in Central Asia [Dataset]. http://doi.org/10.6084/m9.figshare.12047478.v2
    Explore at:
    Dataset updated
    Oct 10, 2023
    Time period covered
    Jan 1, 2020 - Jan 1, 2023
    Area covered
    Description

    Land cover is a key variable in the context of climate change. In particular, crop type information is essential to understand the spatial distribution of water usage and anticipate the risk of water scarcity and the consequent danger of food insecurity. This applies to arid regions such as the Aral Sea Basin (ASB), Central Asia, where agriculture relies heavily on irrigation. Here, remote sensing is valuable to map crop types, but its quality depends on consistent ground-truth data. Yet, in the ASB, such data is missing. Addressing this issue, we collected thousands of polygons on crop types, 97.7% of them in Uzbekistan and the remainder in Tajikistan. We collected 8,196 samples between 2015 and 2018, 213 in 2011, and 26 in 2008. Our data compiles samples for 40 crop types and is dominated by “cotton” (40%) and “wheat” (25%). These data were meticulously validated using expert knowledge and remote sensing data and relied on transferable, open-source workflows that will assure the consistency of future sampling campaigns.

  7. Ground Deformation Dataset for Geohazard Monitoring in the Central Ionian Islands (Greece) within the HOMEROS Project

    • resodate.org
    Updated Oct 7, 2025
    Cite
    Michael Foumelis; Elena Papageorgiou; Pavlos Bonatis (2025). Ground Deformation Dataset for Geohazard Monitoring in the Central Ionian Islands (Greece) within the HOMEROS Project [Dataset]. https://resodate.org/resources/aHR0cHM6Ly96ZW5vZG8ub3JnL3JlY29yZHMvMTcyNTM2NDM=
    Explore at:
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Zenodo
    Authors
    Michael Foumelis; Elena Papageorgiou; Pavlos Bonatis
    Area covered
    Greece, Ionian Islands
    Description
    1. Introduction

    This dataset presents ground deformation measurements from the Central Ionian Islands (Greece) derived through Persistent Scatterer Interferometry (PSI) analysis within the framework of the HOMEROS project. The dataset is designed to support the detection, monitoring, and analysis of ground displacement over time, contributing to the assessment and mitigation of natural hazards such as earthquakes, landslides, floods, and subsidence. The observations were obtained from Sentinel-1 Synthetic Aperture Radar (SAR) acquisitions provided by the European Space Agency's Copernicus Programme. Data processing was carried out using the SNAPPING PSI service (Foumelis et al., 2022), ensuring reproducible and traceable workflows compliant with Open Science principles. The dataset includes both ascending and descending line-of-sight (LOS) displacement products, as well as 3D decomposed components (East, Up) obtained through two independent decomposition methods. The dataset will be updated every 6 months to ensure continued accessibility, interoperability, and reusability of the most recent ground-deformation observations from Sentinel-1 acquisitions. Each release will be published as a new Zenodo version under the same DOI, ensuring transparent versioning, long-term traceability, and consistent citation across updates.
    2. Files

    File naming conventions: All files in this dataset follow a structured naming convention designed to convey essential metadata directly in the filename. For example:

    homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv

    refers to: [project name]_[study area]_[processing algorithm]_[methodology]_[resolution]_[sensor or satellite mission]_[track/orbit descriptor]_[first acquisition]_[last acquisition]_ref[reference acquisition date].[file extension]

    Notes:

    • All dates use the format YYYYMM for the start and end acquisitions, and YYYYMMDD for the reference date.
    • Underscores are used consistently to separate fields.
    • The naming convention is consistent across all data formats.
    • Decomposed 3D components include the descriptor of the resampling method used (grid: grid-based; nnv: nearest neighbor vector approach (Foumelis, 2016)) and the grid resolution in meters.

    • Decomposed 3D components include the component name (ew, up) before the file extension.
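    For illustration, a line-of-sight (LOS) product filename can be split into its fields with a few lines of Python. This is a sketch based on the convention stated above; it does not handle the coherence rasters (suffix _coh) or the 3D-component filenames, which follow the variants described in the notes.

      # Sketch: parse a LOS product filename according to the stated naming convention.
      FIELDS = ["project", "study_area", "algorithm", "methodology", "resolution",
                "sensor", "track", "first_acquisition", "last_acquisition"]

      def parse_los_filename(filename):
          stem, ext = filename.rsplit(".", 1)
          parts = stem.split("_")
          reference = parts[-1][len("ref"):]          # e.g. "20220108"
          meta = dict(zip(FIELDS, parts[:-1]))
          meta.update(reference_acquisition=reference, extension=ext)
          return meta

      print(parse_los_filename(
          "homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv"))
      # {'project': 'homeros', 'study_area': 'ionian', 'algorithm': 'snapping',
      #  'methodology': 'psi', 'resolution': 'med', 'sensor': 's1', 'track': 'a175',
      #  'first_acquisition': '202201', 'last_acquisition': '202412',
      #  'reference_acquisition': '20220108', 'extension': 'csv'}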

    The full dataset contains the following files:

    • metadata_dictionary.csv: Descriptions of metadata fields.
    • Ascending and descending line-of-sight (LOS) products: Each geometry folder contains CSV files with displacement time series, GeoTIFF rasters representing mean velocity and coherence, shapefile layers for geospatial visualization, and the corresponding metadata JSON files.
    • 3D decomposed components (E-W and Up): Outputs from the grid-based and nearest-neighbor vector decomposition methods, provided at 50 m and 100 m spatial resolutions. Each subfolder contains displacement rasters and CSV tables for the respective 3D components, together with detailed metadata.

    Total size: ~240 MB (compressed) / ~933 MB (uncompressed)

    3. Dataset structure

    Central_Ionian_dataset/
    ├── README.md
    ├── LICENSE.txt
    ├── ascending/
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.csv
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.tiff
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108_coh.tiff
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.shp
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.dbf
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.prj
    │   ├── homeros_ionian_snapping_psi_med_s1_a175_202201_202412_ref20220108.shx
    │   └── metadata_ascending.json
    ├── descending/
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.csv
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.tiff
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102_coh.tiff
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.shp
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.dbf
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.prj
    │   ├── homeros_ionian_snapping_psi_med_s1_d080_202201_202412_ref20220102.shx
    │   └── metadata_descending.json
    └── 3d_components/
        ├── GRID/
        │   ├── 50/
        │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_ew.csv
        │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_up.csv
        │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_ew.tiff
        │   │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid50_up.tiff
        │   │   └── metadata_grid_50.json
        │   └── 100/
        │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_ew.csv
        │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_up.csv
        │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_ew.tiff
        │       ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_grid100_up.tiff
        │       └── metadata_grid_100.json
        └── NNV/
            ├── 50/
            │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_ew.csv
            │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_up.csv
            │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_ew.tiff
            │   ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv50_up.tiff
            │   └── metadata_nnv_50.json
            └── 100/
                ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_ew.csv
                ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_up.csv
                ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_ew.tiff
                ├── homeros_ionian_snapping_psi_med_202201_202412_3d_decomp_nnv100_up.tiff
                └── metadata_nnv_100.json

    4. How to cite

    Foumelis, M., Papageorgiou, E., Bonatis, P. (2025). Ground Deformation Dataset for Geohazard Monitoring in the Central Ionian Islands (Greece) within the HOMEROS Project [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17253643

    5. References

    Foumelis M. (2016). Vector-based approach for combining ascending and descending persistent scatterers interferometric point measurements. Geocarto International, 33(1), 38–52. https://doi.org/10.1080/10106049.2016.1222636

    Foumelis M, Delgado Blasco JM, Brito F, Pacini F, Papageorgiou E, Pishehvar P, Bally P. (2022). SNAPPING Services on the Geohazards Exploitation Platform for Copernicus Sentinel-1 Surface Motion Mapping. Remote Sensing, 14(23), 6075. https://doi.org/10.3390/rs14236075

  8. HDX HAPI Data for Central African Republic

    • data.humdata.org
    csv
    Updated Feb 2, 2026
    Cite
    HDX Humanitarian API Data (2026). HDX HAPI Data for Central African Republic [Dataset]. https://data.humdata.org/dataset/hdx-hapi-caf
    Explore at:
    Available download formats: csv(1230086), csv(31901), csv(1959), csv(306), csv(52322), csv(2074269), csv(1902160), csv(180528), csv(28696), csv(6118), csv(2884218), csv(8301669), csv(1429621), csv(121475)
    Dataset updated
    Feb 2, 2026
    Dataset provided by
    HDX Humanitarian API Data
    Area covered
    Central African Republic
    Description

    This dataset contains data obtained from the HDX Humanitarian API (HDX HAPI), which provides standardized humanitarian indicators designed for seamless interoperability from multiple sources. The data facilitates automated workflows and visualizations to support humanitarian decision making. For more information, please see the HDX HAPI landing page and documentation.

  9. Leveraging Zebrafish Embryo Phenotypic Observations to Advance Data-Driven Analyses in Toxicology

    • acs.figshare.com
    txt
    Updated Feb 27, 2025
    Cite
    Paul Michaelis; Nils Klüver; Silke Aulhorn; Hannes Bohring; Jan Bumberger; Kristina Haase; Tobias Kuhnert; Eberhard Küster; Janet Krüger; Till Luckenbach; Riccardo Massei; Lukas Nerlich; Sven Petruschke; Thomas Schnicke; Anton Schnurpel; Stefan Scholz; Nicole Schweiger; Daniel Sielaff; Wibke Busch (2025). Leveraging Zebrafish Embryo Phenotypic Observations to Advance Data-Driven Analyses in Toxicology [Dataset]. http://doi.org/10.1021/acs.est.4c11757.s002
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 27, 2025
    Dataset provided by
    ACS Publications
    Authors
    Paul Michaelis; Nils Klüver; Silke Aulhorn; Hannes Bohring; Jan Bumberger; Kristina Haase; Tobias Kuhnert; Eberhard Küster; Janet Krüger; Till Luckenbach; Riccardo Massei; Lukas Nerlich; Sven Petruschke; Thomas Schnicke; Anton Schnurpel; Stefan Scholz; Nicole Schweiger; Daniel Sielaff; Wibke Busch
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Zebrafish have emerged as a central model organism in toxicological research. Zebrafish embryos are exempt from certain animal testing regulations, which facilitates their use in toxicological testing. Next to the zebrafish embryo acute toxicity test (ZFET) according to the OECD TG 236, fish embryos are used in mechanistic investigations, chemical screenings, ecotoxicology, and drug development. However, inconsistencies in the applied test protocols and the monitored endpoints in addition to a lack of standardized data formats impede comprehensive meta-analyses and cross-study comparisons. To address these challenges, we developed the Integrated Effect Database for Toxicological Observations (INTOB), a comprehensive data management tool that standardizes the collection of metadata and phenotypic observations using a controlled vocabulary. By incorporating data from more than 600 experiments into the database and subsequent comprehensive data analyses, we demonstrate its utility in improving the comparability and interoperability of toxicity data. Our results show that the ZFET can detect toxicity spanning 7 orders of magnitude at the scale of effect concentrations. We also highlight the potential of read-across analyses based on morphological fingerprints and their connection to chemical modes of action, provide information on control variability of the ZFET, and highlight the importance of time for mechanistic understanding in chemical exposure-effect assessments. We provide the full Findable, Accessible, Interoperable, and Reusable (FAIR) data set as well as the analysis workflow and demonstrate how professional data management, as enabled with INTOB, marks a significant advancement by offering a comprehensive framework for the systematic use of zebrafish embryo toxicity data, thus paving the way for more reliable, data-driven chemical risk assessment.

  10. FAIR Data Research with NOMAD and NOMAD Oasis

    • meta4cat.fokus.fraunhofer.de
    pdf, unknown
    Updated Oct 2, 2025
    Cite
    Zenodo (2025). FAIR Data Research with NOMAD and NOMAD Oasis [Dataset]. https://meta4cat.fokus.fraunhofer.de/datasets/oai-zenodo-org-10868118?locale=en
    Explore at:
    Available download formats: pdf(1000492), unknown
    Dataset updated
    Oct 2, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The NOMAD poster provides a comprehensive overview of two powerful tools, NOMAD and NOMAD Oasis, designed to revolutionize the landscape of scientific data research. Developed by the FAIRmat consortium, NOMAD and NOMAD Oasis promote the principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data, enhancing transparency and collaboration in materials science. NOMAD is a centralized materials science data repository addressing the needs of researchers from various domains and fields. It seamlessly manages diverse data formats, including raw files, facilitating efficient data analysis and exploration. The poster highlights the key features and use cases of NOMAD and NOMAD Oasis while thoroughly exploring their capabilities. The project outlines the data workflow within the NOMAD ecosystem, illustrating how uploaded data is transformed into processed and modeled data. Subsequently, it offers three options for utilizing the parsed data: publish, analyze, and explore. Researchers can leverage NOMAD's data publishing capabilities to share their findings with the scientific community, obtain a DOI, and support open collaboration. By promoting FAIR data research, the NOMAD poster underscores the importance of data integrity, accessibility, and interoperability. It is a valuable resource for scientists, data centers, and scientific organizations, showcasing the potential of NOMAD and NOMAD Oasis in everyday workflows.

  11. Used github commits when running Nextflow workflows.

    • plos.figshare.com
    xlsx
    Updated May 23, 2024
    Cite
    Ingo A. Müller; Filip Thörn; Samyuktha Rajan; Per G. P. Ericson; John P. Dumbacher; Gibson Maiah; Mozes P. K. Blom; Knud A. Jønsson; Martin Irestedt (2024). Used github commits when running Nextflow workflows. [Dataset]. http://doi.org/10.1371/journal.pone.0293715.s016
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 23, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ingo A. Müller; Filip Thörn; Samyuktha Rajan; Per G. P. Ericson; John P. Dumbacher; Gibson Maiah; Mozes P. K. Blom; Knud A. Jønsson; Martin Irestedt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Used github commits when running Nextflow workflows.
