CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.
Data Collection and Analysis
We reuse the Jupyter notebook reproducibility code from the study by Pimentel et al. (2019) and adapt code from ReproduceMeGit. We provide code for collecting the publication metadata from PubMed Central using the NCBI Entrez utilities via Biopython.
Our approach searches PMC with the esearch function for Jupyter notebooks using the query "(ipynb OR jupyter OR ipython) AND github". We retrieve the results in XML format, capturing essential details about journals and articles. By scanning the entire article (abstract, body, data availability statement, and supplementary materials), we extract GitHub links. We also mine the repositories for dependency declarations found in files such as requirements.txt, setup.py, and Pipfile. Using the GitHub API, we enrich the data with repository creation dates, update and push histories, and programming languages.
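The study itself uses Biopython's Entrez module for this step. As a minimal standard-library sketch of the underlying request, one might construct the E-utilities esearch URL directly (the function name `build_esearch_url` is illustrative, not from the study's code):

```python
from urllib.parse import urlencode

# Base endpoint of the NCBI E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(query, db="pmc", retmax=100):
    """Build an esearch request URL for the given query against a database."""
    params = {"db": db, "term": query, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)

url = build_esearch_url("(ipynb OR jupyter OR ipython) AND github")
```

Fetching this URL returns the matching PMC identifiers in XML, which the pipeline then parses for journal and article details.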
All the extracted information is stored in a SQLite database. After collecting the data and creating the database tables, we ran a pipeline to fetch the Jupyter notebooks contained in the GitHub repositories, based on the code from Pimentel et al. (2019).
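The entities described above (journals, publications, repositories, notebooks) map naturally onto linked SQLite tables. The schema below is an illustrative sketch only; the actual table layout in the study's database may differ:

```python
import sqlite3

# Illustrative schema: one row per journal, publication, repository, notebook,
# linked by foreign keys. Not the study's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE journals     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publications (id INTEGER PRIMARY KEY, journal_id INTEGER, pmc_id TEXT,
                           title TEXT,
                           FOREIGN KEY (journal_id) REFERENCES journals (id));
CREATE TABLE repositories (id INTEGER PRIMARY KEY, publication_id INTEGER, url TEXT,
                           FOREIGN KEY (publication_id) REFERENCES publications (id));
CREATE TABLE notebooks    (id INTEGER PRIMARY KEY, repository_id INTEGER, path TEXT,
                           FOREIGN KEY (repository_id) REFERENCES repositories (id));
""")
conn.execute("INSERT INTO journals (name) VALUES (?)", ("GigaScience",))
row = conn.execute("SELECT name FROM journals").fetchone()
```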
Our reproducibility pipeline was started on 27 March 2023.
Repository Structure
Our repository is organized into two main folders:
Accessing Data and Resources:
System Requirements:
Running the pipeline:
Running the analysis:
References:
According to our latest research, the global dataset versioning platform market size reached USD 1.32 billion in 2024, reflecting robust adoption across industries as organizations seek to manage and track complex data workflows. The market is expected to exhibit a strong compound annual growth rate (CAGR) of 18.6% over the forecast period, reaching a projected value of USD 6.13 billion by 2033. This dynamic growth is primarily fueled by the increasing reliance on data-driven decision-making, the proliferation of machine learning and artificial intelligence initiatives, and the need for enhanced data governance and compliance in a rapidly evolving digital landscape.
One of the primary growth factors driving the dataset versioning platform market is the exponential rise in data volumes generated by enterprises globally. As organizations harness big data for advanced analytics, machine learning, and AI applications, the complexity of data management has surged. Dataset versioning platforms provide the necessary infrastructure to track, audit, and reproduce data changes across the lifecycle of analytics and model development. This capability is critical for ensuring data integrity, facilitating collaboration among data science teams, and maintaining compliance with regulatory standards. Moreover, the increasing adoption of open-source data science tools and the integration of versioning solutions with popular machine learning frameworks are further accelerating market expansion.
Another significant driver is the growing need for collaboration and reproducibility in the research and development sector. As multidisciplinary teams work on large-scale projects, the ability to seamlessly share, update, and revert datasets becomes essential. Dataset versioning platforms offer granular control over data changes, enabling researchers and analysts to experiment with different data iterations without risking data loss or inconsistencies. This not only streamlines the workflow but also supports the transparency and accountability required in scientific research, especially in fields like healthcare, pharmaceuticals, and academia where data provenance is paramount. The rise of remote and distributed workforces has also amplified demand for cloud-based versioning platforms that support real-time collaboration and centralized data management.
The increasing emphasis on data governance, security, and compliance is another critical factor propelling the market. With stringent regulations such as GDPR, HIPAA, and CCPA, organizations must maintain meticulous records of data usage, access, and modifications. Dataset versioning platforms provide comprehensive audit trails, access controls, and rollback capabilities, empowering enterprises to meet regulatory requirements efficiently. Additionally, the integration of automated data lineage tracking and policy enforcement features has made these platforms indispensable for industries like banking, financial services, and insurance (BFSI), where data accuracy and security are non-negotiable. This regulatory landscape is expected to continue shaping the adoption patterns and innovation trajectories within the dataset versioning platform market.
From a regional perspective, North America currently leads the global dataset versioning platform market, accounting for the largest share in 2024 due to its advanced technological infrastructure, strong presence of leading cloud service providers, and early adoption of AI and machine learning. Europe follows closely, driven by the region’s robust regulatory environment and growing investments in digital transformation. The Asia Pacific region is poised for the fastest growth, with a projected CAGR exceeding 21% over the forecast period, as enterprises in countries like China, India, and Japan accelerate their adoption of data-centric technologies. Latin America and the Middle East & Africa are also witnessing steady growth, supported by increasing digitalization and the expansion of cloud services in emerging markets.
The REMS (Resource Entitlement Management System) extension for CKAN brings access rights management capabilities to datasets. By integrating with REMS, this extension allows organizations to manage and control access to sensitive or restricted data through application workflows and approval processes. This enables a more secure and governed environment for data sharing within CKAN.
Key Features:
- REMS Integration: Integrates CKAN with the REMS system for managing access rights to datasets, providing a centralized control point for permissions.
- Application Form and Workflow Design: Utilizes REMS' tools for designing application forms and defining workflows for requesting access to datasets.
- Access Request Management: Enables end-users to apply for access to datasets through the defined REMS application workflows.
- Workflow Processing: Provides administrators and authorized users with the tools to process access requests, manage approvals, and administer granted access rights within the REMS interface.
- Shibboleth Configuration Support: Supports Shibboleth configuration for authentication, potentially enabling single sign-on (SSO) capabilities for accessing REMS-protected datasets.
Technical Integration: The REMS extension integrates with CKAN through configuration settings defined in the .ini file. It also uses the Kata extension as a dependency. Shibboleth configuration details are outlined in the config/shibboleth/README.txt file, giving direction on how to set up single sign-on. The extension essentially connects CKAN datasets to the permissioning framework within a separate REMS instance.
Benefits & Impact: The REMS extension provides enhanced security and control over dataset access within CKAN. This helps organizations comply with data governance policies and regulations by enabling a structured and auditable process for granting permissions. Using a separate REMS system offloads access rights management to a dedicated, purpose-built service.
As per our latest research, the global veterinary master data management market size reached USD 1.24 billion in 2024, reflecting robust demand for digital solutions in animal healthcare. The market is registering a compound annual growth rate (CAGR) of 12.1% and is projected to attain USD 3.48 billion by 2033. This remarkable expansion is fueled by the accelerating adoption of digital records, regulatory mandates for traceability, and the rising complexity of veterinary practices worldwide. The surge in pet ownership, coupled with advancements in veterinary diagnostics and treatments, is driving the need for centralized and accurate data management systems, thus underpinning the market’s strong growth trajectory.
One of the primary growth factors of the veterinary master data management market is the increasing digitization of veterinary healthcare processes. Veterinary practices are increasingly transitioning from manual record-keeping to sophisticated digital platforms that offer real-time access, error reduction, and improved data accuracy. The integration of electronic health records (EHRs) and practice management software has become a standard, enabling seamless sharing of patient information across clinics, laboratories, and pharmacies. With the growing emphasis on evidence-based veterinary medicine, data-driven decision-making is emerging as a crucial aspect, pushing clinics and hospitals to invest in master data management solutions that can harmonize disparate datasets, streamline workflows, and ensure compliance with industry standards.
Another significant driver is the growing regulatory scrutiny and the need for compliance management in the animal health sector. Regulatory bodies across North America, Europe, and Asia Pacific are imposing stringent requirements for the traceability of pharmaceuticals, vaccines, and medical devices used in veterinary care. These regulations necessitate the maintenance of precise and up-to-date data records, compelling veterinary hospitals, research institutes, and pharmacies to adopt robust master data management systems. Furthermore, the increasing threat of zoonotic diseases and the global focus on One Health initiatives are prompting stakeholders to prioritize accurate data capture and reporting, which further accelerates the adoption of advanced data management technologies.
The proliferation of advanced technologies such as artificial intelligence, machine learning, and cloud computing is also transforming the veterinary master data management landscape. Cloud-based solutions are gaining traction due to their scalability, cost-effectiveness, and ability to facilitate remote access to critical data. This is particularly important in the context of multi-site veterinary practices and research collaborations that span geographies. AI-powered analytics are enabling veterinary professionals to derive actionable insights from large datasets, enhancing diagnostic accuracy, treatment outcomes, and operational efficiency. These technological advancements are expanding the functionality and appeal of master data management platforms, making them indispensable tools for modern veterinary institutions.
From a regional perspective, North America continues to dominate the veterinary master data management market, accounting for the largest revenue share in 2024. The region's leadership is underpinned by the presence of a well-developed veterinary infrastructure, high adoption rates of digital technologies, and favorable regulatory frameworks. Europe is also witnessing substantial growth, driven by the increasing focus on animal welfare and the harmonization of veterinary regulations across the European Union. Meanwhile, Asia Pacific is emerging as a high-growth market, fueled by rising pet ownership, expanding veterinary services, and significant investments in digital healthcare infrastructure. Latin America and the Middle East & Africa are gradually catching up, with growing awareness and adoption of data management solutions in animal healthcare settings.
The veterinary master data management market, segmented by component, comprises software and services, each playing a pivotal role in shaping the industry’s evolution. The software segment dominates the market, driven by the increasing need for centralized data repositories and automated workflows within veterinary practices. Modern veterinary master data
According to our latest research, the global AI Dataset Search Platform market size reached USD 1.87 billion in 2024, with a robust year-on-year growth trajectory. The market is projected to expand at a CAGR of 27.6% during the forecast period, reaching an estimated USD 16.17 billion by 2033. This remarkable growth is primarily attributed to the escalating demand for high-quality, diverse, and scalable datasets required to train advanced artificial intelligence and machine learning models across various industries. The proliferation of AI-driven applications and the increasing emphasis on data-centric AI development are key growth factors propelling the adoption of AI dataset search platforms globally.
The surge in AI adoption across sectors such as healthcare, BFSI, retail, automotive, and education is fueling the need for efficient and reliable dataset discovery solutions. Organizations are increasingly recognizing that the success of AI models hinges on the quality and relevance of the training data, leading to a surge in investments in dataset search platforms that offer advanced filtering, metadata tagging, and data governance capabilities. The integration of AI dataset search platforms with cloud infrastructures further streamlines data access, collaboration, and compliance, making them indispensable tools for enterprises aiming to accelerate AI innovation. The growing complexity of AI projects, coupled with the exponential growth in data volumes, is compelling organizations to seek platforms that can automate and optimize the process of dataset discovery and curation.
Another significant growth factor is the rapid evolution of AI regulations and data privacy frameworks worldwide. As data governance becomes a top priority, AI dataset search platforms are evolving to include robust features for data lineage tracking, access control, and compliance with regulations such as GDPR, HIPAA, and CCPA. The ability to ensure ethical sourcing and transparent usage of datasets is increasingly valued by enterprises and academic institutions alike. This regulatory landscape is driving the adoption of platforms that not only facilitate efficient dataset search but also enable organizations to demonstrate accountability and compliance in their AI initiatives.
The expanding ecosystem of AI developers, data scientists, and machine learning engineers is also contributing to the market's growth. The democratization of AI development, supported by open-source frameworks and cloud-based collaboration tools, has increased the demand for platforms that can aggregate, index, and provide easy access to diverse datasets. AI dataset search platforms are becoming central to fostering innovation, reducing development cycles, and enabling cross-domain research. As organizations strive to stay ahead in the competitive AI landscape, the ability to quickly identify and utilize optimal datasets is emerging as a critical differentiator.
From a regional perspective, North America currently dominates the AI dataset search platform market, accounting for over 38% of global revenue in 2024, driven by the strong presence of leading AI technology companies, active research communities, and significant investments in digital transformation. Europe and Asia Pacific are also witnessing rapid adoption, with Asia Pacific expected to exhibit the highest CAGR of 29.3% during the forecast period, fueled by government initiatives, burgeoning AI startups, and increasing digitalization across industries. Latin America and the Middle East & Africa are gradually embracing AI dataset search platforms, supported by growing awareness and investments in AI research and infrastructure.
The AI Dataset Search Platform market is segmented by component into Software and Services. Software solutions constitute the backbone of this market, providing the core functionalities required for dataset discovery, indexing, metadata management, and integration with existing AI workflows. The software segment is witnessing robust growth as organizations seek advanced platforms capable of handling large-scale, multi-source datasets with sophisticated search capabilities powered by natural language processing and machine learning algorithms. These platforms are increasingly incorporating features such as semantic search, automated data labeling, and customizable data pipelines, enabling users to eff
Land cover is a key variable in the context of climate change. In particular, crop type information is essential to understand the spatial distribution of water usage and to anticipate the risk of water scarcity and the consequent danger of food insecurity. This applies to arid regions such as the Aral Sea Basin (ASB), Central Asia, where agriculture relies heavily on irrigation. Here, remote sensing is valuable to map crop types, but its quality depends on consistent ground-truth data. Yet, in the ASB, such data is missing. Addressing this issue, we collected thousands of polygons on crop types, 97.7% of them in Uzbekistan and the remainder in Tajikistan. We collected 8,196 samples between 2015 and 2018, 213 in 2011, and 26 in 2008. Our data compiles samples for 40 crop types and is dominated by “cotton” (40%) and “wheat” (25%). These data were validated using expert knowledge and remote sensing data and rely on transferable, open-source workflows that will assure the consistency of future sampling campaigns.
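Class shares such as the 40% cotton and 25% wheat figures can be tallied directly from the per-polygon crop labels. A minimal sketch with toy labels (the sample list below is invented for illustration; the real dataset has 8,435 polygons across 40 crop types):

```python
from collections import Counter

# Toy per-polygon crop labels, standing in for the real ground-truth records.
samples = ["cotton"] * 4 + ["wheat"] * 2 + ["rice", "maize", "alfalfa", "orchard"]

counts = Counter(samples)
total = sum(counts.values())
# Percentage share of each crop type, rounded to whole percent.
shares = {crop: round(100 * n / total) for crop, n in counts.items()}
```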
According to the latest research, the global Workflow Orchestration for Bioinformatics market size in 2024 stands at USD 2.31 billion, reflecting robust industry adoption and technological advancements. The market is forecasted to reach USD 7.91 billion by 2033, expanding at a compelling CAGR of 14.5% from 2025 to 2033. This growth is primarily propelled by increasing demand for automation in bioinformatics research and the rising volume of biological data generated by next-generation sequencing technologies.
The surge in market growth can be attributed to the escalating complexity and volume of bioinformatics data, which necessitates advanced workflow orchestration tools for efficient management and analysis. As genomics, proteomics, and other omics sciences generate vast datasets, the need for seamless integration, automation, and reproducibility in data processing pipelines becomes paramount. Workflow orchestration platforms provide a centralized environment to automate multi-step computational tasks, minimize human error, and accelerate research timelines. Additionally, the growing adoption of cloud-based solutions and hybrid deployment models further enhances the scalability and accessibility of these orchestration tools, allowing research organizations and enterprises to address computational bottlenecks and focus on core scientific inquiries.
Another significant driver for the Workflow Orchestration for Bioinformatics market is the increasing collaboration between pharmaceutical and biotechnology companies with academic and research institutes. These collaborations are fostering innovation and facilitating large-scale, multi-institutional research projects, which require sophisticated workflow orchestration solutions to manage distributed data and computational resources. The rise of precision medicine, drug discovery, and personalized therapies is also fueling demand for advanced bioinformatics workflows that can handle complex analyses, such as variant calling, transcriptome assembly, and protein structure prediction. As a result, vendors are continuously enhancing their software and service offerings to support interoperability, scalability, and regulatory compliance, further driving market expansion.
Moreover, the integration of artificial intelligence (AI) and machine learning (ML) algorithms within workflow orchestration platforms is transforming bioinformatics research. AI-powered orchestration tools enable automated decision-making, intelligent error handling, and adaptive optimization of computational pipelines. This not only improves the efficiency and accuracy of bioinformatics analyses but also supports the discovery of novel biological insights. The increasing availability of open-source workflow management systems and the proliferation of standardized data formats are lowering barriers to entry, enabling small and medium-sized enterprises (SMEs) and research groups to adopt these technologies and contribute to market growth.
Lab Automation in Genomics is revolutionizing the way researchers approach complex genomic analyses, offering unprecedented efficiency and accuracy. By automating routine laboratory tasks, such as sample preparation and data collection, lab automation technologies are reducing human error and increasing throughput in genomic studies. This advancement is particularly beneficial in handling the massive volumes of data generated by next-generation sequencing (NGS) technologies, allowing scientists to focus on data interpretation and discovery. Furthermore, lab automation is facilitating the integration of genomics with other omics disciplines, enhancing the ability to conduct comprehensive multi-omics studies. As the demand for high-throughput genomic analyses grows, the role of lab automation in genomics will continue to expand, driving innovation and improving research outcomes.
Regionally, North America continues to dominate the Workflow Orchestration for Bioinformatics market, accounting for the largest share due to its advanced healthcare infrastructure, significant investments in genomics research, and the presence of leading bioinformatics companies. Europe follows closely, driven by supportive government initiatives and a strong focus on life sciences innovation. The Asia Pacific region is
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset is associated with the report "Project Vernieuwing Open Access Monitoring - Report Phase 1 - Peer-Reviewed Articles" [in Dutch] (https://doi.org/10.5281/zenodo.15061685). The project's objectives were to establish a transparent and reproducible workflow for centralized open access monitoring of peer-reviewed articles of the Dutch universities, utilizing open data, with code and data that can be fully shared openly.
The dataset contains record-level information on peer-reviewed articles from Dutch universities for publication year 2023, as provided by the institutions from their CRIS systems. The data has been supplemented with bibliographic information from Crossref, DOAJ, the ISSN registry, and Unpaywall.
In total, 50,115 unique DOIs were included in the analysis (including publications from the University of Humanistic Studies). The OA status of 49,815 publications was determined.
In addition to information on Open Access types, the dataset also includes details on:
Note: The results of this central monitoring show differences compared to the existing decentralized monitoring. In particular, the share of OA via repositories is lower. Some of the differences can be explained by the set of articles used and the way in which OA status was determined. A detailed discussion of the differences between the existing decentralized monitoring and this central monitoring can be found in section 4.1 of the project report.
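OA status in this workflow is enriched per DOI from sources such as Unpaywall, whose public v2 REST API is keyed by DOI and requires an email parameter. A sketch of how such a lookup URL can be built (the helper name and email address are illustrative placeholders):

```python
from urllib.parse import quote, urlencode

# Unpaywall REST API: GET https://api.unpaywall.org/v2/<DOI>?email=<address>
# The email parameter is required by the service; the address here is a placeholder.
def unpaywall_url(doi, email="oa-monitor@example.org"):
    """Build the Unpaywall lookup URL for a DOI."""
    return "https://api.unpaywall.org/v2/" + quote(doi) + "?" + urlencode({"email": email})

url = unpaywall_url("10.5281/zenodo.15061685")
```

The JSON response includes fields such as the best OA location, from which the OA type per article can be derived.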
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Data composition and characteristics of included studies.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Workflow, topologies and computing plan classification adapted from Rieke et al. [1].
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The NOMAD poster provides a comprehensive overview of two powerful tools, NOMAD and NOMAD Oasis, designed to revolutionize the landscape of scientific data research. Developed by the FAIRmat consortium, NOMAD and NOMAD Oasis promote the principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data, enhancing transparency and collaboration in materials science. NOMAD is a centralized materials science data repository addressing the needs of researchers from various domains and fields. It seamlessly manages diverse data formats, including raw files, facilitating efficient data analysis and exploration. The poster highlights the key features and use cases of NOMAD and NOMAD Oasis while thoroughly exploring their capabilities. The project outlines the data workflow within the NOMAD ecosystem, illustrating how uploaded data is transformed into processed and modeled data. Subsequently, it offers three options for utilizing the parsed data: publish, analyze, and explore. Researchers can leverage NOMAD's data publishing capabilities to share their findings with the scientific community, obtain a DOI, and support open collaboration. By promoting FAIR data research, the NOMAD poster underscores the importance of data integrity, accessibility, and interoperability. It is a valuable resource for scientists, data centers, and scientific organizations, showcasing the potential of NOMAD and NOMAD Oasis in everyday workflows.
As per our latest research, the global Scientific Data Management System (SDMS) market size has reached USD 4.2 billion in 2024, demonstrating robust growth with a Compound Annual Growth Rate (CAGR) of 11.8% anticipated throughout the forecast period. The market is projected to attain a value of USD 11.6 billion by 2033, driven by the increasing complexity of scientific research data and the growing demand for efficient data management solutions. This expansion is underpinned by the rapid digital transformation in the life sciences sector, the proliferation of data-intensive research, and the critical need for data integrity and compliance in regulated environments.
One of the primary growth factors for the Scientific Data Management System market is the exponential surge in data generation across scientific research domains such as genomics, proteomics, drug discovery, and clinical trials. Research organizations and pharmaceutical companies are generating petabytes of data daily, necessitating advanced data management platforms to ensure seamless data capture, storage, retrieval, and analysis. The integration of SDMS with laboratory information management systems (LIMS) and electronic lab notebooks (ELN) further enhances workflow efficiency and data traceability, bolstering adoption rates among both large enterprises and smaller research institutions. Moreover, the growing emphasis on data standardization and interoperability is compelling organizations to invest in robust SDMS platforms that can handle diverse data formats and facilitate collaborative research across geographies.
Another significant driver propelling the SDMS market is the stringent regulatory landscape governing scientific research, particularly in the pharmaceutical, biotechnology, and healthcare sectors. Regulatory bodies such as the FDA, EMA, and other international agencies mandate rigorous data documentation, audit trails, and data integrity protocols to ensure the reliability of research outcomes and patient safety. SDMS platforms are designed to address these compliance requirements, offering features such as automated audit trails, secure data access controls, and comprehensive reporting capabilities. The increasing prevalence of multi-site clinical trials and the globalization of research initiatives are further amplifying the need for centralized, standardized data management systems that can support regulatory submissions and streamline data governance.
The ongoing advancements in artificial intelligence, machine learning, and cloud computing are also playing a pivotal role in shaping the future of the Scientific Data Management System market. Modern SDMS solutions are leveraging AI-driven analytics to extract actionable insights from complex datasets, enabling researchers to accelerate hypothesis generation, identify novel biomarkers, and optimize experimental workflows. The adoption of cloud-based SDMS platforms is facilitating remote collaboration, real-time data sharing, and scalable storage solutions, making it easier for organizations to manage growing data volumes without significant infrastructure investments. These technological innovations are expected to drive further market expansion and foster the development of next-generation SDMS platforms tailored to the evolving needs of the scientific research community.
From a regional perspective, North America currently dominates the SDMS market, accounting for the largest share due to its well-established life sciences industry, significant R&D investments, and early adoption of advanced data management technologies. Europe follows closely, driven by robust government funding for scientific research and a strong presence of pharmaceutical and biotechnology companies. The Asia Pacific region is emerging as a high-growth market, fueled by increasing research activities, expanding healthcare infrastructure, and rising awareness about the benefits of SDMS solutions. Latin America and the Middle East & Africa are also witnessing steady growth, albeit at a slower pace, as organizations in these regions gradually embrace digital transformation and data-centric research practices.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset collection provides a crop type dataset for consistent land cover classification in Central Asia. A total of 8,196 samples were collected between 2015 and 2018, 213 in 2011, and 26 in 2008. The data compiles samples for 40 crop types and is dominated by cotton (40%) and wheat (25%). The data went through a validation process using expert knowledge and remote sensing data and relied on transferable, open-source workflows that will ensure the consistency of future sampling campaigns. This crop type information is essential for understanding the spatial distribution of water usage and anticipating the risk of water scarcity and the consequent danger of food insecurity, especially in arid regions such as the Aral Sea Basin (ASB), Central Asia, where agriculture relies heavily on irrigation. More information can be found on the documentation page on Source Cooperative: https://beta.source.coop/repositories/idiv/asia-crop-type/description
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Researchers in biomedicine and public health often spend weeks locating, cleansing, and integrating data from disparate sources before analysis can begin. This redundancy slows discovery and leads to inconsistent pipelines.
Methods
We created BioBricks.ai, an open, centralized repository that packages public biological and chemical datasets as modular "bricks." Each brick is a Data Version Control (DVC) Git repository containing an extract-transform-load (ETL) pipeline. A package-manager-like interface handles installation, dependency resolution, and updates, while data are delivered through a unified backend (https://biobricks.ai).
Results
The current release provides >90 curated datasets spanning genomics, proteomics, cheminformatics, and epidemiology. Bricks can be combined programmatically to build composite resources; benchmark use cases show that assembling multi-dataset analytic cohorts is reduced from days to minutes compared with bespoke scripts.
Discussion
BioBricks.ai accelerates data access, promotes reproducible workflows, and lowers the barrier for integrating heterogeneous public datasets. By treating data as version-controlled software, the platform encourages community contributions and reduces redundant engineering effort. Continued expansion of brick coverage and automated provenance tracking will further enhance FAIR (Findable, Accessible, Interoperable, Reusable) data practices across the life-science community.
According to our latest research, the global Data Versioning as a Service market size reached USD 1.14 billion in 2024, driven by the increasing demand for robust data management solutions across diverse industries. The market is set to expand at a CAGR of 21.8% from 2025 to 2033, with the forecasted market size expected to reach USD 8.85 billion by 2033. This remarkable growth is primarily attributable to the surging adoption of artificial intelligence, machine learning, and big data analytics, which require sophisticated data versioning frameworks to ensure data integrity, reproducibility, and compliance in enterprise environments.
The rapid proliferation of digital transformation initiatives is one of the most significant growth drivers for the Data Versioning as a Service market. Organizations across all sectors are increasingly generating and utilizing massive volumes of data, making it essential to maintain accurate records of data changes over time. Data versioning solutions enable enterprises to track, manage, and revert to previous data states, which is critical for auditing, troubleshooting, and regulatory compliance. The growing complexity of data pipelines, particularly in sectors such as BFSI, healthcare, and manufacturing, further underscores the necessity for scalable versioning solutions that can seamlessly integrate with existing data infrastructures. Furthermore, the emergence of data-centric business models and the continuous evolution of data governance policies are compelling organizations to invest in advanced data versioning services, fueling market expansion.
Another major growth factor is the increasing integration of machine learning and artificial intelligence into business processes. These technologies depend heavily on the availability of clean, versioned datasets for model training and validation. Data Versioning as a Service platforms facilitate the management of multiple data iterations, ensuring that data scientists and engineers can reproduce experiments and maintain model accuracy. As enterprises accelerate their AI adoption, the demand for reliable and scalable data versioning solutions is expected to surge. Additionally, the rise of DevOps practices, which emphasize collaboration and automation across development and operations teams, is driving the need for version-controlled data environments that support continuous integration and delivery workflows. This trend is particularly pronounced in IT, telecommunications, and technology-driven sectors, where agility and innovation are paramount.
Cloud adoption is another pivotal factor propelling the growth of the Data Versioning as a Service market. As businesses migrate their data infrastructures to cloud environments, they seek flexible and cost-effective solutions to manage data versions across distributed systems. Cloud-based data versioning services offer seamless scalability, enhanced security, and simplified management, making them attractive to enterprises of all sizes. The shift towards hybrid and multi-cloud strategies further amplifies the need for centralized data versioning platforms that can operate across diverse environments and support real-time collaboration. Moreover, the increasing emphasis on data privacy and regulatory compliance, particularly in regions with stringent data protection laws, is accelerating the adoption of managed data versioning services that provide comprehensive audit trails and automated compliance reporting.
From a regional perspective, North America currently dominates the Data Versioning as a Service market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The strong presence of leading technology providers, early adoption of cloud technologies, and a mature regulatory landscape contribute to North America's leadership position. Meanwhile, Asia Pacific is projected to exhibit the fastest growth over the forecast period, driven by rapid digitalization, expanding IT infrastructure, and increasing investments in artificial intelligence and analytics. Europe remains a key market due to its focus on data privacy and compliance, particularly under the General Data Protection Regulation (GDPR). Latin America and the Middle East & Africa are also witnessing steady growth, supported by rising awareness of data management best practices and growing investments in digital transformation initiatives.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Reference Data Management Platform market size reached USD 3.45 billion in 2024, reflecting robust expansion fueled by increasing digitization and data governance initiatives across industries. The market is projected to grow at a CAGR of 11.2% during the forecast period, with the market value anticipated to reach USD 9.20 billion by 2033. Primary growth factors include the rising need for data accuracy, regulatory compliance, and operational efficiency in data-driven organizations worldwide.
One of the key drivers propelling the growth of the Reference Data Management Platform market is the escalating volume and complexity of enterprise data. Organizations are increasingly recognizing the importance of accurate, consistent, and centralized reference data to support mission-critical business processes, analytics, and reporting. The proliferation of digital transformation initiatives, coupled with the adoption of cloud computing and big data analytics, has heightened the need for robust reference data management solutions. As enterprises integrate disparate data sources and legacy systems, the demand for platforms that ensure data consistency, reduce redundancies, and enhance data governance is surging. This trend is particularly pronounced in highly regulated sectors such as BFSI, healthcare, and government, where data integrity is paramount for compliance and risk management.
Another significant growth factor is the intensification of regulatory requirements across various industries. Regulatory bodies worldwide are imposing stringent mandates around data management, privacy, and reporting, compelling organizations to adopt advanced reference data management platforms. These platforms enable enterprises to maintain accurate and auditable records, automate compliance workflows, and mitigate operational risks associated with data discrepancies. The increasing frequency of regulatory updates, particularly in financial services and healthcare, is driving continuous investments in reference data management technologies. Furthermore, the growing emphasis on data democratization and self-service analytics is encouraging organizations to implement platforms that provide business users with easy access to trusted reference data, thereby accelerating decision-making and innovation.
The rapid evolution of artificial intelligence (AI) and machine learning (ML) technologies is also contributing to the expansion of the Reference Data Management Platform market. AI-powered data management solutions are enabling organizations to automate data matching, cleansing, and enrichment processes, significantly reducing manual efforts and operational costs. These intelligent platforms can identify data anomalies, recommend corrective actions, and enhance the overall quality of reference data. The integration of AI and ML capabilities is expected to drive further innovation in the market, empowering enterprises to derive actionable insights from complex data sets and unlock new business opportunities. As organizations strive to harness the full potential of their data assets, the adoption of next-generation reference data management platforms is set to accelerate in the coming years.
From a regional perspective, North America continues to dominate the Reference Data Management Platform market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The presence of leading technology vendors, early adoption of digital solutions, and a mature regulatory landscape are key factors contributing to the region’s leadership. However, Asia Pacific is emerging as a high-growth market, driven by rapid industrialization, expanding IT infrastructure, and increasing investments in data management technologies. Countries such as China, India, and Japan are witnessing significant demand for reference data management platforms, particularly in BFSI, healthcare, and manufacturing sectors. As organizations across regions prioritize data governance and compliance, the global market is poised for sustained growth throughout the forecast period.
The Component segment of the Reference Data Management Platform market is bifurcated into Software and Services, each playing a pivotal role in the overall market ecosystem. The software component encompasses core reference data management platforms.
According to our latest research, the global Engineering Simulation Data Management Software market size reached USD 1.85 billion in 2024, reflecting a robust trajectory driven by the increasing complexity of engineering projects and the need for seamless data integration. The market is poised to grow at a CAGR of 10.3% from 2025 to 2033, with the forecasted market size anticipated to touch USD 4.47 billion by 2033. The primary growth factor for this market is the rising adoption of digital transformation initiatives across industries, which has significantly increased the demand for advanced data management solutions that can handle the growing volume and complexity of simulation data.
A significant growth driver for the Engineering Simulation Data Management Software market is the escalating adoption of simulation-driven product development in industries such as automotive, aerospace, and healthcare. As product lifecycles shorten and the pressure for innovation intensifies, organizations are leveraging simulation tools to accelerate design, testing, and validation processes. This has led to a surge in the volume of simulation data, necessitating robust management platforms that can ensure data integrity, traceability, and accessibility across distributed teams. Additionally, the integration of simulation with other digital engineering tools has amplified the need for centralized data management, enabling organizations to achieve better collaboration, reduce redundancies, and maintain compliance with industry standards and regulations.
Another critical factor propelling market growth is the increasing complexity of engineering projects. Modern engineering simulations generate massive datasets that need to be managed efficiently for effective decision-making. The proliferation of multi-physics and multi-domain simulations, coupled with the trend towards digital twins and virtual prototyping, has further intensified the need for sophisticated data management solutions. Companies are now prioritizing the deployment of Engineering Simulation Data Management Software to streamline workflows, enhance productivity, and ensure that simulation data is readily available for analytics and reporting. This trend is particularly pronounced in sectors where safety, reliability, and regulatory compliance are paramount, such as aerospace & defense and healthcare.
The evolution of cloud computing and the shift towards cloud-based deployment models have also played a pivotal role in shaping the Engineering Simulation Data Management Software market. Cloud-based platforms offer unparalleled scalability, flexibility, and accessibility, making it easier for organizations to manage simulation data across global teams and locations. The ability to integrate with other enterprise systems, support for remote collaboration, and reduced IT overheads are some of the advantages driving the adoption of cloud-based solutions. This shift is enabling even small and medium enterprises to leverage advanced simulation data management capabilities, thereby democratizing access to cutting-edge engineering tools and fostering innovation across the value chain.
In the realm of engineering, Systems Engineering Software plays a pivotal role in managing the intricate web of processes involved in product development. This software is essential for coordinating various engineering disciplines, ensuring that all components of a system work harmoniously together. By integrating Systems Engineering Software with Engineering Simulation Data Management Software, organizations can enhance their ability to manage complex simulations and data flows. This integration facilitates better decision-making and improves the overall efficiency of engineering projects, particularly in industries where precision and reliability are critical.
Regionally, North America continues to dominate the Engineering Simulation Data Management Software market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The high concentration of technology-driven industries, presence of leading software vendors, and strong focus on R&D investments have contributed to the region's leadership. Europe, with its robust automotive and aerospace sectors, is also witnessing significant growth.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modeling and analysis characteristics of included studies.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Field Adjuster Management Software market size reached USD 1.28 billion in 2024, reflecting the sector’s robust digital transformation and the increasing demand for efficient claims processing in the insurance industry. The market is projected to expand at a CAGR of 11.2% from 2025 to 2033, with the total market value expected to reach approximately USD 3.01 billion by 2033. This strong growth is primarily driven by the rising adoption of cloud-based solutions, the need for workflow automation, and the growing complexity of insurance claims management worldwide.
One of the key growth factors fueling the Field Adjuster Management Software market is the increasing digitization of insurance operations. As insurance companies and third-party administrators strive to enhance operational efficiency and customer satisfaction, there is a marked shift toward automation and digital transformation. Field adjuster management software streamlines the entire claims lifecycle—right from initial notification to settlement—by automating repetitive tasks, enabling real-time communication, and supporting mobile field operations. This reduces manual errors, shortens claims processing times, and allows insurance organizations to allocate resources more efficiently. In addition, the integration of AI and machine learning within these platforms is further elevating the accuracy and speed of claims assessment and fraud detection, making the adoption of such software indispensable in today’s competitive insurance landscape.
Another significant driver for market expansion is the increasing complexity and frequency of insurance claims, especially in the context of natural disasters, pandemics, and evolving regulatory requirements. The ability of Field Adjuster Management Software to provide comprehensive reporting, analytics, and workflow automation is central to managing these challenges. Insurance companies are investing heavily in technology that not only accelerates claims processing but also enhances transparency and compliance. The software’s capability to centralize data, track adjuster performance, and generate actionable insights from large datasets is proving crucial in mitigating risks and improving overall organizational agility. Furthermore, the growing trend of remote work and the need for seamless communication between field adjusters and back-office staff have made cloud-based solutions increasingly attractive, supporting scalability and business continuity.
The market is also witnessing a surge in demand from small and medium enterprises (SMEs) and independent adjusters who are seeking cost-effective, scalable solutions to compete with larger players. The availability of modular, subscription-based software has lowered the barrier to entry for these end-users, allowing them to leverage advanced functionalities such as mobile claim submissions, automated workflow management, and integrated communication tools. This democratization of technology is fostering a more level playing field in the insurance ecosystem, enabling smaller firms to deliver superior customer experiences and remain compliant with industry standards. Additionally, the growing emphasis on customer-centricity and personalized service delivery is prompting insurers to adopt sophisticated field adjuster management solutions that can support tailored workflows and rapid response times.
Regionally, North America continues to dominate the Field Adjuster Management Software market owing to its mature insurance sector, high digital adoption rates, and significant investments in insurtech innovation. However, the Asia Pacific region is emerging as a high-growth market, driven by expanding insurance penetration, regulatory reforms, and the increasing prevalence of mobile technology. Europe is also witnessing steady growth, particularly in countries with advanced insurance frameworks and stringent compliance requirements. Meanwhile, Latin America and the Middle East & Africa are expected to register moderate growth, supported by gradual modernization of insurance processes and growing awareness of the benefits of digital claims management solutions.
The Component segment of the Field Adjuster Management Software market is bifurcated into software and services, each playing a pivotal role in the overall ecosystem. The software segment holds the largest market share.
A data-check of the reported seed densities for Central California species (native and exotic) used in experimentation.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories linked to publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks contained in those repositories.
Data Collection and Analysis
We used the notebook-reproducibility code from the study by Pimentel et al. (2019) and adapted code from ReproduceMeGit. We also provide code for collecting publication metadata from PubMed Central using the NCBI Entrez utilities via Biopython.
Our approach searches PMC with the esearch function for Jupyter notebooks using the query "(ipynb OR jupyter OR ipython) AND github". Results are retrieved in XML format, capturing essential details about journals and articles. We scan the entire article, including the abstract, body, data availability statement, and supplementary materials, to extract GitHub links. We also mine the repositories for key information such as dependency declarations in files like requirements.txt, setup.py, and Pipfile. Using the GitHub API, we enrich the data with repository creation dates, update histories, push events, and programming languages.
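As an illustration, the GitHub-link extraction step described above can be sketched with a small standard-library function. This is a hedged sketch, not the study's actual implementation: the regular expression and the function name are our own assumptions.

```python
import re

# Illustrative pattern for owner/repo GitHub URLs; the study's actual
# matching logic may differ.
GITHUB_RE = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+", re.IGNORECASE)

def extract_github_links(article_xml: str) -> list[str]:
    """Return de-duplicated GitHub repository links found in article XML."""
    links = []
    for match in GITHUB_RE.findall(article_xml):
        link = match.rstrip("./")  # drop trailing sentence punctuation
        if link not in links:
            links.append(link)
    return links
```

Running this over the abstract, body, data availability statement, and supplementary material of each article yields the candidate repositories for the later collection steps.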
All the extracted information is stored in a SQLite database. After creating and populating the database tables, we ran a pipeline, based on the code from Pimentel et al. (2019), to collect the Jupyter notebooks contained in the GitHub repositories.
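A minimal sketch of how such metadata might be laid out in SQLite follows. The table and column names here are illustrative assumptions, not the dataset's actual schema, which contains more tables and fields.

```python
import sqlite3

# Illustrative two-table layout: repositories and the notebooks they contain.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repositories (
        id INTEGER PRIMARY KEY,
        url TEXT UNIQUE NOT NULL,
        created_at TEXT,
        primary_language TEXT
    );
    CREATE TABLE notebooks (
        id INTEGER PRIMARY KEY,
        repository_id INTEGER REFERENCES repositories(id),
        path TEXT NOT NULL
    );
""")
conn.execute(
    "INSERT INTO repositories (url, created_at, primary_language) VALUES (?, ?, ?)",
    ("https://github.com/user/repo", "2020-01-01", "Python"),
)
conn.execute(
    "INSERT INTO notebooks (repository_id, path) VALUES (?, ?)",
    (1, "analysis/figure1.ipynb"),
)
conn.commit()
notebook_count = conn.execute("SELECT COUNT(*) FROM notebooks").fetchone()[0]
```

Keeping repository and notebook records in separate, linked tables makes it straightforward to join publication metadata against per-notebook reproducibility results later in the pipeline.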
Our reproducibility pipeline was started on 27 March 2023.
Repository Structure
Our repository is organized into two main folders:
Accessing Data and Resources:
System Requirements:
Running the pipeline:
Running the analysis:
References: