The NIDDK Central Repository stores biosamples, genetic and other data collected in designated NIDDK-funded clinical studies. The purpose of the NIDDK Central Repository is to expand the usefulness of these studies by allowing a wider research community to access data and materials beyond the end of the study.
The NIDDK Central Repositories are two separate contract-funded components that work together to store data and samples from significant NIDDK-funded studies. The first component is the Biorepository, which gathers, stores, and distributes biological samples from studies. The Biorepository works with investigators in new and ongoing studies as a real-time storage facility for archival samples. The second component is the Data Repository, which gathers, stores, and distributes incremental or finished datasets from NIDDK-funded studies. The Data Repository helps active data coordinating centers prepare databases and incremental datasets for archiving and for carrying out restricted queries of stored databases. The Data Repository also serves as the Data Coordinating Center and website manager for the NIDDK Central Repositories website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The world's largest centralized repository of llms.txt files from websites worldwide. Contains AI training guidelines and content policies from major websites.
Description and objectives of the National Centralized Storage Facility
Teen-LABS conducted coordinated clinical, epidemiological, and behavioral research focused on adolescent bariatric surgery. The study developed common clinical protocols and a bariatric surgery database for the purpose of collecting information from participating clinical centers that performed bariatric surgery on teenagers. Outside of investigating surgical outcomes, Teen-LABS sought to better understand the etiology, pathophysiology, and behavioral aspects of severe obesity in youth as well as how severe obesity impacts humans over time. Participants were recruited from six clinical centers and underwent bariatric surgery. Pre- and post-surgery data and biospecimens were obtained at pre-determined points.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Maven dependency graph is an open dataset of Maven Central artifacts, their dependencies, and other relationships. Its main intent is to domesticate the wild within and around the Maven Central ecosystem in particular, and JVM-based libraries at large, making it more harnessable to both academics and industry. It is intended to answer high-level research questions concerning artifact releases, evolution, and usage trends over time. It can also be used to assist researchers in selecting relevant datasets, among the mass of existing software artifacts, for assessing particular empirical software engineering challenges. The complexity of these questions can range from simple pattern matching to advanced big data analysis and machine learning techniques.
The accompanying paper to this dataset has been accepted for publication in the proceedings of the International Conference on Mining Software Repositories 2019 and has received the MSR 2019 Data Showcase Award. This paper is available for download on arXiv.
The goal of BioLINCC is to facilitate and coordinate the existing activities of the NHLBI Biorepository and the Data Repository and to expand their scope and usability to the scientific community through a single web-based user interface.
The files in this dataset are for Edmonton Neighbourhood Crime Stats and Population Figures. These files were merged and used to calculate crime rates for the various types of incidents.
Both files were downloaded from the City of Edmonton Open Data Portal and estimated population figures were obtained from various independent sources for the missing years.
The aim is to analyze crime and policy data from the City of Edmonton to identify initiatives that have succeeded in reducing various crime rates, and to examine whether the culture and society of Edmonton contribute to increased or decreased crime rates.
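The merge-and-rate step described for this dataset can be sketched as follows. This is only an illustration under stated assumptions: the key layout (neighbourhood, year, incident type), the per-100,000 scale, and the sample numbers are all hypothetical and are not taken from the Edmonton files.

```python
def crime_rates(incidents, population, per=100_000):
    """Join incident counts to population on (neighbourhood, year).

    incidents:  {(neighbourhood, year, incident_type): count}   (hypothetical layout)
    population: {(neighbourhood, year): residents}              (hypothetical layout)
    Returns {(neighbourhood, year, incident_type): incidents per `per` residents}.
    """
    rates = {}
    for (neighbourhood, year, kind), count in incidents.items():
        residents = population.get((neighbourhood, year))
        if not residents:
            # No population figure (or zero): the rate is undefined, so skip it.
            continue
        rates[(neighbourhood, year, kind)] = count * per / residents
    return rates


# Made-up sample rows, for illustration only.
incidents = {
    ("Downtown", 2015, "Assault"): 120,
    ("Downtown", 2015, "Robbery"): 30,
    ("Oliver", 2015, "Assault"): 45,  # no matching population row below
}
population = {("Downtown", 2015): 12_000}

rates = crime_rates(incidents, population)
for key, rate in sorted(rates.items()):
    print(key, round(rate, 1))
```

Rows without a population figure are dropped rather than guessed, which is one reason the dataset description mentions filling missing years from other sources.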
According to the latest research conducted in early 2025, the global Unified Data Repository Market size reached USD 8.3 billion in 2024, demonstrating robust momentum driven by the accelerating need for comprehensive data management solutions across diverse industries. The market is projected to expand at a CAGR of 13.5% from 2025 to 2033, ultimately attaining a forecasted value of USD 26.3 billion by the end of 2033. This remarkable growth is primarily fueled by the increasing complexity of enterprise data ecosystems, rising regulatory compliance demands, and the surge in digital transformation initiatives worldwide.
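The quoted figures can be cross-checked with the standard compound-growth formula. The sketch below assumes the 2024 value is the base and that growth compounds over nine annual periods (2025 through 2033); the report does not state its convention, so both are assumptions.

```python
def project(value, rate, years):
    """Compound `value` at annual `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1


BASE_2024 = 8.3      # USD billion, as quoted
TARGET_2033 = 26.3   # USD billion, as quoted
QUOTED_CAGR = 0.135  # 13.5%, as quoted
YEARS = 9            # assumption: 2024 base, compounding through 2033

projected_2033 = project(BASE_2024, QUOTED_CAGR, YEARS)
implied = implied_cagr(BASE_2024, TARGET_2033, YEARS)

print(f"Projected 2033 value at 13.5%: {projected_2033:.2f}")
print(f"CAGR implied by 8.3 -> 26.3:   {implied:.2%}")
```

Under these assumptions the projection lands slightly below the quoted 2033 figure, and the implied rate slightly above 13.5%; the gap may simply reflect rounding or a different base year in the report.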
One of the primary growth factors propelling the Unified Data Repository Market is the exponential increase in data volumes generated by organizations. As businesses transition towards digital operations, the need for centralized, scalable, and easily accessible data repositories has become paramount. Enterprises are no longer dealing with siloed data sources; instead, they require unified platforms that can seamlessly integrate structured and unstructured data, ensuring real-time access and optimal data quality. The proliferation of IoT devices, cloud-based applications, and edge computing has further intensified the need for unified data repositories that can consolidate disparate data streams, enabling organizations to derive actionable insights and maintain a competitive edge.
Another significant driver is the growing emphasis on regulatory compliance and data governance. With stringent data privacy regulations such as GDPR, CCPA, and other region-specific mandates, organizations are under immense pressure to maintain transparency, ensure data lineage, and safeguard sensitive information. Unified data repositories offer a comprehensive framework that facilitates compliance by providing centralized control, robust audit trails, and granular access management. This is particularly critical for sectors like BFSI, healthcare, and government, where data breaches and non-compliance can result in substantial financial and reputational damage. The market is also witnessing increased investments in advanced analytics and artificial intelligence, which are further enhancing the capabilities of unified data repository solutions.
The rapid adoption of cloud technologies and the rise of hybrid IT environments are also contributing significantly to market growth. Organizations are increasingly leveraging cloud-based unified data repositories to achieve greater scalability, flexibility, and cost efficiency. Cloud deployment models enable seamless integration with existing IT infrastructure, support remote access, and facilitate real-time collaboration across geographically dispersed teams. Moreover, the shift towards cloud-native architectures is enabling vendors to offer innovative features such as automated data discovery, intelligent data cataloging, and self-service analytics, thereby expanding the addressable market and attracting a broader customer base.
In the context of the Unified Data Repository Market, the role of the Home Subscriber Server (HSS) is becoming increasingly significant, especially in the telecommunications sector. The HSS is a central database that contains subscriber-related information and plays a crucial role in managing user profiles, authentication, and mobility management. As telecom operators transition to 5G networks, the integration of HSS with unified data repositories is essential to ensure seamless data flow and real-time access to subscriber information. This integration enhances the ability of telecom companies to deliver personalized services, optimize network resources, and maintain high levels of customer satisfaction. The growing demand for efficient data management solutions in the telecom industry is driving the adoption of unified data repositories that incorporate HSS functionalities, enabling operators to streamline operations and improve service delivery.
From a regional perspective, North America continues to dominate the Unified Data Repository Market owing to its advanced technological landscape, high digital adoption rates, and significant investments in data-driven initiatives. However, the Asia Pacific region is poised for the fastest growth, driven by rapid digitalization, expanding enterprise IT infrastructure, and support
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Maven Central Dependency Graph
This is an updated version of the artifact at https://zenodo.org/record/1489120
The Maven dependency graph is an open dataset of Maven Central artifacts, their dependencies, and other relationships. Its main intent is to domesticate the wild within and around the Maven Central ecosystem in particular, and JVM-based libraries at large, making it more harnessable to both academics and industry. It is intended to answer high-level research questions concerning artifact releases, evolution, and usage trends over time. It can also be used to assist researchers in selecting relevant datasets, among the mass of existing software artifacts, for assessing particular empirical software engineering challenges. The complexity of these questions can range from simple pattern matching to advanced big data analysis and machine learning techniques.
The accompanying paper to this dataset has been accepted for publication in the proceedings of the International Conference on Mining Software Repositories 2019 and has received the MSR 2019 Data Showcase Award. This paper is available for download on arXiv.
What is new?
The previous version included artifacts until September 6, 2018.
This version includes artifacts until September 10, 2019.
This version includes license information as well as information about the associated code repository.
This version contains 4,201,392 artifacts (versions) of 308,116 distinct libraries from 47,481 distinct group IDs.
Note that 33,638 artifacts represent version ranges and not actual versions. They can be filtered out by excluding versions containing ','.
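The filtering rule in the note above can be sketched in a few lines. The `group:artifact:version` string layout and the sample coordinates are assumptions made for illustration; the dataset itself is a graph database and may expose the version under a different field.

```python
def is_version_range(version):
    """A Maven version range such as '[1.0,2.0)' contains a comma; a plain version does not."""
    return "," in version


def drop_version_ranges(coordinates):
    """Keep only coordinates whose version part is an actual version.

    Assumes hypothetical 'group:artifact:version' strings.
    """
    kept = []
    for coord in coordinates:
        version = coord.rsplit(":", 1)[-1]
        if not is_version_range(version):
            kept.append(coord)
    return kept


# Made-up coordinates, for illustration only.
sample = [
    "org.example:lib:1.2.3",
    "org.example:lib:[1.0,2.0)",
    "com.example:tool:(,3.0]",
    "com.example:tool:2.5",
]
print(drop_version_ranges(sample))
```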
Usage
# Pull the image and start the container
docker run -d --name mm-neo4j -p 7474:7474 -p 7687:7687 -v /path/to/neo4j-data:/data --env=NEO4J_dbms_memory_heap_max_size=8g lyadis/mm-neo4j:latest
https://choosealicense.com/licenses/pddl/
PubMed Central Figures Dataset
This dataset contains image-text pairs extracted from figures from papers in the PubMed Central repository. The dataset can be used to train CLIP models. This repo contains a Parquet file containing the metadata of a WebDataset in img2dataset format. The images themselves are not distributed and need to be retrieved. Note that the images cannot be retrieved by an HTTP URL, so img2dataset cannot be used as is to retrieve the data. Instead, the… See the full description on the dataset page: https://huggingface.co/datasets/nopperl/pmc-image-text.
Central repository for information about past and present members of the Council.
Repository that serves to coordinate searches across data and biospecimen collections from participants in numerous clinical trials and epidemiologic studies and to provide an electronic means for requests for additional information and the submission of requests for collections. The collections, comprising data from more than 80 trials or studies and millions of biospecimens, are available to qualified investigators under specific terms and conditions consistent with the informed consents provided by the individual study participants. Some datasets are presented with studies and supporting materials to facilitate reuse and teaching. Datasets support basic research, clinical studies, observational studies, and demonstrations. Researchers wishing to apply to submit biospecimen collections to the NHLBI Biorepository for sharing with qualified investigators may also use this website to initiate that process.
Data repository that hosts medical images.
At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.
Data Collection and Analysis
We use the code for reproducibility of Jupyter notebooks from the study by Pimentel et al., 2019, and adapt the code from ReproduceMeGit. We provide code for collecting the publication metadata from PubMed Central using NCBI Entrez utilities via Biopython.
Our approach involves searching PMC for Jupyter notebooks with the esearch function, using the query "(ipynb OR jupyter OR ipython) AND github". We retrieve data in XML format, capturing essential details about journals and articles. By systematically scanning the entire article, encompassing the abstract, body, data availability statement, and supplementary materials, we extract GitHub links. Additionally, we mine repositories for key information such as dependency declarations found in files like requirements.txt, setup.py, and Pipfile. Leveraging the GitHub API, we enrich our data by incorporating repository creation dates, update histories, pushes, and programming languages.
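The search and link-extraction steps can be sketched roughly as below. This is not the study's own code: the function names, the regular expression, and the placeholder e-mail address are assumptions, and `search_pmc` needs Biopython plus network access, so it is defined but not called here. `Entrez.esearch` and `Entrez.read` are the Biopython calls for querying and parsing esearch results.

```python
import re

# Query string as given in the dataset description.
PMC_QUERY = "(ipynb OR jupyter OR ipython) AND github"

# Hypothetical pattern: captures owner and repository name from a GitHub URL.
GITHUB_URL = re.compile(
    r"https?://(?:www\.)?github\.com/([A-Za-z0-9_-]+)/([A-Za-z0-9_.-]+)"
)


def search_pmc(query=PMC_QUERY, email="you@example.org", retmax=20):
    """Return PMC IDs matching `query` (requires Biopython and network access)."""
    from Bio import Entrez  # imported lazily so the rest runs without Biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(db="pmc", term=query, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return list(record["IdList"])


def extract_github_repos(text):
    """Return unique 'owner/repo' slugs found in `text`, in order of appearance."""
    repos = []
    for owner, repo in GITHUB_URL.findall(text):
        repo = repo.rstrip(".")        # drop sentence-ending punctuation
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        slug = f"{owner}/{repo}"
        if repo and slug not in repos:
            repos.append(slug)
    return repos


# Made-up article text, for illustration only.
article_text = (
    "Code is at https://github.com/alice/nb-study. See also "
    "http://github.com/bob/tools.git and "
    "https://github.com/alice/nb-study/blob/main/a.ipynb"
)
print(extract_github_repos(article_text))
```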
All the extracted information is stored in a SQLite database. After collecting and creating the database tables, we ran a pipeline to collect the Jupyter notebooks contained in the GitHub repositories based on the code from Pimentel et al., 2019.
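As a rough illustration of storing such metadata in SQLite, the sketch below uses an in-memory database with two tables. The table names, columns, and sample row are hypothetical; the actual schema of the study's database is not described here.

```python
import sqlite3


def build_db(conn):
    """Create a minimal, hypothetical schema: articles and the repositories they mention."""
    conn.executescript(
        """
        CREATE TABLE article (
            id      INTEGER PRIMARY KEY,
            pmcid   TEXT UNIQUE,
            journal TEXT
        );
        CREATE TABLE repository (
            id         INTEGER PRIMARY KEY,
            article_id INTEGER REFERENCES article(id),
            slug       TEXT
        );
        """
    )


def add_article(conn, pmcid, journal, slugs):
    """Insert one article and the GitHub slugs extracted from it."""
    cur = conn.execute(
        "INSERT INTO article (pmcid, journal) VALUES (?, ?)", (pmcid, journal)
    )
    conn.executemany(
        "INSERT INTO repository (article_id, slug) VALUES (?, ?)",
        [(cur.lastrowid, slug) for slug in slugs],
    )


conn = sqlite3.connect(":memory:")
build_db(conn)
# Made-up record, for illustration only.
add_article(conn, "PMC0000001", "Example Journal", ["alice/nb-study", "bob/tools"])

rows = conn.execute(
    "SELECT a.pmcid, r.slug FROM article a "
    "JOIN repository r ON r.article_id = a.id ORDER BY r.slug"
).fetchall()
print(rows)
```

Keeping repositories in their own table lets one article link to several repositories, which matches the description of publications mentioning multiple GitHub links.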
Our reproducibility pipeline was started on 27 March 2023.
Repository Structure
Our repository is organized into two main folders:
Accessing Data and Resources:
System Requirements:
Running the pipeline:
Running the analysis:
References:
The Bangladesh Directorate of Secondary and Higher Education seeks consulting services to develop quality e-contents, central model classes, and a central repository of e-learning materials. This assignment is part of the Learning Acceleration in Secondary Education project funded by the World Bank. The project aims to enhance digital learning resources for secondary education across Bangladesh.
This dataset tracks the updates made to the dataset "Adams Elementary School (Central Valley)" and serves as a repository for previous versions of the data and metadata.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
OverdoseFreePA
OverdoseFreePA is made possible by the Pennsylvania Commission on Crime and Delinquency, and is directed and managed by the Pennsylvania Overdose Reduction Technical Assistance Center (TAC), University of Pittsburgh School of Pharmacy. The website is a result of collaboration with county and state partners across the Commonwealth of Pennsylvania.
Our partnerships include:
Pennsylvania District Attorneys Association
Pennsylvania Medical Society
Pennsylvania Pharmacist Association
Pennsylvania Psychiatric Society
The Hospital and Healthsystem Association of Pennsylvania
Pennsylvania Dental Association
Drug Enforcement Administration 360 Strategy
There are a growing number of Pennsylvania counties involved in ramping up overdose prevention, treatment, and recovery activities to address the opioid overdose epidemic. The counties involved are collaborating to develop resources that can be used by all Pennsylvanians to increase community awareness and knowledge of overdose and overdose prevention strategies as well as to support initiatives aimed at decreasing drug overdoses and deaths within the participating counties. As a centralized resource and technical assistance hub, OverdoseFreePA is a central repository for these efforts to facilitate increased treatment and prevention efforts in these communities.
Pennsylvania Opioid Overdose Reduction Technical Assistance Center (TAC)
Pennsylvania, and the nation at large, is in the midst of an opioid overdose epidemic. The TAC’s vision is to lead Pennsylvania communities to zero overdoses. The TAC hopes to achieve this vision by providing concierge technical assistance in the form of data-driven recommendations and customized strategic planning to counties working to eliminate overdoses. The TAC strives to lead the field in identifying and sharing strategies to eliminate overdose through the central repository of OverdoseFreePA.
Based out of the Program Evaluation and Research Unit (PERU) at the University of Pittsburgh’s School of Pharmacy, the TAC assists counties and communities in assessing needs, building capacity to address the needs, developing and implementing data-driven plans with high-quality outcomes, and sustaining initiatives to eliminate overdoses, both fatal and non-fatal, throughout Pennsylvania.
More information: http://www.overdosefreepa.pitt.edu/who-we-are/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the dataset referenced in the Scientific Data journal article titled "Aerial Imagery-Derived Dataset of Manufactured Housing Communities in the North Central United States" by Armin Yeganeh, Maria Marshall, and Noah Durst. The associated code scripts are available at https://github.com/arminyeganeh/mhc