Mr-Vicky-01/Repository-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

https://creativecommons.org/publicdomain/zero/1.0/
This dataset lists more than 215k of the most-starred GitHub repositories, each with more than 167 stars, together with the attributes described in the table below.
I collected this dataset using the GitHub search API. The API returns only the first thousand results for a query, so I looped through low/high star pairs chosen so that each query=stars:{low}..{high} returns fewer than a thousand repositories.
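The star-range splitting described above can be sketched as follows. This is a minimal illustration, not the author's actual script: the bisection strategy, the names `star_ranges` and `build_query`, and the `count_in_range` callback (which in practice would wrap a search API call and return its reported total) are all assumptions introduced here.

```python
from typing import Callable, List, Tuple

# The search API only exposes the first thousand results of any query.
SEARCH_LIMIT = 1000


def build_query(low: int, high: int) -> str:
    """Format the star-range qualifier used in the description above."""
    return f"stars:{low}..{high}"


def star_ranges(
    low: int,
    high: int,
    count_in_range: Callable[[int, int], int],
) -> List[Tuple[int, int]]:
    """Split [low, high] into sub-ranges that each match fewer than SEARCH_LIMIT repos.

    count_in_range is a hypothetical callback returning how many repositories
    have a star count between its two arguments (inclusive).
    """
    result: List[Tuple[int, int]] = []
    stack = [(low, high)]
    while stack:
        lo, hi = stack.pop()
        # A single star value cannot be split further, so accept it as-is.
        if lo == hi or count_in_range(lo, hi) < SEARCH_LIMIT:
            result.append((lo, hi))
            continue
        mid = (lo + hi) // 2
        # Push the upper half first so the lower half is processed next,
        # which keeps the output in ascending order.
        stack.append((mid + 1, hi))
        stack.append((lo, mid))
    return result
```

Each returned pair can then be passed to `build_query` and paged through in full, since no single query exceeds the result cap (except a lone star value that is itself over the limit).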
The GitHub API Terms of Service apply.
You may not use this dataset for spamming purposes, including for the purposes of selling GitHub users' personal information, such as to recruiters, headhunters, and job boards.
| Column name | Description |
|---|---|
| Name | The name of the GitHub repository |
| Description | A brief textual description that summarizes the purpose or focus of the repository |
| URL | The URL or web address that links to the GitHub repository, which is a unique identifier for the repository |
| Created At | The date and time when the repository was initially created on GitHub, in ISO 8601 format |
| Updated At | The date and time of the most recent update or modification to the repository, in ISO 8601 format |
| Homepage | The URL to the homepage or landing page associated with the repository, providing additional information or resources |
| Size | The size of the repository in bytes, indicating the total storage space used by the repository's files and data |
| Stars | The number of stars or likes that the repository has received from other GitHub users, indicating its popularity or interest |
| Forks | The number of times the repository has been forked by other GitHub users |
| Issues | The total number of open issues |
| Watchers | The number of GitHub users who are "watching" or monitoring the repository for updates and changes |
| Language | The primary programming language |
| License | Information about the software license using a license identifier |
| Topics | A list of topics or tags associated with the repository, helping users discover related projects and topics of interest |
| Has Issues | A boolean value indicating whether the repository has an issue tracker enabled |
| Has Projects | A boolean value indicating whether the repository uses GitHub Projects to manage and organize tasks and work items |
| Has Downloads | A boolean value indicating whether the repository offers downloadable files or assets to users |
| Has Wiki | A boolean value indicating whether the repository has an associated wiki with additional documentation and information |
| Has Pages | A boolean value indicating whether the repository has GitHub Pages enabled, allowing the creation of a website associated with the repository |
| Has Discussions | A boolean value indicating whether the repository has GitHub Discussions enabled, allowing community discussions and collaboration |
| Is Fork | A boolean value indicating whether the repository is a fork of another repository |
| Is Archived | A boolean value indicating whether the repository is archived. Archived repositories are typically read-only and are no longer actively maintained |
| Is Template | A boolean value indicating whether the repository is set up as a template |
| Default Branch | The name of the default branch |

The NSF Public Access Repository contains an initial collection of journal publications and the final accepted version of the peer-reviewed manuscript or the version of record. To do this, NSF draws upon services provided by the publisher community including the Clearinghouse of Open Research for the United States, CrossRef, and International Standard Serial Number. When clicking on a Digital Object Identifier number, you will be taken to an external site maintained by the publisher. Some full text articles may not be available without a charge during the embargo, or administrative interval. Some links on this page may take you to non-federal websites. Their policies may differ from this website.

https://borealisdata.ca/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=doi:10.5683/SP3/UPABVH
Data collected from major Canadian and international research data repositories cover data storage, preservation, metadata, interchange, data file types, and other standard features used in the retention and sharing of research data. The outputs of this project primarily aim to assist in the establishment of recommended minimum requirements for a Canadian research data infrastructure. The committee also aims to further develop guidelines and criteria for the assessment and selection of repositories for deposit of Canadian research data by researchers, data managers, librarians, archivists, etc.

GitHub is how people build software and is home to the largest community of open source developers in the world, with over 12 million people contributing to 31 million projects on GitHub since 2008.
This 3TB+ dataset comprises the largest released source of GitHub activity to date. It contains a full snapshot of the content of more than 2.8 million open source GitHub repositories including more than 145 million unique commits, over 2 billion different file paths, and the contents of the latest revision for 163 million files, all of which are searchable with regular expressions.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.github_repos.[TABLENAME]. Fork this kernel to get started to learn how to safely manage analyzing large BigQuery datasets.
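As one way to manage query cost safely, a dry run can report how many bytes a query would scan before anything is billed. The sketch below is a minimal example under stated assumptions: it presumes the `google-cloud-bigquery` package is installed and credentials are already configured, the helper names are illustrative, and the table name passed in is a placeholder for whichever table you want to inspect.

```python
DATASET = "bigquery-public-data.github_repos"


def table_id(table_name: str) -> str:
    """Return the fully qualified table path described above."""
    return f"{DATASET}.{table_name}"


def count_rows_sql(table_name: str) -> str:
    """Build a simple row-count query against one table of the dataset."""
    return f"SELECT COUNT(*) AS n FROM `{table_id(table_name)}`"


def estimate_bytes(sql: str) -> int:
    """Dry-run a query and return the number of bytes it would process."""
    # Imported lazily so the pure helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def run_capped(sql: str, max_bytes: int) -> list:
    """Run a query that fails instead of billing more than max_bytes."""
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return list(client.query(sql, job_config=config).result())
```

Calling `estimate_bytes` first and only then `run_capped` with a matching limit keeps an exploratory query on a multi-terabyte dataset from scanning more than intended.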
This dataset was made available per GitHub's terms of service. This dataset is available via Google Cloud Platform's Marketplace, GitHub Activity Data, as part of GCP Public Datasets.

This project has built a repository of items (www.esmitemrepository.com) used in experience sampling method (ESM), ecological momentary assessment (EMA) and ambulatory assessment (AA) studies. The idea for this repository arose out of discussions during the Open Science hackathon at the 2018 Belgian-Dutch ESM Network Meeting.
In order to contribute items to the repository, you will need to download all five documents in the Contributors' Pack. When you have downloaded the ESM Item Repository submission template (spreadsheet) document, you can enter your items into it and then send it back to us via email (submissions [at] esmitemrepository.com). We will then collate all the submitted items into a repository and publish them here.
If you would like to browse the full repository and download items and their information, visit www.esmitemrepository.com.

The NIH Common Data Elements (CDE) Repository has been designed to provide access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes. Visit the NIH CDE Resource Portal for contextual information about the repository.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Availability of data, code, and plot creation for various figures throughout my PhD thesis. Rough organisation currently. Pertains to Figures 5.4, 5.8, 6.11, 6.18, 7.3, 7.12, and Table 6.1.

The Administrative Data Repository (ADR) was established to provide support for the administrative data elements relative to multiple categories of a person entity such as demographic and eligibility information. Although initially focused on the computing needs of the Veterans Health Administration, the ADR is positioned to provide identity management and demographics support for all IT systems within the Department of Veterans Affairs.

https://dataintelo.com/privacy-and-policy
The global clinical trial data repository market size was estimated to be approximately $1.8 billion in 2023 and is projected to grow at a compound annual growth rate (CAGR) of 9.5% to reach around $4.1 billion by 2032. The primary growth factors include the increasing volume and complexity of clinical trials, rising need for efficient data management systems, and stringent regulatory requirements for data accuracy and integrity. The advent of advanced technologies such as artificial intelligence and big data analytics further drives market expansion by enhancing data processing capabilities and providing actionable insights.
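The figures quoted above can be sanity-checked with the standard compound-growth relation, future = present × (1 + rate)^years. The sketch below is illustrative only: it assumes a nine-year span from 2023 to 2032, and the function names are introduced here.

```python
def project(present: float, rate: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return present * (1 + rate) ** years


def implied_cagr(present: float, future: float, years: int) -> float:
    """Solve the same relation for the annual rate linking two values."""
    return (future / present) ** (1 / years) - 1


# Values in billions of USD, taken from the paragraph above.
projected_2032 = project(1.8, 0.095, 9)
rate_from_endpoints = implied_cagr(1.8, 4.1, 9)
```

Under that assumption, compounding $1.8 billion at 9.5% for nine years gives roughly $4.07 billion, which is consistent with the quoted figure of around $4.1 billion.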
The growth of the clinical trial data repository market is significantly influenced by the increasing number of clinical trials being conducted globally. With the rise in chronic diseases, the need for innovative treatments and therapies has surged, leading to an upsurge in clinical trials. This increase in clinical trials necessitates robust data management systems to handle vast amounts of data generated, thereby propelling the demand for clinical trial data repositories. Moreover, the complexity of modern clinical trials, which often involve multiple sites and diverse patient populations, further amplifies the need for sophisticated data management solutions.
Another critical driver for the market is the stringent regulatory landscape governing clinical trial data. Regulatory bodies such as the FDA, EMA, and other local authorities mandate rigorous data management standards to ensure data integrity, accuracy, and accessibility. These regulations necessitate the adoption of advanced data repository systems that can comply with regulatory requirements, thereby fueling market growth. Additionally, regulatory frameworks are becoming increasingly stringent, prompting pharmaceutical and biotechnology companies to invest in state-of-the-art data management systems to avoid compliance issues and potential financial penalties.
Technological advancements play a pivotal role in the market's growth. The integration of artificial intelligence, machine learning, and big data analytics into data repository systems enhances data processing and analysis capabilities. These technologies enable real-time data monitoring, predictive analytics, and improved decision-making, thereby improving the efficiency of clinical trials. Furthermore, the shift towards cloud-based solutions offers scalability, flexibility, and cost-effectiveness, making advanced data management systems accessible to even small and medium-sized enterprises.
Regionally, North America dominates the clinical trial data repository market owing to its robust healthcare infrastructure, high R&D investments, and presence of major pharmaceutical and biotechnology companies. Europe follows closely due to stringent regulatory standards and a strong focus on clinical research. The Asia Pacific region is expected to witness the highest growth rate during the forecast period due to increasing clinical trial activities, growing healthcare expenditure, and the rising adoption of advanced technologies. Latin America and the Middle East & Africa are also likely to experience growth, albeit at a slower pace, driven by improving healthcare systems and increasing focus on clinical research.
The clinical trial data repository market is segmented by components into software and services. The software segment is anticipated to hold a significant share of the market due to the essential role software plays in data management. Advanced software solutions offer capabilities such as data storage, management, retrieval, and analysis, which are critical for effective clinical trial management. The integration of AI and machine learning algorithms into these software systems further enhances their efficiency by enabling predictive analytics and real-time monitoring, thus driving the software segment's growth.
Software solutions in clinical trial data repositories also offer interoperability, enabling seamless integration with other clinical trial management systems (CTMS) and electronic data capture (EDC) systems. This interoperability is crucial for ensuring data consistency and accuracy across different platforms, thereby enhancing overall data management. Additionally, the increasing adoption of cloud-based software solutions provides scalability, cost-effectiveness, and remote acce

Digital Repository for Open Access to University of Luxembourg publications.
ORBilu was officially launched on 22 April 2013. The acronym ORBi stands for “Open Repository and Bibliography”. It also echoes the Latin word “orbi” (“for the world”) and signals the will of the University to make its academic research available to everyone, without barriers, be they legal, financial or technical. By keeping the ORBi name and adding “lu”, the University of Luxembourg wants to show its appreciation for the work done by the University of Liège while also clearly indicating that this is a version adapted to the UL context.
The API format is described at https://www.openarchives.org/pmh/.
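OAI-PMH requests are plain HTTP GET requests carrying a `verb` parameter, and long result lists are paged with a resumption token. The sketch below builds such a request URL and extracts the token from a response. The base URL is a placeholder, since the repository's actual endpoint is not given here, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH GET URL such as ...?verb=ListRecords&metadataPrefix=oai_dc."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


def resumption_token(response_xml: str) -> Optional[str]:
    """Return the resumption token from a response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


# Placeholder endpoint: replace with the repository's real OAI-PMH base URL.
EXAMPLE_URL = build_request_url(
    "https://example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
```

A harvester would fetch the first URL, then keep requesting `verb=ListRecords` with the returned `resumptionToken` until `resumption_token` yields `None`.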

A database which contains longitudinal structural MRIs, spectroscopy, DTI and correlated clinical/behavioral data from approximately 500 healthy, normally developing children, ages newborn to young adult.

The goal of BioLINCC is to facilitate and coordinate the existing activities of the NHLBI Biorepository and the Data Repository and to expand their scope and usability to the scientific community through a single web-based user interface.

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was extracted from a set of metadata files harvested from the DataCite metadata store (http://search.datacite.org/ui) during December 2015. Metadata records for items with a resourceType of dataset were collected. 1,647,949 total records were collected. This dataset contains four files: 1) readme.txt: a readme file. 2) language-results.csv: A CSV file containing three columns: DOI, DOI prefix, and language text contents 3) language-counts.csv: A CSV file containing counts for unique language text content values. 4) language-grouped-counts.txt: A text file containing the results of manually grouping these language codes.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Annex to Deliverable 3.2 of project RISEnergy. Contains lists of metadata platforms, data repository sites, and database services relevant to the 10 renewable energy sectors of concern to RISEnergy.

https://dataintelo.com/privacy-and-policy
According to our latest research, the unified data repository market size reached USD 8.4 billion in 2024 on a global scale. The market is witnessing robust momentum, driven by the exponential growth of enterprise data and the need for streamlined data management solutions. The market is projected to expand at a notable CAGR of 14.7% during the forecast period, with the total value anticipated to reach USD 26.2 billion by 2033. This significant growth trajectory is underpinned by the increasing adoption of cloud-based solutions, the proliferation of big data analytics, and a growing emphasis on regulatory compliance and data governance across various industries.
One of the primary growth factors propelling the unified data repository market is the relentless surge in data volumes generated by organizations across all sectors. With the proliferation of digital transformation initiatives, enterprises are experiencing unprecedented data growth, originating from diverse sources such as IoT devices, customer interactions, business operations, and social media. Managing, integrating, and extracting value from this deluge of data has become a strategic imperative. Unified data repositories offer a centralized platform that enables organizations to consolidate disparate data silos, improve data accessibility, and enhance decision-making capabilities. As businesses increasingly recognize the value of data-driven insights, the demand for robust unified data repository solutions is set to accelerate further.
Another critical driver for the unified data repository market is the growing need for compliance with stringent data protection and privacy regulations. Regulatory frameworks such as GDPR in Europe, CCPA in California, and other local data governance mandates require organizations to maintain high levels of data integrity, security, and transparency. Unified data repositories facilitate centralized control and monitoring of data assets, ensuring that organizations can efficiently manage data lineage, access controls, and audit trails. This capability not only helps mitigate compliance risks but also fosters trust among stakeholders and customers. Consequently, sectors such as BFSI, healthcare, and government are increasingly investing in unified data repository solutions to uphold regulatory standards and safeguard sensitive information.
Technological advancements and the integration of artificial intelligence (AI) and machine learning (ML) capabilities are further enhancing the value proposition of unified data repositories. Modern solutions are equipped with advanced analytics, automated data classification, and intelligent data integration features that empower organizations to derive actionable insights from their data assets. The ability to seamlessly integrate with existing IT infrastructure and support multi-cloud deployments is also a key differentiator. These technological innovations are enabling organizations to unlock new business opportunities, optimize operational efficiency, and gain a competitive edge in the digital economy. As a result, the unified data repository market is experiencing heightened adoption across both large enterprises and small and medium-sized enterprises (SMEs).
From a regional perspective, North America continues to dominate the unified data repository market, accounting for the largest revenue share in 2024. The region’s leadership is attributed to the high concentration of technology-driven enterprises, early adoption of advanced data management solutions, and a mature regulatory environment. However, Asia Pacific is emerging as the fastest-growing region, fueled by rapid digitalization, expanding IT infrastructure, and increasing investments in cloud technologies. Europe remains a significant market, driven by stringent data protection regulations and strong demand from the BFSI and healthcare sectors. The Middle East & Africa and Latin America are also witnessing steady growth, supported by rising awareness of data management best practices and ongoing digital transformation initiatives.
The unified data repository market is segmented by component into software, hardware, and services, each playing a crucial role in the overall ecosystem. The software segment holds the largest share, driven by the widespread adoption of advanced data management platforms that enable seamless integration, storage, and retriev

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The zip files contain 12338 datasets for outlier detection investigated in the following papers: (1) Instance space analysis for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Kate Smith-Miles. (2) On normalization and algorithm selection for unsupervised outlier detection. Authors: Sevvandi Kandanaarachchi, Mario A. Munoz, Rob J. Hyndman, Kate Smith-Miles. Some of these datasets were originally discussed in the paper: On the evaluation of unsupervised outlier detection: measures, datasets and an empirical study. Authors: G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenkova, E. Schubert, I. Assent, M. E. Houle.

Collection of databases, domain theories, and data generators that are used by the machine learning community for empirical analysis of machine learning algorithms. Datasets approved to be in the repository will be assigned a Digital Object Identifier (DOI) if they do not already possess one. Datasets will be licensed under a Creative Commons Attribution 4.0 International license (CC BY 4.0), which allows for the sharing and adaptation of the datasets for any purpose, provided that the appropriate credit is given.

BRADS is a repository for data and biospecimens from population health research initiatives and clinical or interventional trials designed and implemented by NICHD’s Division of Intramural Population Health Research (DIPHR). Topics include human reproduction and development, pregnancy, child health and development, and women’s health. The website is maintained by DIPHR.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by salhi rahma
Released under Attribution 4.0 International (CC BY 4.0)