Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
datasets metrics
This dataset contains metrics about the huggingface/datasets package.
Number of repositories in the dataset: 4997
Number of packages in the dataset: 215
Package dependents
This contains the data available in the used-by tab on GitHub.
Package & Repository star count
This section shows the package and repository star count, individually.
Package Repository
There are 22 packages that have more than 1000 stars. There are 43… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/datasets-dependents.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about various open-source pre-trained models that are available on Kaggle. These models can be used for various machine learning and deep learning tasks such as image classification, natural language processing, object detection, etc. The dataset has the following features:
The dataset can be useful for anyone who wants to explore different pre-trained models and compare their performance and features. It can also help in finding suitable models for specific problems or domains.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Open Source is a dataset for object detection tasks - it contains Pest annotations for 476 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
People Open Source is a dataset for object detection tasks - it contains Person annotations for 1,500 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Market research dataset covering growth of the global open-source software market, including benefits, adoption, and enterprise usage in 2025.
Code and data to reproduce the results and datasets from "Tools for Open Source, Subnational CGE Modeling with an Illustrative Analysis of Carbon Leakage" by Andrew Schreiber and Thomas F. Rutherford, in the Journal of Global Economic Analysis. Citation information for this dataset can be found in the EDG's Metadata Reference Information section and Data.gov's References section.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
GitHub page: https://github.com/soarsmu/NICHE
https://www.verifiedmarketresearch.com/privacy-policy/
Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
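The quoted figures can be cross-checked with the standard compound annual growth rate formula. The following is a minimal sketch in Python, not the publisher's method: the function names are illustrative, and the number of compounding periods is an assumption, since the listing gives a 2024 valuation but a 2026-2032 forecast period. Seven periods (the length of the forecast period) reproduces the stated 20%, while eight periods (2024 to 2032) would imply a lower rate.

```python
def project(start_value: float, rate: float, periods: int) -> float:
    """Value after compounding `rate` once per period for `periods` periods."""
    return start_value * (1 + rate) ** periods


def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the listing (USD billion).
start, end = 10.00, 35.83

# Assumption: seven compounding periods, matching the 2026-2032 forecast span.
print(round(project(start, 0.20, 7), 2))   # 35.83
print(f"{cagr(start, end, 7):.2%}")        # implied rate over 7 periods
print(f"{cagr(start, end, 8):.2%}")        # implied rate if compounding from 2024
```

Comparing the two implied rates shows how sensitive a headline CAGR is to the choice of base year.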
Global Open-Source Database Software Market Drivers
The market drivers for the Open-Source Database Software Market can be influenced by various factors. These may include:
- Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations, especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
- Flexibility and Customisation: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
- Collaboration and Community Support: Active developer communities that share best practices, support, and contribute to the continued development of open-source databases are beneficial. This cooperative setting can promote quicker problem solving and innovation.
- Performance and Scalability: A lot of open-source databases are made to scale horizontally across several nodes, which helps businesses manage expanding data volumes and keep up performance levels as their requirements change.
- Data Security and Sovereignty: Open-source databases provide businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, open-source code openness can improve security by making it simpler to find and fix problems.
- Compatibility with Contemporary Technologies: Open-source databases are well-suited for contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures since they frequently support a broad range of programming languages, frameworks, and platforms.
- Growing Cloud Computing Adoption: Open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers. This is because more and more organizations are moving their workloads to the cloud.
- Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
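The description above says the metadata CSV files reference license blobs via cryptographic checksums. A minimal Python sketch of that lookup pattern follows. Everything specific in it is an assumption made for illustration: the use of SHA-1 as the checksum, the column names `sha1` and `spdx`, and the sample row. The actual schema is documented in the dataset's README.

```python
import csv
import hashlib
import io


def sha1_hex(data: bytes) -> str:
    """Hex digest used here as the blob identifier (SHA-1 is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def index_by_checksum(csv_text: str, key: str = "sha1") -> dict:
    """Build a checksum -> metadata-row mapping from CSV text."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


# Placeholder blob and metadata row, invented for illustration only.
blob = b"example license text\n"
metadata_csv = f"sha1,spdx\n{sha1_hex(blob)},MIT\n"

metadata = index_by_checksum(metadata_csv)
row = metadata[sha1_hex(blob)]  # look up the blob's metadata by its checksum
print(row["spdx"])
```

Keying metadata on a content checksum means identical license files found in many repositories share a single metadata row, which is consistent with the deduplicated layout the description mentions.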
As of June 2024, the most popular open-source database management system (DBMS) in the world was MySQL, with a ranking score of ****. Oracle was the most popular commercial DBMS at that time, with a ranking score of ****.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:
The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.
The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].
[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.
[2] Full paper will be made available open access in the conference proceedings.
At the end of 2022, there were approximately *** million JavaScript open source projects in the Maven Central Repository and around ** million JavaScript project versions worldwide. While JavaScript is the largest ecosystem in the Maven Central Repository, Java, Python, and .NET also have thousands of available open source projects.
These data and code successfully reproduce nearly all cross-sectional stock return predictors. The 319 characteristics draw from previous meta-studies, but the authors differ by comparing their t-stats to the original papers' results. For the 161 characteristics that were clearly significant in the original papers, 98% of their long-short portfolios find t-stats above 1.96. For the 44 characteristics that had mixed evidence, the authors' reproductions find t-stats of 2 on average. A regression of reproduced t-stats on original long-short t-stats finds a slope of 0.90 and an R² of 83%. Mean returns are monotonic in predictive signals at the characteristic level. The remaining 114 characteristics were insignificant in the original papers or are modifications of the originals created by Hou, Xue, and Zhang (2020). These remaining characteristics are almost always significant if the original characteristic was also significant.
https://choosealicense.com/licenses/cc0-1.0/
The Dark Side of Openness: How Open Source Data Can Be Abused to Harm Human Life
First draft partially generated using Perplexity AI, then written and edited manually and revised using agentlans/granite-3.3-2b-reviser. Open-source data, a vast resource for innovation and collaboration, offers significant benefits. However, the same openness that empowers progress can also create serious risks. The potential for harm arises when personal and sensitive data is exposed, potentially… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/open-source-data-abuse.
This dataset lists all software in use by NASA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background info about the course and course set up.
This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data. You should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we would recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding. So, don't worry if you haven't developed these skill sets yet. That is a major goal in this course.
Background material will be provided using code examples, videos, and presentations. We have provided assignments to offer hands-on learning opportunities. Data links for the lecture modules are provided within each module while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material.
After completing this course you will be able to:
- prepare, manipulate, query, and generally work with data in R.
- perform data summarization, comparisons, and statistical tests.
- create quality graphs, map layouts, and interactive web maps to visualize data and findings.
- present your research, methods, results, and code as web pages to foster reproducible research.
- work with spatial data in R.
- analyze vector and raster geospatial data to answer a question with a spatial component.
- make spatial models and predictions using regression and machine learning.
- code in the R language at an intermediate level.
As of 2024, Java leads in the average number of open source versions released per project, with **, followed closely by .NET, with ** open source versions released per project.
https://www.datainsightsmarket.com/privacy-policy
The size of the Open Source Time Series Database market was valued at USD XXX million in 2023 and is projected to reach USD XXX million by 2032, with an expected CAGR of XX% during the forecast period.
https://www.archivemarketresearch.com/privacy-policy
The size of the Open-Source Database Software market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.
The goal of the Open Source Indicators (OSI) Program was to make automated predictions of significant societal events through the continuous and automated analysis of publicly available data such as news media, social media, informational websites, and satellite imagery. Societal events of interest included civil unrest, disease outbreaks, and election results. Geographic areas of interest included countries in Latin America (LA) and the Middle East and North Africa (MENA). The handbook is intended to serve as a reference document for the OSI Program and a companion to the ground truth event data used for test and evaluation. The handbook provides guidance regarding the types of events considered; the submission of automated predictions or “warnings”; the development of ground truth; the test and evaluation of submitted warnings; performance measures; and other programmatic information. IARPA initiated a solicitation for OSI Research Teams in late summer 2011 for one base year and two option years of research. MITRE was selected as the Test and Evaluation (T&E) Team in November 2011. Following a review of proposals, three teams (BBN, HRL, and Virginia Tech (VT)) were selected. The OSI Program officially began in April 2012; manual event encoding and formal T&E ended in March 2015.