This dataset lists out all software in use by NASA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Market research dataset covering growth of the global open-source software market, including benefits, adoption, and enterprise usage in 2025.
https://choosealicense.com/licenses/cc0-1.0/
The Dark Side of Openness: How Open Source Data Can Be Abused to Harm Human Life
First draft partially generated using Perplexity AI, then written and edited manually and revised using agentlans/granite-3.3-2b-reviser. Open-source data, a vast resource for innovation and collaboration, offers significant benefits. However, the same openness that empowers progress can also create serious risks. The potential for harm arises when personal and sensitive data is exposed, potentially… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/open-source-data-abuse.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:
The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.
The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].
[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.
[2] Full paper will be made available open access in the conference proceedings.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The presentation explains, as simply as possible, what you need to know about open source licenses when starting from scratch. It also sums up the course "Open Source Licensing Basics for Software Developers (LFC191)" (Linux Foundation).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Open Source is a dataset for object detection tasks - it contains Pest annotations for 476 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
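The description above says the metadata CSV files reference blobs via cryptographic checksums. A minimal sketch of how such a lookup might work is to hash a license file and look the digest up in a metadata table. The hash algorithm (SHA-1 here), the column names `sha1` and `spdx`, and the sample row are assumptions made for illustration; the README shipped with the dataset defines the actual layout.

```python
import csv
import hashlib
import io


def sha1_hex(data: bytes) -> str:
    """Return the hex SHA-1 digest of a blob (assumed checksum type)."""
    return hashlib.sha1(data).hexdigest()


def index_metadata(csv_text: str, key_column: str = "sha1") -> dict:
    """Index CSV rows by their checksum column for direct lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


# Hypothetical stand-ins for one license blob and one metadata CSV.
sample_blob = b"Permission is hereby granted, free of charge, ...\n"
sample_csv = "sha1,spdx\n" + sha1_hex(sample_blob) + ",MIT\n"

metadata = index_metadata(sample_csv)
row = metadata.get(sha1_hex(sample_blob))
print(row["spdx"] if row else "no metadata for this blob")
```

Indexing the rows by checksum once keeps each later lookup cheap, which matters when a table has millions of entries; for the real files, a streaming reader or a database import may be more practical than an in-memory dictionary.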
These data and code successfully reproduce nearly all cross-sectional stock return predictors. The 319 characteristics draw from previous meta-studies, but the authors differ by comparing their t-stats to the original papers' results. For the 161 characteristics that were clearly significant in the original papers, 98% of their long-short portfolios find t-stats above 1.96. For the 44 characteristics that had mixed evidence, the authors' reproductions find t-stats of 2 on average. A regression of reproduced t-stats on original long-short t-stats finds a slope of 0.90 and an R² of 83%. Mean returns are monotonic in predictive signals at the characteristic level. The remaining 114 characteristics were insignificant in the original papers or are modifications of the originals created by Hou, Xue, and Zhang (2020). These remaining characteristics are almost always significant if the original characteristic was also significant.
https://www.verifiedmarketresearch.com/privacy-policy/
Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
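As a rough check on how such projections fit together, the sketch below applies the standard compound annual growth rate (CAGR) formula, end = start × (1 + rate)^periods. The function names are illustrative, and the number of compounding periods is an assumption, since the description does not state how the periods were counted. The quoted figures line up with seven annual compounding steps at 20%.

```python
def project_value(start_value: float, rate: float, periods: int) -> float:
    """Compound start_value at a fixed annual rate for the given periods."""
    return start_value * (1 + rate) ** periods


def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the annual growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# USD 10.00 billion compounded at 20% for seven periods (assumed count).
print(f"{project_value(10.00, 0.20, 7):.2f}")  # 35.83

# Rate implied by the quoted start and end values over the same periods.
print(f"{implied_cagr(10.00, 35.83, 7):.1%}")  # 20.0%
```

Running the same two functions against other headline figures is a quick way to see which base year and period count a report appears to use.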
Global Open-Source Database Software Market Drivers
The market drivers for the Open-Source Database Software Market can be influenced by various factors. These may include:
Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations, especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
Flexibility and Customisation: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
Collaboration and Community Support: Open-source databases benefit from active developer communities that share best practices, offer support, and contribute to continued development. This cooperative setting can promote quicker problem solving and innovation.
Performance and Scalability: Many open-source databases are designed to scale horizontally across several nodes, which helps businesses manage expanding data volumes and maintain performance as their requirements change.
Data Security and Sovereignty: Open-source databases give businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, the openness of the source code can improve security by making it simpler to find and fix problems.
Compatibility with Contemporary Technologies: Open-source databases are well suited to contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures, since they frequently support a broad range of programming languages, frameworks, and platforms.
Growing Cloud Computing Adoption: As more organizations move their workloads to the cloud, open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers.
Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.
https://www.datainsightsmarket.com/privacy-policy
The global open source tools market was valued at USD 35.43 billion in 2021, and it is projected to reach a value of USD 87.77 billion by 2033, exhibiting a CAGR of 8.8% during the forecast period (2025-2033). The market growth is primarily driven by the increasing adoption of open source tools by businesses of all sizes, the rising demand for data analytics and artificial intelligence (AI), the growing need for cost-effective software solutions, and the increasing awareness of the benefits of open source software. In terms of segments, the computer vision segment is expected to grow at the highest CAGR during the forecast period, owing to the increasing use of computer vision technology in various industries, such as manufacturing, healthcare, and transportation. The data visualization segment is also expected to grow at a significant rate, due to the growing demand for data visualization tools to help businesses understand and communicate their data more effectively. The North America region is expected to dominate the market throughout the forecast period, followed by Europe and Asia-Pacific. The presence of a large number of technology companies and the early adoption of open source tools in North America is contributing to the growth of the market in this region.
https://www.datainsightsmarket.com/privacy-policy
Discover the booming open-source tools market! This comprehensive analysis reveals key trends, drivers, and restraints impacting growth from 2025-2033, covering applications like machine learning & data science across major regions. Explore market size, CAGR projections, and leading companies shaping the future of open-source technology.
https://www.archivemarketresearch.com/privacy-policy
Discover the explosive growth of the open-source big data tools market, projected at an 18% CAGR to reach $55.7 billion by 2033. This in-depth analysis explores key drivers, trends, restraints, and regional market shares, highlighting leading companies and applications. Learn how open-source solutions are revolutionizing data management and analysis.
https://www.archivemarketresearch.com/privacy-policy
The open-source big data tools market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and flexible data management and analysis solutions across diverse sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This significant expansion is fueled by several key factors. Firstly, the rising volume and velocity of data generated across industries necessitate sophisticated tools capable of handling massive datasets efficiently. Secondly, the cost-effectiveness of open-source solutions compared to proprietary alternatives is a major attraction for businesses of all sizes, particularly startups and SMEs. Thirdly, the active and collaborative open-source community ensures continuous innovation and improvement in these tools, making them highly adaptable to evolving technological landscapes. The increasing adoption of cloud computing further contributes to market growth, as open-source tools integrate readily with cloud platforms. Growth is segmented across various tools, with data analysis tools experiencing the highest demand due to the growing focus on data-driven decision-making. Key application areas include banking, manufacturing, and government, reflecting the wide applicability of these tools across sectors. While geographical distribution is diverse, North America and Europe currently hold significant market share, though rapid growth is anticipated in the Asia-Pacific region, driven by increasing digitalization and adoption of advanced analytics. However, the market faces challenges, including the complexity of implementing and maintaining some open-source tools, which requires specialized expertise, and the need for robust security measures to protect sensitive data.
Despite these hurdles, the inherent advantages of cost-effectiveness, flexibility, and community support position the open-source big data tools market for sustained and considerable expansion in the coming years.
https://www.verifiedmarketresearch.com/privacy-policy/
Open Source Services Market size was valued at USD 33.83 Billion in 2024 and is projected to reach USD 172.75 Billion by 2031, growing at a CAGR of 24.94% during the forecast period 2024-2031.
Global Open Source Services Market Drivers
Cost-Effectiveness: Open source software (OSS) often offers lower upfront and ongoing costs compared to proprietary software.
Flexibility and Customization: OSS provides greater flexibility for customization and integration with existing systems.
Innovation and Community Support: The open-source community fosters rapid innovation and provides a rich ecosystem of support and resources.
Global Open Source Services Market Restraints
Perceived Lack of Support: Some organizations may have concerns about the level of support available for open source solutions compared to proprietary software.
Security Concerns: While open source has improved security, there are still concerns about potential vulnerabilities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
The main dataset is provided as a 17,252 record tab-separated file named enterprise_projects.txt with the following 27 fields.
The file cohost_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.
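A minimal sketch of loading such a tab-separated file with Python's standard `csv` module follows. It assumes the first row holds the field names, which the description does not confirm, and the sample columns `url` and `stars` are hypothetical stand-ins, not the dataset's actual fields.

```python
import csv
import io


def read_tsv(handle, expected_fields=None):
    """Read a tab-separated file whose first row is assumed to be a header."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if expected_fields is not None and len(header) != expected_fields:
        raise ValueError(
            f"expected {expected_fields} fields, found {len(header)}"
        )
    records = []
    for line_no, row in enumerate(reader, start=2):
        # Reject ragged rows instead of silently misaligning columns.
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: {len(row)} fields")
        records.append(dict(zip(header, row)))
    return header, records


# Hypothetical two-column sample standing in for the real file.
sample = io.StringIO(
    "url\tstars\nhttps://example.org/a\t12\nhttps://example.org/b\t7\n"
)
header, records = read_tsv(sample, expected_fields=2)
print(len(records), header)
```

For the real file, the handle would come from something like `open("enterprise_projects.txt", newline="", encoding="utf-8")` with `expected_fields=27`; the encoding is an assumption, and comparing `len(records)` with the stated 17,252 gives a quick sanity check on the download.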
The goal of the Open Source Indicators (OSI) Program was to make automated predictions of significant societal events through the continuous and automated analysis of publicly available data such as news media, social media, informational websites, and satellite imagery. Societal events of interest included civil unrest, disease outbreaks, and election results. Geographic areas of interest included countries in Latin America (LA) and the Middle East and North Africa (MENA). The handbook is intended to serve as a reference document for the OSI Program and a companion to the ground truth event data used for test and evaluation. The handbook provides guidance regarding the types of events considered; the submission of automated predictions or “warnings;” the development of ground truth; the test and evaluation of submitted warnings; performance measures; and other programmatic information. IARPA initiated a solicitation for OSI Research Teams in late summer 2011 for one base year and two option years of research. MITRE was selected as the Test and Evaluation (T&E) Team in November 2011. Following a review of proposals, three teams (BBN, HRL, and Virginia Tech (VT)) were selected. The OSI Program officially began in April 2012; manual event encoding and formal T&E ended in March 2015.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
control
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Retail point-of-sale (POS) transactions and operator performance logs from a real store environment. Includes timestamps, product details, quantities, and operator IDs — enabling analysis of sales trends, product performance, and staff efficiency.
Applications:
• Sales forecasting & trend analysis
• Market basket analysis
• Employee productivity insights
• Business analytics & ML modeling
Source: MDPI Data Journal
License: CC BY-NC 4.0 (non-commercial use only).
Cite:
Alves, T.M.F.; de Carvalho, A.C.P.L.F.; Cardoso, J.M.P. (2019). An Open-Source Point of Sale Dataset for the Analysis of Sales Transactions and Operator Efficiency. Data, 4(2), 67.
Open Source Application Development Portal (OSADP). The system provides a place for programmers to share software code and solutions.
https://www.datainsightsmarket.com/privacy-policy
The open-source data labeling tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in various AI applications. The market's expansion is fueled by several key factors: the rising adoption of machine learning and deep learning algorithms across industries, the need for efficient and cost-effective data annotation solutions, and a growing preference for customizable and flexible tools that can adapt to diverse data types and project requirements. While proprietary solutions exist, the open-source ecosystem offers advantages including community support, transparency, cost-effectiveness, and the ability to tailor tools to specific needs, fostering innovation and accessibility. The market is segmented by tool type (image, text, video, audio), deployment model (cloud, on-premise), and industry (automotive, healthcare, finance). We project a market size of approximately $500 million in 2025, with a compound annual growth rate (CAGR) of 25% from 2025 to 2033, reaching approximately $2.7 billion by 2033. This growth is tempered by challenges such as the complexities associated with data security, the need for skilled personnel to manage and use these tools effectively, and the inherent limitations of certain open-source solutions compared to their commercial counterparts. Despite these restraints, the open-source model's inherent flexibility and cost advantages will continue to attract a significant user base. The market's competitive landscape includes established players like Alecion and Appen, alongside numerous smaller companies and open-source communities actively contributing to the development and improvement of these tools. Geographical expansion is expected across North America, Europe, and Asia-Pacific, with the latter projected to witness significant growth due to the increasing adoption of AI and machine learning in developing economies. 
Future market trends point towards increased integration of automated labeling techniques within open-source tools, enhanced collaborative features to improve efficiency, and further specialization to cater to specific data types and industry-specific requirements. Continuous innovation and community contributions will remain crucial drivers of growth in this dynamic market segment.