Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Data Open Source is a dataset for object detection tasks - it contains Pest annotations for 476 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:
The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.
The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].
[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.
[2] Full paper will be made available open access in the conference proceedings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
For more details see the included README file and companion paper:
Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.
If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
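Because the metadata files reference blobs by checksum, joining a license text to its metadata amounts to hashing the blob and looking up the digest. The following is a minimal sketch under stated assumptions: SHA-1 is assumed as the checksum, and the metadata is assumed to be already loaded into a dictionary. The function names and the in-memory index are hypothetical; the actual hash function, file names, and column layout are described in the dataset's README.

```python
import hashlib
from typing import Optional


def blob_checksum(content: bytes) -> str:
    """Hex SHA-1 digest of a blob's raw bytes (SHA-1 is an assumption here)."""
    return hashlib.sha1(content).hexdigest()


def lookup_metadata(content: bytes, metadata_by_checksum: dict) -> Optional[dict]:
    """Return the metadata record for a blob, or None if it is not indexed."""
    return metadata_by_checksum.get(blob_checksum(content))


# Hypothetical in-memory index; in practice it would be built from the CSV files.
license_text = b"Permission is hereby granted, free of charge, ..."
index = {
    blob_checksum(license_text): {
        "mime_type": "text/plain",
        "length": len(license_text),
    }
}

print(lookup_metadata(license_text, index))
print(lookup_metadata(b"unknown blob", index))  # no record for this digest
```

Keying the index by digest keeps the lookup independent of file names, which matches the deduplicated layout described above.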
https://dataintelo.com/privacy-and-policy
The global open source database market size was valued at approximately USD 15.5 billion in 2023 and is projected to reach around USD 40.6 billion by 2032, expanding at a compound annual growth rate (CAGR) of 11.5% during the forecast period. The growth of this market is primarily driven by the increasing adoption of open-source databases by both SMEs and large enterprises due to their cost-effectiveness and flexibility.
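The relationship between the two valuations and the quoted growth rate can be checked with a short calculation. The sketch below assumes nine annual compounding periods (2023 to 2032). The source does not state the exact base year of its forecast period, so the implied rate may differ slightly from the quoted 11.5%. The function name `implied_cagr` is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the text: USD 15.5 billion (2023) and USD 40.6 billion (2032).
# Assumption: nine compounding periods between the two endpoints.
rate = implied_cagr(15.5, 40.6, periods=9)
print(f"Implied CAGR: {rate:.1%}")
```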
A significant growth factor for the open source database market is the rising demand for data analytics and business intelligence across various industries. Organizations are increasingly leveraging big data to gain actionable insights, enhance decision-making processes, and improve operational efficiency. Open source databases provide the scalability and performance required to handle large volumes of data, making them an attractive option for businesses looking to maximize their data-driven strategies. Additionally, the continuous advancements and contributions from the open-source community help in keeping these databases at the cutting edge of technology.
Another driving factor is the cost-efficiency associated with open-source databases. Unlike proprietary databases, which can be expensive due to licensing fees, open-source databases are usually free to use, offering a significant cost advantage. This factor is especially crucial for small and medium enterprises (SMEs), which often operate with limited budgets. The lower total cost of ownership, combined with the flexibility to customize the database according to specific needs, makes open-source solutions highly appealing for businesses of all sizes.
The increasing trend of digital transformation is also playing a crucial role in the growth of the open source database market. As businesses across various sectors accelerate their digital initiatives, the need for robust, scalable, and efficient data management solutions becomes paramount. Open-source databases provide the agility and innovation that organizations require to keep up with the rapidly changing digital landscape. Moreover, the support for cloud deployment further enhances their appeal, providing businesses with the scalability and flexibility needed to adapt to evolving technological demands.
From a regional perspective, North America holds a significant share in the open source database market, driven by the presence of major technology companies and a highly developed IT infrastructure. The region's focus on technological innovation and early adoption of advanced technologies contributes to its dominant position. Europe follows closely, with increasing investments in digital transformation initiatives. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by rapid technological advancements, a burgeoning IT sector, and increased adoption of open-source solutions by businesses.
Relational Databases Software plays a crucial role in the open-source database market, offering structured data management solutions that are essential for various business applications. These databases are known for their ability to handle complex queries and transactions, making them ideal for industries that require high levels of data integrity and consistency. The flexibility and robustness of relational databases software allow organizations to efficiently manage large volumes of structured data, which is critical for applications such as financial systems, enterprise resource planning, and customer relationship management. As businesses continue to prioritize data-driven decision-making, the demand for relational databases software is expected to grow, further driving the expansion of the open-source database market.
The open source database market is segmented into SQL, NoSQL, and NewSQL databases. SQL databases are the most widely used and have been the backbone of data management for decades. They offer robust transaction management and are ideal for structured data storage and retrieval. The ongoing improvements in SQL databases, such as enhanced performance and security features, continue to make them a preferred choice for many organizations. Additionally, the availability of various SQL-based open-source solutions like MySQL, PostgreSQL, and MariaDB provides organizations with reliable options to manage their data effectively.
NoSQL databases are gainin
This dataset lists out all software in use by NASA.
https://choosealicense.com/licenses/cc0-1.0/
The Dark Side of Openness: How Open Source Data Can Be Abused to Harm Human Life
First draft partially generated using Perplexity AI, then written and edited manually. Introduction Open-source data—the vast troves of information freely available to the public—has transformed how we innovate, collaborate, and solve problems. From scientific research to civic technology, the benefits are clear. However, the same openness that drives progress can also create serious risks. When… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/open-source-data-abuse.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The presentation explains in the simplest possible way what you need to know about open source licenses when starting from scratch. It also sums up the course "Open Source Licensing Basics for Software Developers (LFC191)" (Linux Foundation)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
GitHub page: https://github.com/soarsmu/NICHE
https://www.verifiedmarketresearch.com/privacy-policy/
Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
Global Open-Source Database Software Market Drivers
The market drivers for the Open-Source Database Software Market can be influenced by various factors. These may include:
Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations—especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
Flexibility and Customisation: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
Collaboration and Community Support: Active developer communities that share best practices, support, and contribute to the continued development of open-source databases are beneficial. This cooperative setting can promote quicker problem solving and innovation.
Performance and Scalability: A lot of open-source databases are made to scale horizontally across several nodes, which helps businesses manage expanding data volumes and keep up performance levels as their requirements change.
Data Security and Sovereignty: Open-source databases provide businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, open-source code openness can improve security by making it simpler to find and fix problems.
Compatibility with Contemporary Technologies: Open-source databases are well-suited for contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures since they frequently support a broad range of programming languages, frameworks, and platforms.
Growing Cloud Computing Adoption: Open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers. This is because more and more organizations are moving their workloads to the cloud.
Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.
The main dataset is provided as a 17,264 record tab-separated file named enterprise_projects.txt with the following 29 fields.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
sdtc: true if selected using the same domain top committers heuristic (9,016 records)
mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,314 records)
mcve: true if selected using the multiple committers from a probable company heuristic (8,015 records)
star_number: number of GitHub watchers
commit_count: number of commits
files: number of files in current main branch
lines: corresponding number of lines in text files
pull_requests: number of pull requests
github_repo_creation: timestamp of the GitHub repository creation
earliest_commit: timestamp of the earliest commit
most_recent_commit: date of the most recent commit
committer_count: number of different committers
author_count: number of different authors
dominant_domain: the project's dominant email domain
dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
dominant_domain_author_commits: corresponding number for commit authors
dominant_domain_committers: number of committers whose email matches the project's dominant domain
dominant_domain_authors: corresponding number for commit authors
cik: SEC's EDGAR "central index key"
fg500: true if this is a Fortune Global 500 company (2,233 records)
sec10k: true if the company files SEC 10-K forms (4,180 records)
sec20f: true if the company files SEC 20-F forms (429 records)
project_name: GitHub project name
owner_login: GitHub project's owner login
company_name: company name as derived from the SEC and Fortune 500 data
owner_company: GitHub project's owner company name
license: SPDX license identifier
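The field list above can be put to use with a short filtering sketch. It assumes that the tab-separated file starts with a header row carrying these field names and that boolean fields are stored as the strings `true` and `false`; neither detail is confirmed by the description. The function name `select_projects` and the sample rows are hypothetical.

```python
import csv
import io


def select_projects(tsv_text: str, flag: str = "sdtc", min_commits: int = 0) -> list:
    """Return URLs of projects where `flag` is true and commit_count >= min_commits."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["url"]
        for row in reader
        # Assumption: booleans are serialized as "true"/"false".
        if row[flag].strip().lower() == "true"
        and int(row["commit_count"]) >= min_commits
    ]


# Hypothetical sample with a subset of the 29 fields, for illustration only.
SAMPLE = (
    "url\tsdtc\tmcpc\tcommit_count\n"
    "https://github.com/example/alpha\ttrue\tfalse\t250\n"
    "https://github.com/example/beta\tfalse\ttrue\t900\n"
    "https://github.com/example/gamma\ttrue\ttrue\t40\n"
)

print(select_projects(SAMPLE, flag="sdtc", min_commits=100))
```

For the real file, the text of `enterprise_projects.txt` would be read and passed in place of `SAMPLE`.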
The file cohost_project_details.txt provides the full set of 311,223 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.
url: the project's GitHub URL
project_id: the project's GHTorrent identifier
stars: number of GitHub watchers
commit_count: number of commits
https://dataintelo.com/privacy-and-policy
The global open source database software market size was valued at approximately USD 11.5 billion in 2023 and is projected to reach an impressive USD 26.8 billion by 2032, growing at a robust CAGR of 9.5% during the forecast period. The exponential growth in this market is attributed to the increasing adoption of cloud-based solutions, surge in enterprise data volume, and the rising demand for cost-effective database management solutions. Organizations across various sectors are increasingly opting for open source database software due to its flexibility, scalability, and ability to handle large volumes of data.
One of the primary growth factors driving the open source database software market is the significant cost savings associated with open source solutions compared to proprietary alternatives. Businesses are continually seeking ways to reduce their IT expenses without compromising on performance and security. Open source database software offers a compelling alternative by eliminating licensing fees and enabling organizations to allocate resources more efficiently. Additionally, the collaborative nature of open source communities fosters continuous improvement and innovation, further enhancing the software's capabilities and reliability.
Another critical growth factor is the accelerating adoption of cloud computing. As more organizations migrate their workloads to the cloud, the demand for cloud-compatible database solutions has surged. Open source database software can be easily integrated with various cloud platforms, providing businesses with the flexibility to scale their operations seamlessly. The cloud-based deployment model also offers benefits such as improved accessibility, reduced infrastructure costs, and enhanced disaster recovery capabilities, making it an attractive option for enterprises of all sizes.
The proliferation of big data and the Internet of Things (IoT) is also contributing significantly to the market's growth. The massive volumes of data generated by IoT devices and other sources require advanced database solutions capable of handling real-time data processing and analytics. Open source database software, with its robust performance and scalability, is well-suited to meet these demands. The ability to customize and extend open source solutions allows organizations to tailor their database infrastructure to specific use cases, further driving adoption across various industries.
Regional outlook for the open source database software market indicates that North America holds the largest market share, driven by the presence of major technology companies and early adoption of advanced IT infrastructure. Europe and Asia Pacific are also significant markets, with the latter expected to witness the highest growth rate during the forecast period. The rapid digitalization of businesses in countries like China and India, coupled with increasing investments in IT infrastructure, is bolstering the market's expansion in the Asia Pacific region.
The emergence of SQL In Memory Database technology is revolutionizing the way organizations handle data-intensive applications. By storing data in the main memory rather than on traditional disk storage, these databases offer significantly faster data retrieval speeds and improved performance. This technology is particularly beneficial for applications requiring real-time analytics and rapid transaction processing, such as financial services, online gaming, and e-commerce. The ability to process large volumes of data with minimal latency is a key advantage, enabling businesses to make quicker and more informed decisions. As the demand for high-performance data solutions grows, SQL In Memory Databases are becoming an integral part of the database landscape, providing the speed and efficiency needed to meet modern business demands.
The open source database software market is segmented into SQL, NoSQL, and NewSQL databases. SQL databases, despite being the oldest form of database management systems, continue to dominate the market due to their robustness, reliability, and widespread adoption. SQL databases are favored for transaction-oriented applications and are commonly used in industries such as banking, finance, and retail. Their ability to handle complex queries, maintain data integrity, and support ACID (Atomicity, Consistency, Isolation, Durability) properties makes them indispensable for criti
https://www.datainsightsmarket.com/privacy-policy
The open-source tools market is experiencing robust growth, driven by increasing demand for cost-effective, flexible, and customizable solutions across diverse sectors. The market, encompassing tools for data cleaning, visualization, mining, and applications like machine learning, natural language processing, and computer vision, is projected to witness substantial expansion over the forecast period (2025-2033). Factors such as the rising adoption of cloud computing, the growing need for data-driven decision-making, and the increasing preference for collaborative development models are key drivers. While the specific CAGR isn't provided, a conservative estimate based on industry trends suggests a compound annual growth rate of around 15-20% is realistic for the period. This growth is anticipated across all segments, with the data science and machine learning sectors exhibiting particularly strong performance. Geographic expansion is also a prominent trend, with North America and Europe leading the market initially, followed by a significant increase in adoption across Asia Pacific and other regions as digital transformation initiatives accelerate. However, challenges remain. Security concerns surrounding open-source software and the need for robust support and maintenance infrastructure could potentially restrain market growth. Nevertheless, ongoing improvements in security protocols and the burgeoning community support surrounding many open-source projects are mitigating these challenges. The diverse range of applications and tool types within the open-source market ensures its versatility. Universal tools, catering to broad needs, and specialized tools like data visualization and mining software are all experiencing increased demand. The presence of established players like IBM and Oracle alongside a large community of contributors ensures a dynamic market ecosystem. 
The continued development of innovative tools, improved documentation, and enhanced community support are expected to further fuel market growth, making open-source solutions increasingly attractive to businesses of all sizes. Specific segmentation data, while not explicitly provided, shows a spread across applications indicating a healthy, diversified market that is expected to evolve rapidly within the forecast period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) are a rich source of information for medical research and public health monitoring. Information systems based on EHR data could also assist in patient care and hospital management. However, much of the data in EHRs is in the form of unstructured text, which is difficult to process for analysis. Natural language processing (NLP), a form of artificial intelligence, has the potential to enable automatic extraction of information from EHRs and several NLP tools adapted to the style of clinical writing have been developed for English and other major languages. In contrast, the development of NLP tools for less widely spoken languages such as Swedish has lagged behind. A major bottleneck in the development of NLP tools is the restricted access to EHRs due to legitimate patient privacy concerns. To overcome this issue we have generated a citizen science platform for collecting artificial Swedish EHRs with the help of Swedish physicians and medical students. These artificial EHRs describe imagined but plausible emergency care patients in a style that closely resembles EHRs used in emergency departments in Sweden. In the pilot phase, we collected a first batch of 50 artificial EHRs, which has passed review by an experienced Swedish emergency care physician. We make this dataset publicly available as OpenChart-SE corpus (version 1) under an open-source license for the NLP research community. The project is now open for general participation and Swedish physicians and medical students are invited to submit EHRs on the project website (https://github.com/Aitslab/openchart-se), where additional batches of quality-controlled EHRs will be released periodically.
Dataset content
OpenChart-SE, version 1 corpus (txt files and dataset.csv)
The OpenChart-SE corpus, version 1, contains 50 artificial EHRs (note that the numbering starts with 5 as 1-4 were test cases that were not suitable for publication). The EHRs are available in two formats, structured as a .csv file and as separate textfiles for annotation. Note that flaws in the data were not cleaned up so that it simulates what could be encountered when working with data from different EHR systems. All charts have been checked for medical validity by a resident in Emergency Medicine at a Swedish hospital before publication.
Codebook.xlsx
The codebook contains information about each variable used. It is in XLSForm format, which can be reused in several different applications for data collection.
suppl_data_1_openchart-se_form.pdf
OpenChart-SE mock emergency care EHR form.
suppl_data_3_openchart-se_dataexploration.ipynb
This Jupyter notebook contains the code and results from the analysis of the OpenChart-SE corpus.
More details about the project and information on the upcoming preprint accompanying the dataset can be found on the project website (https://github.com/Aitslab/openchart-se).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
datasets metrics
This dataset contains metrics about the huggingface/datasets package. Number of repositories in the dataset: 4997 Number of packages in the dataset: 215
Package dependents
This contains the data available in the used-by tab on GitHub.
Package & Repository star count
This section shows the package and repository star count, individually.
Package Repository
There are 22 packages that have more than 1000 stars. There are 43… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/datasets-dependents.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset belongs to a publication related to a conference presentation at the International Symposium on Models for Plant Growth, Environments, Farm Management in Orchards and Protected Cultivation - HorchiModel 2023, Almería, Spain. A review of recent literature related to process based greenhouse modelling, based on the work in Katzin, Van Henten, Van Mourik (2022, Agricultural Systems), was presented. A new Web of Science (www.webofscience.com) search was performed, as well as a synthesis of several recently published reviews related to process based greenhouse modelling.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Electronic commerce and technology, use of open source software by North American Industry Classification System (NAICS), for Canada from 2005 to 2007. (Terminated)
A JSON file that is used to build the content on code.nasa.gov. This JSON contains names, descriptions, links, and keyword tags for all NASA open-sourced code projects released through the SRA (Software Release Authority) and available on code.nasa.gov. It was updated in August 2019.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present the data collected as part of the Open-source Complex Ecosystem And Networks (OCEAN) partnership between Google Open Source and the University of Vermont. This includes mailing list emails with standardized format spanning the past three decades from fourteen mailing lists across four different open source communities: Python, Angular, Node.js, and the Go language. This data is presented in the following publication: Warrick, M., Rosenblatt, S. F., Young, J. G., Casari, A., Hébert-Dufresne, L., & Bagrow, J. P. (2022). The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories. In 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR). IEEE.
https://www.gnu.org/licenses/gpl-3.0-standalone.html
This record is a global open-source passenger air traffic dataset primarily dedicated to the research community.
It gives the seating capacity available on each origin-destination route for a single year, 2019, along with the associated aircraft and airline when this information is available.
Context on the original work is given in the related article (https://journals.open.tudelft.nl/joas/article/download/7201/5683) and on the associated GitHub page (https://github.com/AeroMAPS/AeroSCOPE/).
A simple data exploration interface will be available at www.aeromaps.eu/aeroscope.
The dataset was created by aggregating various available open-source databases with limited geographical coverage. It was then completed using a route database created by parsing Wikipedia and Wikidata, on which the traffic volume was estimated using a machine learning algorithm (XGBoost) trained on traffic and socioeconomic data.
The dataset was gathered to allow highly aggregated analyses of the air traffic, at the continental or country levels. At the route level, the accuracy is limited as mentioned in the associated article and improper usage could lead to erroneous analyses.
Each data entry represents an (Origin-Destination-Operator-Aircraft type) tuple.
Please refer to the support article for more details (see above).
The dataset contains the following columns:
Please cite the support paper instead of the dataset itself.
Salgas, A., Sun, J., Delbecq, S., Planès, T., & Lafforgue, G. (2023). Compilation of an open-source traffic and CO2 emissions dataset for commercial aviation. Journal of Open Aviation Science. https://doi.org/10.59490/joas.2023.7201
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset lists the dependencies from the repositories contributed by the Public Sector in Luxembourg. The data has been crawled with codegouvfr-fetch-data. If you wish to contribute to this dataset, feel free to contribute to the following GitHub project via issues or pull requests: Open Source Software contributed by the Public sector in Luxembourg, a list of organization accounts