100+ datasets found

Data Open Source Dataset
universe.roboflow.com
zip
Updated Apr 17, 2025
Cite
Data OPT Tebu (2025). Data Open Source Dataset [Dataset]. https://universe.roboflow.com/data-opt-tebu/data-open-source
Explore at:
Available download formats: zip
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Data OPT Tebu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Pest Bounding Boxes
Description
Data Open Source

## Overview

Data Open Source is a dataset for object detection tasks. It contains Pest annotations for 476 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Linked Open Data Management Services: A Comparison
zenodo.org
data.niaid.nih.gov
Updated Sep 18, 2023
Cite
Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7738424
Dataset updated
Sep 18, 2023
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Robert Nasarek; Lozana Rossenova
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

ConedaKOR

LinkedDataHub

Metaphacts

Omeka S

ResearchSpace

Vitro

Wikibase

WissKI

The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

[1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

[2] Full paper will be made available open access in the conference proceedings.
Data from: Open Source Cross-Sectional Asset Pricing
catalog.data.gov
s.cnmilf.com
Updated Dec 18, 2024
Cite
Board of Governors of the Federal Reserve System (2024). Open Source Cross-Sectional Asset Pricing [Dataset]. https://catalog.data.gov/dataset/open-source-cross-sectional-asset-pricing
Explore at:
Dataset updated
Dec 18, 2024
Dataset provided by
Federal Reserve Board of Governors
Federal Reserve System, http://www.federalreserve.gov/
Description
These data and code successfully reproduce nearly all cross-sectional stock return predictors. The 319 characteristics draw from previous meta-studies, but the authors differ by comparing their t-stats to the original papers' results. For the 161 characteristics that were clearly significant in the original papers, 98% of their long-short portfolios find t-stats above 1.96. For the 44 characteristics that had mixed evidence, the authors' reproductions find t-stats of 2 on average. A regression of reproduced t-stats on original long-short t-stats finds a slope of 0.90 and an R² of 83%. Mean returns are monotonic in predictive signals at the characteristic level. The remaining 114 characteristics were insignificant in the original papers or are modifications of the originals created by Hou, Xue, and Zhang (2020). These remaining characteristics are almost always significant if the original characteristic was also significant.
NASA Open Source And General Resource Software API
catalog.data.gov
s.cnmilf.com
Updated Aug 23, 2025
Cite
National Aeronautics and Space Administration (2025). NASA Open Source And General Resource Software API [Dataset]. https://catalog.data.gov/dataset/nasa-open-source-and-general-resource-software-api
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
NASA, http://nasa.gov/
Description
This dataset lists out all software in use by NASA.
Data from: A Large-scale Dataset of (Open Source) License Text Variants
data.niaid.nih.gov
Updated Mar 31, 2022
Cite
Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
Explore at:
Dataset updated
Mar 31, 2022
Dataset provided by
LTCI, Télécom Paris, Institut Polytechnique de Paris
Authors
Stefano Zacchiroli
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

For more details see the included README file and companion paper:

Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
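
The description says the metadata CSV files reference license blobs via cryptographic checksums. As a minimal sketch of how such a reference could be checked, the following compares a blob's digest with a recorded value. The choice of SHA-1, the function name `blob_matches`, and the sample text are assumptions for illustration only; the dataset's README is the authority on the actual checksum type and file layout.

```python
import hashlib


def blob_matches(blob: bytes, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Return True if the blob's digest equals the recorded hex checksum.

    The default algorithm is an assumption, not taken from the dataset docs.
    """
    digest = hashlib.new(algorithm, blob).hexdigest()
    return digest == expected_hex.lower()


# Hypothetical stand-ins for a license blob and its checksum from a metadata CSV.
license_text = b"Example license text\n"
recorded = hashlib.sha1(license_text).hexdigest()

print(blob_matches(license_text, recorded))
print(blob_matches(b"tampered text\n", recorded))
```

Passing the algorithm as a parameter keeps the check usable if the metadata turns out to use a different hash.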
Open Source Big Data Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 15, 2025
Cite
Archive Market Research (2025). Open Source Big Data Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-big-data-tools-58978
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Mar 15, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Discover the explosive growth of the open-source big data tools market, projected at an 18% CAGR to reach $55.7 billion by 2033. This in-depth analysis explores key drivers, trends, restraints, and regional market shares, highlighting leading companies and applications. Learn how open-source solutions are revolutionizing data management and analysis.
Open Source Data Annotation Tool Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 11, 2025
Cite
Data Insights Market (2025). Open Source Data Annotation Tool Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-data-annotation-tool-1464677
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The open-source data annotation tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in the burgeoning fields of artificial intelligence (AI) and machine learning (ML). The market's expansion is fueled by the need for efficient and cost-effective annotation solutions, particularly for large datasets. Organizations across various sectors, including automotive, healthcare, and finance, are leveraging these tools to improve the accuracy and performance of their AI models. The availability of open-source alternatives offers a significant advantage over proprietary solutions, enabling developers and researchers to customize tools according to their specific needs and avoid vendor lock-in. Furthermore, the collaborative nature of open-source projects fosters innovation and continuous improvement, resulting in a more dynamic and rapidly evolving ecosystem. While the market is relatively nascent, it exhibits a substantial growth trajectory, attracting numerous companies and developers, as evidenced by the active participation of organizations such as Alecion, Amazon Mechanical Turk, and Appen Limited. This competitive landscape further accelerates innovation and accessibility. The open-source nature of these tools also democratizes access to advanced AI development capabilities. Smaller companies and individual researchers can now participate in the development and deployment of AI solutions, leveling the playing field and fostering wider adoption. However, the market faces challenges such as the need for ongoing community support and maintenance of these tools, ensuring their long-term viability and preventing fragmentation. Despite these challenges, the future outlook for the open-source data annotation tool market remains positive, with continued growth driven by increased adoption in various industries and advancements in AI and ML technologies. 
The market is predicted to maintain a healthy compound annual growth rate (CAGR) over the forecast period, reflecting the sustained demand for efficient and accessible data annotation solutions.
open-source-data-abuse
huggingface.co
Updated Apr 29, 2025
Cite
Alan Tseng (2025). open-source-data-abuse [Dataset]. https://huggingface.co/datasets/agentlans/open-source-data-abuse
Explore at:
Dataset updated
Apr 29, 2025
Authors
Alan Tseng
License
https://choosealicense.com/licenses/cc0-1.0/
Description
The Dark Side of Openness: How Open Source Data Can Be Abused to Harm Human Life

First draft partially generated using Perplexity AI, then written and edited manually and revised using agentlans/granite-3.3-2b-reviser. Open-source data, a vast resource for innovation and collaboration, offers significant benefits. However, the same openness that empowers progress can also create serious risks. The potential for harm arises when personal and sensitive data is exposed, potentially… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/open-source-data-abuse.
Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...
figshare.com
txt
Updated May 30, 2023
Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21967265.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE
Data from: Tools for Open Source, Subnational CGE Modeling with an...
catalog.data.gov
s.cnmilf.com
Updated Oct 23, 2021
Cite
U.S. Environmental Protection Agency (2021). Tools for Open Source, Subnational CGE Modeling with an Illustrative Analysis of Carbon Leakage Data Set [Dataset]. https://catalog.data.gov/dataset/tools-for-open-source-subnational-cge-modeling-with-an-illustrative-analysis-of-carbon-lea
Explore at:
Dataset updated
Oct 23, 2021
Dataset provided by
United States Environmental Protection Agency, http://www.epa.gov/
Description
Code and data to reproduce the results and datasets from "Tools for Open Source, Subnational CGE Modeling with an Illustrative Analysis of Carbon Leakage" by Andrew Schreiber and Thomas F. Rutherford, in the Journal of Global Economic Analysis. Citation information for this dataset can be found in the EDG's Metadata Reference Information section and Data.gov's References section.
Data from: PTMTorrent: A Dataset for Mining Open-source Pre-trained Model...
figshare.com
pdf
Updated May 30, 2023
Cite
Wenxin Jiang; Nicholas Synovic; Purvish Jajal; Taylor R. Schorlemmer; Arav Tewari; Bhavesh Pareek; George K. Thiruvathukal; James C. Davis (2023). PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages [Dataset]. http://doi.org/10.6084/m9.figshare.22009880.v4
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.22009880.v4
Dataset updated
May 30, 2023
Dataset provided by
Figshare, http://figshare.com/
Authors
Wenxin Jiang; Nicholas Synovic; Purvish Jajal; Taylor R. Schorlemmer; Arav Tewari; Bhavesh Pareek; George K. Thiruvathukal; James C. Davis
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM registries known as “model hubs” support engineers in distributing and reusing deep learning models. PTM packages include pre-trained weights, documentation, model architectures, datasets, and metadata. Mining the information in PTM packages will enable the discovery of engineering phenomena and tools to support software engineers. However, accessing this information is difficult — there are many PTM registries, and both the registries and the individual packages may have rate limiting for accessing the data.

We present an open-source dataset, PTMTorrent, to facilitate the evaluation and understanding of PTM packages. This paper describes the creation, structure, usage, and limitations of the dataset. The dataset includes a snapshot of 5 model hubs and a total of 15,913 PTM packages. These packages are represented in a uniform data schema for cross-hub mining. We describe prior uses of this data and suggest research opportunities for mining using our dataset.

We provide links to the PTM Dataset and PTM Torrent Source Code.
Global Open Source Software Market Data
decipherzone.com
csv
Updated Dec 23, 2024
Cite
Decipher Zone (2024). Global Open Source Software Market Data [Dataset]. https://www.decipherzone.com/blog-detail/benefits-of-open-source-software-development
Explore at:
Available download formats: csv
Dataset updated
Dec 23, 2024
Dataset authored and provided by
Decipher Zone
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Market research dataset covering growth of the global open-source software market, including benefits, adoption, and enterprise usage in 2025.
Open Source Data Labeling Tool Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 31, 2025
Cite
Data Insights Market (2025). Open Source Data Labeling Tool Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-data-labeling-tool-1421234
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
May 31, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The open-source data labeling tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in various AI applications. The market's expansion is fueled by several key factors: the rising adoption of machine learning and deep learning algorithms across industries, the need for efficient and cost-effective data annotation solutions, and a growing preference for customizable and flexible tools that can adapt to diverse data types and project requirements. While proprietary solutions exist, the open-source ecosystem offers advantages including community support, transparency, cost-effectiveness, and the ability to tailor tools to specific needs, fostering innovation and accessibility. The market is segmented by tool type (image, text, video, audio), deployment model (cloud, on-premise), and industry (automotive, healthcare, finance). We project a market size of approximately $500 million in 2025, with a compound annual growth rate (CAGR) of 25% from 2025 to 2033, reaching approximately $2.7 billion by 2033. This growth is tempered by challenges such as the complexities associated with data security, the need for skilled personnel to manage and use these tools effectively, and the inherent limitations of certain open-source solutions compared to their commercial counterparts. Despite these restraints, the open-source model's inherent flexibility and cost advantages will continue to attract a significant user base. The market's competitive landscape includes established players like Alecion and Appen, alongside numerous smaller companies and open-source communities actively contributing to the development and improvement of these tools. Geographical expansion is expected across North America, Europe, and Asia-Pacific, with the latter projected to witness significant growth due to the increasing adoption of AI and machine learning in developing economies. 
Future market trends point towards increased integration of automated labeling techniques within open-source tools, enhanced collaborative features to improve efficiency, and further specialization to cater to specific data types and industry-specific requirements. Continuous innovation and community contributions will remain crucial drivers of growth in this dynamic market segment.
Open Source Big Data Tools Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 15, 2025
Cite
Archive Market Research (2025). Open Source Big Data Tools Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-big-data-tools-58866
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
Mar 15, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The open-source big data tools market is experiencing robust growth, driven by the increasing need for scalable, cost-effective, and flexible data management and analysis solutions across diverse sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This significant expansion is fueled by several key factors. Firstly, the rising volume and velocity of data generated across industries necessitates sophisticated tools capable of handling massive datasets efficiently. Secondly, the cost-effectiveness of open-source solutions compared to proprietary alternatives is a major attraction for businesses of all sizes, particularly startups and SMEs. Thirdly, the active and collaborative open-source community ensures continuous innovation and improvement in these tools, making them highly adaptable to evolving technological landscapes. The increasing adoption of cloud computing further contributes to market growth, as open-source tools seamlessly integrate with cloud platforms. Growth is segmented across various tools, with data analysis tools experiencing the highest demand due to the growing focus on data-driven decision-making. Key application areas include banking, manufacturing, and government, reflecting the wide applicability of these tools across sectors. While geographical distribution is diverse, North America and Europe currently hold significant market share, though rapid growth is anticipated in the Asia-Pacific region driven by increasing digitalization and adoption of advanced analytics. However, the market faces challenges including the complexity of implementation and maintenance of some open-source tools, requiring specialized expertise, and the need for robust security measures to protect sensitive data. 
Despite these hurdles, the inherent advantages of cost-effectiveness, flexibility, and community support position the open-source big data tools market for sustained and considerable expansion in the coming years.
Open-Source Point of Sale Dataset
kaggle.com
zip
Updated Aug 9, 2025
Cite
Mehdi Akoudadd (2025). Open-Source Point of Sale Dataset [Dataset]. https://www.kaggle.com/datasets/mehdiakoudadd/open-source-point-of-sale-dataset
Explore at:
Available download formats: zip (2637789 bytes)
Dataset updated
Aug 9, 2025
Authors
Mehdi Akoudadd
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Retail point-of-sale (POS) transactions and operator performance logs from a real store environment. Includes timestamps, product details, quantities, and operator IDs — enabling analysis of sales trends, product performance, and staff efficiency.

Applications:
• Sales forecasting & trend analysis
• Market basket analysis
• Employee productivity insights
• Business analytics & ML modeling

Source: MDPI Data Journal
License: CC BY-NC 4.0 (non-commercial use only).

Cite:

Alves, T.M.F.; de Carvalho, A.C.P.L.F.; Cardoso, J.M.P. (2019). An Open-Source Point of Sale Dataset for the Analysis of Sales Transactions and Operator Efficiency. Data, 4(2), 67.
Enterprise-Driven Open Source Software
zenodo.org
data.europa.eu
application/gzip
Updated Apr 22, 2020
Cite
Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas (2020). Enterprise-Driven Open Source Software [Dataset]. http://doi.org/10.5281/zenodo.3653878
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.3653878
Dataset updated
Apr 22, 2020
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.

The main dataset is provided as a 17,252 record tab-separated file named enterprise_projects.txt with the following 27 fields.

url: the project's GitHub URL

project_id: the project's GHTorrent identifier

sdtc: true if selected using the same domain top committers heuristic (9,006 records)

mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,289 records)

mcve: true if selected using the multiple committers from a probable company heuristic (7,990 records)

star_number: number of GitHub watchers

commit_count: number of commits

files: number of files in current main branch

lines: corresponding number of lines in text files

pull_requests: number of pull requests

most_recent_commit: date of the most recent commit

committer_count: number of different committers

author_count: number of different authors

dominant_domain: the project's dominant email domain

dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain

dominant_domain_author_commits: corresponding number for commit authors

dominant_domain_committers: number of committers whose email matches the project's dominant domain

dominant_domain_authors: corresponding number of commit authors

cik: SEC's EDGAR "central index key"

fg500: true if this is a Fortune Global 500 company (2,232 records)

sec10k: true if the company files SEC 10-K forms (4,178 records)

sec20f: true if the company files SEC 20-F forms (429 records)

project_name: GitHub project name

owner_login: GitHub project's owner login

company_name: company name as derived from the SEC and Fortune 500 data

owner_company: GitHub project's owner company name

license: SPDX license identifier
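
The field list above can be read with Python's standard `csv` module. The following is a minimal sketch under stated assumptions: it assumes the file begins with a header row naming the fields and that boolean fields are written as `true`/`false`, neither of which the description confirms. The sample rows, URLs, and the helper name `read_projects` are hypothetical and use only a subset of the 27 fields.

```python
import csv
import io

# Hypothetical two-row excerpt with a subset of the documented fields.
# Assumption: the real enterprise_projects.txt starts with a header row.
SAMPLE = (
    "url\tproject_id\tsdtc\tmcpc\tmcve\tstar_number\tcommit_count\n"
    "https://github.com/example/alpha\t1\ttrue\tfalse\tfalse\t120\t3400\n"
    "https://github.com/example/beta\t2\tfalse\ttrue\ttrue\t15\t210\n"
)


def read_projects(handle):
    """Parse tab-separated rows into dicts and convert known column types."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Heuristic flags: assumed to be encoded as the strings "true"/"false".
        for flag in ("sdtc", "mcpc", "mcve"):
            row[flag] = row[flag].strip().lower() == "true"
        # Count columns are converted to integers.
        for count in ("star_number", "commit_count"):
            row[count] = int(row[count])
        rows.append(row)
    return rows


projects = read_projects(io.StringIO(SAMPLE))

# Select projects identified by the same-domain-top-committers heuristic.
sdtc_urls = [p["url"] for p in projects if p["sdtc"]]
print(sdtc_urls)
```

To read the real file, the `io.StringIO(SAMPLE)` argument would be replaced by `open("enterprise_projects.txt", newline="", encoding="utf-8")`, adjusting the conversions if the actual encoding of booleans differs.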

The file cohost_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.

url: the project's GitHub URL

project_id: the project's GHTorrent identifier

stars: number of GitHub watchers

commit_count: number of commits
Open Source Software licensing - basics - Dataset - Open Data Hub
datahub.openscience.eu
Updated Nov 18, 2023
Cite
(2023). Open Source Software licensing - basics - Dataset - Open Data Hub [Dataset]. https://datahub.openscience.eu/dataset/open-source-software-licensing-basics
Explore at:
Dataset updated
Nov 18, 2023
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
The presentation explains in the simplest possible way what you need to know about open source licenses when starting from scratch. It also sums up the course "Open Source Licensing Basics for Software Developers (LFC191)" (Linux Foundation).
Open Source Tools Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 2, 2025
Cite
Data Insights Market (2025). Open Source Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-tools-1936277
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
May 2, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Discover the booming open-source tools market! This comprehensive analysis reveals key trends, drivers, and restraints impacting growth from 2025-2033, covering applications like machine learning & data science across major regions. Explore market size, CAGR projections, and leading companies shaping the future of open-source technology.
Open Source Big Data Tools Report
datainsightsmarket.com
doc, pdf, ppt
Updated Apr 29, 2025
Cite
Data Insights Market (2025). Open Source Big Data Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-big-data-tools-1949300
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Apr 29, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
Discover the booming open-source big data tools market! This comprehensive analysis reveals key trends, growth drivers, and regional insights for 2025-2033, featuring leading companies like MongoDB and Apache. Learn about market segmentation, application areas, and future projections.
Financial Database Report
marketreportanalytics.com
doc, pdf, ppt
Updated Apr 10, 2025
+ more versions
Market Report Analytics (2025). Financial Database Report [Dataset]. https://www.marketreportanalytics.com/reports/financial-database-75305
Explore at:
Available download formats
pdf, ppt, doc
Dataset updated
Apr 10, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global financial database market is experiencing robust growth, driven by increasing demand for real-time data and advanced analytics across various sectors. The market, estimated at $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 8% from 2025 to 2033, reaching approximately $28 billion by 2033. This expansion is fueled by several key factors: the proliferation of algorithmic trading and quantitative analysis necessitating high-frequency data feeds; the growing adoption of cloud-based solutions enhancing accessibility and scalability; and the increasing regulatory scrutiny demanding robust and reliable financial data for compliance purposes. The market segmentation reveals a strong preference for real-time databases across both personal and commercial applications, reflecting the time-sensitive nature of financial decisions. Key players like Bloomberg, Refinitiv (formerly Thomson Reuters), and FactSet maintain significant market share due to their established brand reputation and comprehensive data offerings. However, the emergence of innovative fintech companies and the increasing availability of open-source data platforms are expected to intensify competition and foster market disruption.

The geographical distribution of the market reveals North America as the dominant region, followed by Europe and Asia-Pacific. However, the Asia-Pacific region is poised for significant growth, driven by expanding financial markets in countries like China and India. While the market faces restraints such as data security concerns, increasing data costs, and complexities in data integration, the overall trend points toward sustained expansion. The continuous development of sophisticated analytical tools and the growing need for data-driven decision-making will continue to drive the adoption of financial databases across various user segments and geographies, shaping the competitive landscape in the coming years.
