100+ datasets found
  1. Enterprise-Driven Open Source Software

    • data.niaid.nih.gov
    • opendatalab.com
    Updated Apr 22, 2020
    Cite
    Kotti, Zoe (2020). Enterprise-Driven Open Source Software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653877
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    Spinellis, Diomidis
    Louridas, Panos
    Theodorou, Georgios
    Kotti, Zoe
    Kravvaritis, Konstantinos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.

    The main dataset is provided as a 17,264 record tab-separated file named enterprise_projects.txt with the following 29 fields.

    url: the project's GitHub URL

    project_id: the project's GHTorrent identifier

    sdtc: true if selected using the same domain top committers heuristic (9,016 records)

    mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,314 records)

    mcve: true if selected using the multiple committers from a probable company heuristic (8,015 records)

    star_number: number of GitHub watchers

    commit_count: number of commits

    files: number of files in current main branch

    lines: corresponding number of lines in text files

    pull_requests: number of pull requests

    github_repo_creation: timestamp of the GitHub repository creation

    earliest_commit: timestamp of the earliest commit

    most_recent_commit: date of the most recent commit

    committer_count: number of different committers

    author_count: number of different authors

    dominant_domain: the project's dominant email domain

    dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain

    dominant_domain_author_commits: corresponding number for commit authors

    dominant_domain_committers: number of committers whose email matches the project's dominant domain

    dominant_domain_authors: corresponding number for commit authors

    cik: SEC's EDGAR "central index key"

    fg500: true if this is a Fortune Global 500 company (2,233 records)

    sec10k: true if the company files SEC 10-K forms (4,180 records)

    sec20f: true if the company files SEC 20-F forms (429 records)

    project_name: GitHub project name

    owner_login: GitHub project's owner login

    company_name: company name as derived from the SEC and Fortune 500 data

    owner_company: GitHub project's owner company name

    license: SPDX license identifier
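
    As an illustration of how the field list above might be used, here is a minimal sketch that reads a tab-separated file and counts records by heuristic flag. It assumes the file starts with a header row naming the fields and that boolean fields are stored as the text "true" or "false"; neither detail is confirmed by the description. The sample rows and the names read_projects and count_flag are hypothetical.

```python
import csv
import io

# Boolean fields named in the description; the "true"/"false" text
# encoding is an assumption, not something the description states.
BOOLEAN_FIELDS = ("sdtc", "mcpc", "mcve", "fg500", "sec10k", "sec20f")


def read_projects(handle):
    """Yield one dict per record from a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for name in BOOLEAN_FIELDS:
            if name in row:
                row[name] = row[name].strip().lower() == "true"
        yield row


def count_flag(rows, flag):
    """Count the records in which the given boolean field is set."""
    return sum(1 for row in rows if row.get(flag))


# Hypothetical two-record sample standing in for enterprise_projects.txt.
SAMPLE = (
    "url\tproject_id\tsdtc\tfg500\n"
    "https://example.com/a\t1\ttrue\tfalse\n"
    "https://example.com/b\t2\tfalse\tfalse\n"
)

rows = list(read_projects(io.StringIO(SAMPLE)))
print(count_flag(rows, "sdtc"))
```

    To read the real file, pass an open file object instead, for example open("enterprise_projects.txt", newline="", encoding="utf-8").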

    The file cohost_project_details.txt provides the full set of 311,223 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.

    url: the project's GitHub URL

    project_id: the project's GHTorrent identifier

    stars: number of GitHub watchers

    commit_count: number of commits

  2. Open Source Software licensing - basics - Dataset - Open Data Hub

    • datahub.openscience.eu
    Updated Nov 18, 2023
    Cite
    (2023). Open Source Software licensing - basics - Dataset - Open Data Hub [Dataset]. https://datahub.openscience.eu/dataset/open-source-software-licensing-basics
    Dataset updated
    Nov 18, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The presentation explains in the simplest possible way what you need to know about open source licenses when starting from scratch. It also sums up the course "Open Source Licensing Basics for Software Developers (LFC191)" (Linux Foundation)

  3. Data from: Open Source Software and Organisational Boundaries: Interview...

    • beta.ukdataservice.ac.uk
    Updated 2025
    Cite
    UK Data Service (2025). Open Source Software and Organisational Boundaries: Interview Data, 2023 [Dataset]. http://doi.org/10.5255/ukda-sn-857869
    Dataset updated
    2025
    Dataset provided by
    DataCite: https://www.datacite.org/
    UK Data Service: https://ukdataservice.ac.uk/
    Description

    The development of business products and services underpinned by open source (OS) software and digital infrastructure is widespread. This raises important questions about how that work is resourced and managed within and beyond organisational boundaries. Our study explored how Open Source is located within organisations from both public and commercial sectors and the implications of this for work practices and organisational models. It aimed to provide valuable insights into both the sustainability of OS digital infrastructure and how digital technologies are transforming work in diverse ways. We set out to understand where, why and how organisations from different sectors develop open source software and digital infrastructure as part of their delivery of products and services, and how staff work to develop products and services and maintain OS digital infrastructure within and beyond organisational boundaries.

    Our methods involved qualitative interviews to explore different experiences of open source. Innovative web-scraping, online research and snowball techniques were used to purposively identify organisations that were aware of and using open source. We conducted 20 online interviews with staff from organisations in four broad fields across the public and commercial sectors: global technology corporations, UK public sector (local government), UK higher education, and Open Source first companies. Interviewees were mostly senior technical staff managing the development of products and services. Key informant interviews were conducted with those in open source community and policy roles. Transcripts were pseudonymised, imported into NVivo and coded thematically using both inductive and deductive codes.

    Our key findings in the first stage of analysis focused on providing a comparative picture of the four groups of organisations, the location of open source, its role in the delivery of products and services and organisational infrastructure, how open source was used and maintained, and where contributions were made to communities. Emerging themes indicated the embedding of open source in the commercial global technology industry, with various structures set up internally to support communities and manage licensing and contributions. OS first organisations had put open development at the heart of their mission, creating innovative practices and organisational structures to facilitate community support and contribution. In the public sector, open source was used in an ad hoc way by universities and local authorities, but increasingly off-the-shelf products with support packages had taken precedence as a result of resourcing crises and concerns about risk, compatibility and disruption caused by implementation. Further analysis will look in detail at structures and models that facilitate or prevent work beyond organisational boundaries and the implications of these new ways of working for the future of work. As such, the research contributes to Digit’s goal of understanding how digital technologies are transforming work and the theme of Employers’ and employees’ experiences of digital work across sectors.

    The data collection consists of 17 interview transcripts with workers in four industries: UKHE, Global Technology Corporations, UK public sector bodies and Open source first organisations.

  4. Use of open source software, by industry

    • open.canada.ca
    • www150.statcan.gc.ca
    • +1more
    csv, html, xml
    Updated Jan 17, 2023
    Cite
    Statistics Canada (2023). Use of open source software, by industry [Dataset]. https://open.canada.ca/data/en/dataset/f5d0611d-080a-4af5-a262-f95713f2203f
    Available download formats: html, csv, xml
    Dataset updated
    Jan 17, 2023
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    Electronic commerce and technology, use of open source software by North American Industry Classification System (NAICS), for Canada from 2005 to 2007. (Terminated)

  5. NASA Open Source And General Resource Software API

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +4more
    Updated Aug 23, 2025
    Cite
    National Aeronautics and Space Administration (2025). NASA Open Source And General Resource Software API [Dataset]. https://catalog.data.gov/dataset/nasa-open-source-and-general-resource-software-api
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    This dataset lists out all software in use by NASA.

  6. open-source-data-abuse

    • huggingface.co
    Updated Apr 29, 2025
    Cite
    Alan Tseng (2025). open-source-data-abuse [Dataset]. https://huggingface.co/datasets/agentlans/open-source-data-abuse
    Dataset updated
    Apr 29, 2025
    Authors
    Alan Tseng
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    The Dark Side of Openness: How Open Source Data Can Be Abused to Harm Human Life

    First draft partially generated using Perplexity AI, then written and edited manually. Introduction Open-source data—the vast troves of information freely available to the public—has transformed how we innovate, collaborate, and solve problems. From scientific research to civic technology, the benefits are clear. However, the same openness that drives progress can also create serious risks. When… See the full description on the dataset page: https://huggingface.co/datasets/agentlans/open-source-data-abuse.

  7. Linked Open Data Management Services: A Comparison

    • zenodo.org
    • data.niaid.nih.gov
    Updated Sep 18, 2023
    Cite
    Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Robert Nasarek; Lozana Rossenova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

    • ConedaKOR
    • LinkedDataHub
    • Metaphacts
    • Omeka S
    • ResearchSpace
    • Vitro
    • Wikibase
    • WissKI

    The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

    The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

    [1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link To get editing access contact the authors.

    [2] Full paper will be made available open access in the conference proceedings.

  8. Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Dataset updated
    Mar 31, 2022
    Dataset authored and provided by
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

  9. Open Source Database Market Report | Global Forecast From 2025 To 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Open Source Database Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-open-source-database-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Open Source Database Market Outlook



    The global open source database market size was valued at approximately USD 15.5 billion in 2023 and is projected to reach around USD 40.6 billion by 2032, expanding at a compound annual growth rate (CAGR) of 11.5% during the forecast period. The growth of this market is primarily driven by the increasing adoption of open-source databases by both SMEs and large enterprises due to their cost-effectiveness and flexibility.
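
    As a quick arithmetic check on the figures above, the sketch below computes the compound annual growth rate implied by the quoted start and end values. It assumes nine annual compounding periods (2023 to 2032); the description does not state its base year or method, so this is an illustration rather than the report's own calculation. The function name implied_cagr is hypothetical.

```python
def implied_cagr(start_value, end_value, years):
    """Return the annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the description: USD 15.5 billion (2023) and
# USD 40.6 billion (2032). Nine annual periods is an assumption.
rate = implied_cagr(15.5, 40.6, 2032 - 2023)
print(f"{rate:.1%}")
```

    Under that assumption the implied rate comes out at roughly 11.3%, close to the 11.5% quoted.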



    A significant growth factor for the open source database market is the rising demand for data analytics and business intelligence across various industries. Organizations are increasingly leveraging big data to gain actionable insights, enhance decision-making processes, and improve operational efficiency. Open source databases provide the scalability and performance required to handle large volumes of data, making them an attractive option for businesses looking to maximize their data-driven strategies. Additionally, the continuous advancements and contributions from the open-source community help in keeping these databases at the cutting edge of technology.



    Another driving factor is the cost-efficiency associated with open-source databases. Unlike proprietary databases, which can be expensive due to licensing fees, open-source databases are usually free to use, offering a significant cost advantage. This factor is especially crucial for small and medium enterprises (SMEs), which often operate with limited budgets. The lower total cost of ownership, combined with the flexibility to customize the database according to specific needs, makes open-source solutions highly appealing for businesses of all sizes.



    The increasing trend of digital transformation is also playing a crucial role in the growth of the open source database market. As businesses across various sectors accelerate their digital initiatives, the need for robust, scalable, and efficient data management solutions becomes paramount. Open-source databases provide the agility and innovation that organizations require to keep up with the rapidly changing digital landscape. Moreover, the support for cloud deployment further enhances their appeal, providing businesses with the scalability and flexibility needed to adapt to evolving technological demands.



    From a regional perspective, North America holds a significant share in the open source database market, driven by the presence of major technology companies and a highly developed IT infrastructure. The region's focus on technological innovation and early adoption of advanced technologies contributes to its dominant position. Europe follows closely, with increasing investments in digital transformation initiatives. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by rapid technological advancements, a burgeoning IT sector, and increased adoption of open-source solutions by businesses.



    Relational Databases Software plays a crucial role in the open-source database market, offering structured data management solutions that are essential for various business applications. These databases are known for their ability to handle complex queries and transactions, making them ideal for industries that require high levels of data integrity and consistency. The flexibility and robustness of relational databases software allow organizations to efficiently manage large volumes of structured data, which is critical for applications such as financial systems, enterprise resource planning, and customer relationship management. As businesses continue to prioritize data-driven decision-making, the demand for relational databases software is expected to grow, further driving the expansion of the open-source database market.



    Database Type Analysis



    The open source database market is segmented into SQL, NoSQL, and NewSQL databases. SQL databases are the most widely used and have been the backbone of data management for decades. They offer robust transaction management and are ideal for structured data storage and retrieval. The ongoing improvements in SQL databases, such as enhanced performance and security features, continue to make them a preferred choice for many organizations. Additionally, the availability of various SQL-based open-source solutions like MySQL, PostgreSQL, and MariaDB provides organizations with reliable options to manage their data effectively.



    NoSQL databases are gainin

  10. Understanding the Causes of School Violence Using Open Source Data, United...

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • icpsr.umich.edu
    • +1more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Understanding the Causes of School Violence Using Open Source Data, United States, 1990-2016 [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/understanding-the-causes-of-school-violence-using-open-source-data-united-states-1990-2016-3f99c
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    United States
    Description

    This study provides an evidence-based understanding on etiological issues related to school shootings and rampage shootings. It created a national, open-source database that includes all publicly known shootings that resulted in at least one injury that occurred on K-12 school grounds between 1990 and 2016. The investigators sought to better understand the nature of the problem and clarify the types of shooting incidents occurring in schools, provide information on the characteristics of school shooters, and compare fatal shooting incidents to events where only injuries resulted to identify intervention points that could be exploited to reduce the harm caused by shootings. To accomplish these objectives, the investigators used quantitative multivariate and qualitative case studies research methods to document where and when school violence occurs, and highlight key incident and perpetrator level characteristics to help law enforcement and school administrators differentiate between the kinds of school shootings that exist, to further policy responses that are appropriate for individuals and communities.

  11. NASA open-source code projects with A.I.-generated tags

    • s.cnmilf.com
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +3more
    Updated Apr 10, 2025
    Cite
    NASA (2025). NASA open-source code projects with A.I.-generated tags [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/nasa-open-source-code-projects-with-a-i-generated-tags-31117
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    NASA: http://nasa.gov/
    Description

    A JSON file that is used to build the content on code.nasa.gov. It contains names, descriptions, links, and keyword tags for all NASA open-sourced code projects released through the SRA (Software Release Authority) and available on code.nasa.gov. It was updated in August 2019.

  12. Data Open Source Dataset

    • universe.roboflow.com
    zip
    Updated Apr 17, 2025
    Cite
    Data OPT Tebu (2025). Data Open Source Dataset [Dataset]. https://universe.roboflow.com/data-opt-tebu/data-open-source
    Available download formats: zip
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Data OPT Tebu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Pest Bounding Boxes
    Description

    Data Open Source

    ## Overview
    
    Data Open Source is a dataset for object detection tasks - it contains Pest annotations for 476 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  13. Open Source Software contributed by the Public Sector in Luxembourg:...

    • data.public.lu
    csv, json
    Updated Nov 13, 2023
    Cite
    Open Data Lëtzebuerg (2023). Open Source Software contributed by the Public Sector in Luxembourg: dependencies [Dataset]. https://data.public.lu/en/datasets/open-source-software-contributed-by-the-public-sector-in-luxembourg-dependencies/
    Available download formats: csv(17125), json(39520)
    Dataset updated
    Nov 13, 2023
    Dataset authored and provided by
    Open Data Lëtzebuerg
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Luxembourg
    Description

    This dataset lists the dependencies from the repositories contributed by the Public Sector in Luxembourg. The data has been crawled with codegouvfr-fetch-data. If you wish to contribute to this dataset, feel free to contribute to the following GitHub project via issues or pull requests: Open Source Software contributed by the Public sector in Luxembourg, a list of organization accounts

  14. Open Source Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 2, 2025
    Cite
    Data Insights Market (2025). Open Source Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/open-source-tools-1936277
    Available download formats: doc, ppt, pdf
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source tools market is experiencing robust growth, driven by increasing demand for cost-effective, flexible, and customizable solutions across diverse sectors. The market, encompassing tools for data cleaning, visualization, mining, and applications like machine learning, natural language processing, and computer vision, is projected to witness substantial expansion over the forecast period (2025-2033). Factors such as the rising adoption of cloud computing, the growing need for data-driven decision-making, and the increasing preference for collaborative development models are key drivers. While the specific CAGR isn't provided, a conservative estimate based on industry trends suggests a compound annual growth rate of around 15-20% is realistic for the period. This growth is anticipated across all segments, with the data science and machine learning sectors exhibiting particularly strong performance. Geographic expansion is also a prominent trend, with North America and Europe leading the market initially, followed by a significant increase in adoption across Asia Pacific and other regions as digital transformation initiatives accelerate. However, challenges remain. Security concerns surrounding open-source software and the need for robust support and maintenance infrastructure could potentially restrain market growth. Nevertheless, ongoing improvements in security protocols and the burgeoning community support surrounding many open-source projects are mitigating these challenges. The diverse range of applications and tool types within the open-source market ensures its versatility. Universal tools, catering to broad needs, and specialized tools like data visualization and mining software are all experiencing increased demand. The presence of established players like IBM and Oracle alongside a large community of contributors ensures a dynamic market ecosystem. 
The continued development of innovative tools, improved documentation, and enhanced community support are expected to further fuel market growth, making open-source solutions increasingly attractive to businesses of all sizes. Specific segmentation data, while not explicitly provided, shows a spread across applications indicating a healthy, diversified market that is expected to evolve rapidly within the forecast period.

  15. Open Source Data Labelling Tool Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jul 27, 2025
    Cite
    Archive Market Research (2025). Open Source Data Labelling Tool Report [Dataset]. https://www.archivemarketresearch.com/reports/open-source-data-labelling-tool-560375
    Available download formats: ppt, doc, pdf
    Dataset updated
    Jul 27, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source data labeling tool market is experiencing robust growth, driven by the increasing demand for high-quality training data in machine learning and artificial intelligence applications. The market's expansion is fueled by several key factors: the rising adoption of AI across various industries, the need for cost-effective data annotation solutions, and the growing preference for flexible and customizable tools. While precise market sizing data is unavailable, considering the substantial growth in the broader data annotation market and the increasing popularity of open-source solutions, we can reasonably estimate the 2025 market size to be approximately $500 million. This signifies a significant opportunity for providers of open-source tools, particularly those offering innovative features and strong community support. Assuming a conservative Compound Annual Growth Rate (CAGR) of 25% for the forecast period (2025-2033), the market is projected to reach approximately $4.8 billion by 2033. This growth trajectory is supported by the continuous advancements in AI and the ever-increasing volume of data requiring labeling. Several challenges restrain market growth, including the need for specialized technical expertise to effectively implement and manage open-source tools, and the potential for inconsistencies in data quality compared to commercial solutions. However, the inherent advantages of open-source tools—cost-effectiveness, customization, and community-driven improvements—are expected to outweigh these challenges. The increasing availability of user-friendly interfaces and pre-trained models is further enhancing the accessibility and appeal of open-source solutions. The market segmentation encompasses various tool types based on functionality and applications (image annotation, text annotation, video annotation etc.), deployment models (cloud-based, on-premise), and target industries (healthcare, automotive, finance etc.). 
Leading players are continuously enhancing their offerings, fostering community engagement, and expanding their service portfolios to capitalize on this expanding market.
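
The compound-growth arithmetic behind a projection like this can be sketched as follows. This is a minimal sketch, assuming annual compounding from the 2025 base over eight periods (2025 to 2033); the function name `project_market_size` is illustrative and does not come from the report. Under these assumptions the formula yields roughly $2.98 billion, not the $4.8 billion quoted above, so the report's figure presumably rests on inputs that this summary does not state.

```python
def project_market_size(base, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base * (1.0 + cagr) ** years


# Inputs taken from the description: $500 million in 2025, 25% CAGR, 2025-2033.
base_2025_musd = 500.0
years = 2033 - 2025  # assumption: eight annual compounding periods
size_2033_musd = project_market_size(base_2025_musd, 0.25, years)

print(f"Projected 2033 size: ${size_2033_musd / 1000:.2f} billion")
```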

  16. Data from: A large-scale comparative analysis of Coding Standard conformance...

    • figshare.com
    application/x-gzip
    Updated Oct 4, 2021
    Cite
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa (2021). A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects [Dataset]. http://doi.org/10.6084/m9.figshare.12377237.v3
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Oct 4, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anj Simmons; Scott Barnett; Jessica Rivera-Villicana; Akshat Bajaj; Rajesh Vasa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study investigates the extent to which data science projects follow code standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity.

    results.tar.gz: Extracted data for each project, including raw logs of all detected code violations.

    notebooks_out.tar.gz: Tables and figures generated by notebooks.

    source_code_anonymized.tar.gz: Anonymized source code (at time of publication) to identify, clone, and analyse the projects. Also includes Jupyter notebooks used to produce figures in the paper.

    The latest source code can be found at: https://github.com/a2i2/mining-data-science-repositories

    Published in ESEM 2020: https://doi.org/10.1145/3382494.3410680

    Preprint: https://arxiv.org/abs/2007.08978

  17. datasets-dependents

    • huggingface.co
    Updated Mar 2, 2023
    + more versions
    Cite
    Hugging Face OSS Metrics (2023). datasets-dependents [Dataset]. https://huggingface.co/datasets/open-source-metrics/datasets-dependents
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 2, 2023
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    Hugging Face OSS Metrics
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    datasets metrics

    This dataset contains metrics about the huggingface/datasets package.
    Number of repositories in the dataset: 4997
    Number of packages in the dataset: 215

    Package dependents

    This contains the data available in the used-by tab on GitHub.

    Package & Repository star count

    This section shows the package and repository star count, individually.

    There are 22 packages that have more than 1000 stars. There are 43… See the full description on the dataset page: https://huggingface.co/datasets/open-source-metrics/datasets-dependents.

  18. Open Source Data Acquisition Instrument Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Jul 13, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Report Analytics (2025). Open Source Data Acquisition Instrument Report [Dataset]. https://www.marketreportanalytics.com/reports/open-source-data-acquisition-instrument-336435
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jul 13, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The open-source data acquisition (DAQ) instrument market, currently valued at $545 million in 2025, is projected to experience robust growth, fueled by a Compound Annual Growth Rate (CAGR) of 5.5% from 2025 to 2033. This growth is driven by several key factors. The increasing demand for customizable and cost-effective data acquisition solutions across diverse sectors like research, education, and industrial automation is a significant driver. Open-source DAQ instruments offer flexibility and community support, allowing users to adapt them to specific needs and integrate them seamlessly into existing workflows. Furthermore, the rising adoption of Internet of Things (IoT) devices and the need for real-time data processing are contributing to market expansion. The availability of readily accessible software libraries and extensive online resources further enhances the accessibility and appeal of these instruments, making them attractive alternatives to expensive proprietary solutions. Companies like OpenBCI, Red Pitaya, LabJack, Arduino, National Instruments, and ADLINK Technology are key players shaping the market landscape, each contributing unique features and functionalities to this dynamic sector. The market segmentation is likely diverse, with variations based on hardware capabilities (e.g., sampling rate, number of channels, input types), software interfaces (e.g., Python, MATLAB, LabVIEW), and application-specific configurations (e.g., biosignal processing, environmental monitoring). Geographic distribution will also play a crucial role; we anticipate stronger growth in regions with burgeoning technological advancements and a high concentration of research institutions and industrial automation sectors. Restraints on market growth might include the need for users to possess a reasonable level of technical expertise for setup and configuration, and the potential for variations in device quality among different open-source manufacturers. 
Nonetheless, the overall trend points toward sustained and significant growth for the open-source DAQ instrument market over the next decade.

  19. Open Source Application Development Portal

    • data.transportation.gov
    • data.virginia.gov
    • +5 more
    application/rdfxml +5
    Updated Dec 18, 2018
    Cite
    (2018). Open Source Application Development Portal [Dataset]. https://data.transportation.gov/Roadways-and-Bridges/Open-Source-Application-Development-Portal/gpyv-jjdk
    Explore at:
    Available download formats: csv, tsv, application/rssxml, json, xml, application/rdfxml
    Dataset updated
    Dec 18, 2018
    Description

    Open Source Application Development Portal (OSADP). The system provides a place for programmers to share software code and solutions.

  20. Data from: Open Source Cross-Sectional Asset Pricing

    • catalog.data.gov
    Updated Dec 18, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Board of Governors of the Federal Reserve System (2024). Open Source Cross-Sectional Asset Pricing [Dataset]. https://catalog.data.gov/dataset/open-source-cross-sectional-asset-pricing
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Federal Reserve Board of Governors
    Federal Reserve System (http://www.federalreserve.gov/)
    Description

    These data and code successfully reproduce nearly all cross-sectional stock return predictors. The 319 characteristics draw from previous meta-studies, but authors differ by comparing their t-stats to the original papers' results. For the 161 characteristics that were clearly significant in the original papers, 98% of their long-short portfolios find t-stats above 1.96. For the 44 characteristics that had mixed evidence, authors' reproductions find t-stats of 2 on average. A regression of reproduced t-stats on original long-short t-stats finds a slope of 0.90 and an R² of 83%. Mean returns are monotonic in predictive signals at the characteristic level. The remaining 114 characteristics were insignificant in the original papers or are modifications of the originals created by Hou, Xue, and Zhang (2020). These remaining characteristics are almost always significant if the original characteristic was also significant.

Cite
Kotti, Zoe (2020). Enterprise-Driven Open Source Software [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3653877

Enterprise-Driven Open Source Software

Dataset updated
Apr 22, 2020
Dataset provided by
Spinellis, Diomidis
Louridas, Panos
Theodorou, Georgios
Kotti, Zoe
Kravvaritis, Konstantinos
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,264 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.

The main dataset is provided as a 17,264-record tab-separated file named enterprise_projects.txt with the following 29 fields.

url: the project's GitHub URL

project_id: the project's GHTorrent identifier

sdtc: true if selected using the same domain top committers heuristic (9,016 records)

mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,314 records)

mcve: true if selected using the multiple committers from a probable company heuristic (8,015 records)

star_number: number of GitHub watchers

commit_count: number of commits

files: number of files in current main branch

lines: corresponding number of lines in text files

pull_requests: number of pull requests

github_repo_creation: timestamp of the GitHub repository creation

earliest_commit: timestamp of the earliest commit

most_recent_commit: date of the most recent commit

committer_count: number of different committers

author_count: number of different authors

dominant_domain: the project's dominant email domain

dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain

dominant_domain_author_commits: corresponding number for commit authors

dominant_domain_committers: number of committers whose email matches the project's dominant domain

dominant_domain_authors: corresponding number for commit authors

cik: SEC's EDGAR "central index key"

fg500: true if this is a Fortune Global 500 company (2,233 records)

sec10k: true if the company files SEC 10-K forms (4,180 records)

sec20f: true if the company files SEC 20-F forms (429 records)

project_name: GitHub project name

owner_login: GitHub project's owner login

company_name: company name as derived from the SEC and Fortune 500 data

owner_company: GitHub project's owner company name

license: SPDX license identifier
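
A tab-separated file with these fields could be read along the following lines. This is a minimal sketch that uses only the Python standard library. It assumes the file starts with a header row carrying the field names above and that boolean fields are spelled "true"/"false"; neither detail is confirmed by the description, so check the actual file first. The helper names (`read_projects`, `is_true`, `count_flag`) and the sample rows are made up for illustration.

```python
import csv
import io


def read_projects(stream):
    """Parse tab-separated records into a list of dicts keyed by field name.

    Assumption: the first row is a header with the field names listed above.
    """
    return list(csv.DictReader(stream, delimiter="\t"))


def is_true(value):
    # Assumption: booleans are encoded as "true"/"false" (case-insensitive).
    return (value or "").strip().lower() == "true"


def count_flag(rows, field):
    """Count the records whose boolean `field` is true."""
    return sum(1 for row in rows if is_true(row.get(field)))


# Made-up sample rows using a subset of the fields; not real dataset content.
sample = (
    "url\tproject_id\tsdtc\tmcpc\tmcve\n"
    "https://github.com/example/a\t1\ttrue\tfalse\tfalse\n"
    "https://github.com/example/b\t2\tfalse\ttrue\ttrue\n"
)

rows = read_projects(io.StringIO(sample))
print(len(rows), count_flag(rows, "sdtc"))
```

For the real file, the same function could be given an open file handle, for example `open("enterprise_projects.txt", newline="", encoding="utf-8")`; the encoding is likewise an assumption.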

The file cohost_project_details.txt provides the full set of 311,223 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.

url: the project's GitHub URL

project_id: the project's GHTorrent identifier

stars: number of GitHub watchers

commit_count: number of commits
