82 datasets found
  1. Stack Overflow Developer Survey Dataset

    • kaggle.com
    zip
    Updated Jan 8, 2024
    Cite
    Palvinder (2024). Stack Overflow Developer Survey Dataset [Dataset]. https://www.kaggle.com/datasets/palvinder2006/stackoverflow
    Explore at:
    Available download formats: zip (9459089 bytes)
    Dataset updated
    Jan 8, 2024
    Authors
    Palvinder
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Overview

    The Stack Overflow Developer Survey Dataset represents one of the most trusted and comprehensive sources of information about the global developer community. Collected by Stack Overflow through its annual survey, the dataset provides insights into the demographics, preferences, habits, and career paths of developers.

    This dataset is frequently used for:
    • Analyzing trends in programming languages, tools, and technologies.
    • Understanding developer job satisfaction, compensation, and work environments.
    • Studying global and regional differences in developer demographics and experience.

    The data consists of two CSV files: "survey_results_public", which contains the survey responses, and "survey_results_schema", which describes each column in detail.

    Data Dictionary: All the details are in "survey_results_schema.csv"

    Features of the Stack Overflow Developer Survey Dataset

    Demographic & Background Information
    • Respondent: A unique identifier for each survey participant.
    • MainBranch: Describes whether the respondent is a professional developer, student, hobbyist, etc.
    • Country: The country where the respondent lives.
    • Age: The respondent's age.
    • Gender: The gender identity of the respondent.
    • Ethnicity: Ethnic background (when available).
    • EdLevel: The highest level of formal education completed.
    • UndergradMajor: The respondent's undergraduate major.
    • Hobbyist: Indicates whether the person codes as a hobby (Yes/No).

    Employment & Professional Experience
    • Employment: Employment status (full-time, part-time, unemployed, student, etc.).
    • DevType: Types of developer roles the respondent identifies with (e.g., Web Developer, Data Scientist).
    • YearsCode: Number of years the respondent has been coding.
    • YearsCodePro: Number of years coding professionally.
    • JobSat: Job satisfaction level.
    • CareerSat: Career satisfaction level.
    • WorkWeekHrs: Approximate hours worked per week.
    • RemoteWork: Whether the respondent works remotely and how frequently.

    Compensation
    • CompTotal: Total compensation in USD (including salary, bonuses, etc.).
    • CompFreq: Frequency of compensation (e.g., yearly, monthly).

    Learning & Education
    • LearnCode: How the respondent first learned to code (e.g., online courses, university).
    • LearnCodeOnline: Online resources used (e.g., YouTube, freeCodeCamp).
    • LearnCodeCoursesCert: Whether the respondent has taken online courses or earned certifications.

    Technology & Tools
    • LanguageHaveWorkedWith: Programming languages the respondent has used.
    • LanguageWantToWorkWith: Languages the respondent is interested in learning or using more.
    • DatabaseHaveWorkedWith: Databases the respondent has experience with.
    • PlatformHaveWorkedWith: Platforms used (e.g., Linux, AWS, Android).
    • OpSys: The operating system used most often.
    • NEWCollabToolsHaveWorkedWith: Collaboration tools used (e.g., Slack, Teams, Zoom).
    • NEWStuck: How often the respondent feels stuck when coding.
    • ToolsTechHaveWorkedWith: Frameworks and technologies respondents have worked with.

    Online Presence & Community
    • SOAccount: Whether the respondent has a Stack Overflow account.
    • SOPartFreq: How often the respondent participates on Stack Overflow.
    • SOVisitFreq: Frequency of visiting Stack Overflow.
    • SOComm: Whether the respondent feels welcome in the Stack Overflow community.
    • OpenSourcer: Level of involvement in open-source contributions.

    Opinions & Preferences
    • WorkChallenge: Challenges faced at work (e.g., unclear requirements, unrealistic expectations).
    • JobFactors: Important job factors (e.g., salary, work-life balance, technologies used).
    • MentalHealth: Questions on how mental health affects or is affected by their job.
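
    Columns such as LanguageHaveWorkedWith can hold several answers in one cell. The following is a minimal sketch of tallying those answers with the Python standard library. It assumes the answers within a cell are separated by semicolons and that unanswered cells contain "NA" or are empty; the three sample rows are made up for illustration and are not taken from survey_results_public.csv, and count_multi_select is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for survey_results_public.csv.
SAMPLE_CSV = """Respondent,Country,LanguageHaveWorkedWith
1,India,Python;SQL
2,Germany,Python;Rust;SQL
3,Brazil,NA
"""


def count_multi_select(rows, column, sep=";", missing=("", "NA")):
    """Count how often each option appears in a multi-select column."""
    counts = Counter()
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell in missing:
            continue  # skip respondents who did not answer
        counts.update(part.strip() for part in cell.split(sep) if part.strip())
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
language_counts = count_multi_select(rows, "LanguageHaveWorkedWith")
for language, n in language_counts.most_common():
    print(language, n)
```

    To run this against the real file, the io.StringIO sample could be replaced with an open file handle, after checking the separator and missing-value markers against survey_results_schema.csv.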

  2. Top Software Companies: Market Cap,Sales & HQ Data

    • kaggle.com
    zip
    Updated Oct 27, 2024
    Cite
    Muhammad Asif (2024). Top Software Companies: Market Cap,Sales & HQ Data [Dataset]. https://www.kaggle.com/datasets/muhammadasif786/top-software-companies-market-capsales-and-hq-data
    Explore at:
    Available download formats: zip (1574 bytes)
    Dataset updated
    Oct 27, 2024
    Authors
    Muhammad Asif
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    Dive into the dynamic world of the software industry with this comprehensive dataset featuring key metrics from top software companies for the years 2022 to 2023.

    This dataset provides valuable insights into:

    • Organizations: A list of leading software companies shaping the tech landscape.
    • Sales: Annual sales figures, showcasing the revenue generated by each company.
    • Market Cap: Important market capitalization data reflecting the companies' financial health and investor confidence.
    • Headquarters: Geographical information about where these companies are headquartered, highlighting regional influence.

    Harness this rich dataset to conduct exploratory data analysis (EDA), visualize trends, and uncover valuable business insights. Whether you're an analyst, researcher, or data enthusiast, this dataset is perfect for understanding the performance and positioning of key players in the software sector.

    Benefits:

    • Comprehensive: Data covering essential metrics for informed analysis.
    • Recent: Insights from the latest two years (2022-2023) for current market trends.
    • User-Friendly: Organized structure for easy integration with data manipulation tools like Pandas.

    Take your data analysis to the next level and explore the competitive landscape of the software industry!

  3. Global sought-after database skills for developers 2021

    • statista.com
    Updated Nov 22, 2023
    Cite
    Statista (2023). Global sought-after database skills for developers 2021 [Dataset]. https://www.statista.com/statistics/793854/worldwide-developer-survey-most-wanted-database/
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 25, 2021 - Jun 15, 2021
    Area covered
    Worldwide
    Description

    According to the survey, just under 18 percent of respondents identified PostgreSQL as one of the most-wanted database skills. MongoDB ranked second, with 17.89 percent stating that they are not developing with it but want to.

  4. SWE-Dev

    • huggingface.co
    Updated Sep 21, 2025
    Cite
    Du (2025). SWE-Dev [Dataset]. https://huggingface.co/datasets/Dorothydu/SWE-Dev
    Explore at:
    Dataset updated
    Sep 21, 2025
    Authors
    Du
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    📘 Dataset Card: SWE‑Dev

    📝 Dataset Summary

    SWE‑Dev (Software Engineering - Feature-driven Development) is the first large-scale dataset tailored for realistic, feature-driven software development using large language models (LLMs). Each example consists of a natural language product requirement, partial source code, and developer-authored unit tests—designed to simulate real-world software feature implementation tasks within large codebases. The dataset enables LLMs to… See the full description on the dataset page: https://huggingface.co/datasets/Dorothydu/SWE-Dev.

  5. Enterprise-Driven Open Source Software

    • zenodo.org
    • data.europa.eu
    application/gzip
    Updated Apr 22, 2020
    + more versions
    Cite
    Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas (2020). Enterprise-Driven Open Source Software [Dataset]. http://doi.org/10.5281/zenodo.3653878
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Apr 22, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Diomidis Spinellis; Zoe Kotti; Konstantinos Kravvaritis; Georgios Theodorou; Panos Louridas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns, and, also, to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.

    The main dataset is provided as a 17,252 record tab-separated file named enterprise_projects.txt with the following 27 fields.

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • sdtc: true if selected using the same domain top committers heuristic (9,006 records)
    • mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,289 records)
    • mcve: true if selected using the multiple committers from a probable company heuristic (7,990 records)
    • star_number: number of GitHub watchers
    • commit_count: number of commits
    • files: number of files in current main branch
    • lines: corresponding number of lines in text files
    • pull_requests: number of pull requests
    • most_recent_commit: date of the most recent commit
    • committer_count: number of different committers
    • author_count: number of different authors
    • dominant_domain: the project's dominant email domain
    • dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
    • dominant_domain_author_commits: corresponding number for commit authors
    • dominant_domain_committers: number of committers whose email matches the project's dominant domain
    • dominant_domain_authors: corresponding number of commit authors
    • cik: SEC's EDGAR "central index key"
    • fg500: true if this is a Fortune Global 500 company (2,232 records)
    • sec10k: true if the company files SEC 10-K forms (4,178 records)
    • sec20f: true if the company files SEC 20-F forms (429 records)
    • project_name: GitHub project name
    • owner_login: GitHub project's owner login
    • company_name: company name as derived from the SEC and Fortune 500 data
    • owner_company: GitHub project's owner company name
    • license: SPDX license identifier
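
    As a minimal sketch of working with a file laid out like this, the snippet below parses tab-separated records and counts how many projects each of the three heuristics selected. It assumes the file starts with a header row and encodes flags as the literal strings "true" and "false"; neither detail is confirmed by the description. The two sample rows, the example.com URLs, and the helper name parse_project are hypothetical, and only a few of the 27 fields are shown.

```python
import csv
import io

# Hypothetical two-row excerpt; only a few of the 27 fields are shown.
SAMPLE_TSV = (
    "url\tsdtc\tmcpc\tmcve\tstar_number\n"
    "https://example.com/a\ttrue\tfalse\ttrue\t120\n"
    "https://example.com/b\tfalse\ttrue\tfalse\t15\n"
)

HEURISTICS = ("sdtc", "mcpc", "mcve")


def parse_project(row):
    """Convert the heuristic flags to bool and the star count to int."""
    parsed = dict(row)
    for name in HEURISTICS:
        parsed[name] = row[name].strip().lower() == "true"
    parsed["star_number"] = int(row["star_number"])
    return parsed


reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
projects = [parse_project(row) for row in reader]

# Count how many projects each heuristic selected.
selected_by = {name: sum(p[name] for p in projects) for name in HEURISTICS}
print(selected_by)
```

    The per-heuristic record counts quoted above (9,006 + 8,289 + 7,990) add up to more than 17,252, so a project can be selected by more than one heuristic; counting each flag separately, as here, reflects that overlap.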

    The file cohost_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise data set, but have comparable quality attributes.

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • stars: number of GitHub watchers
    • commit_count: number of commits

  6. Data from: Global Fintech Market Dataset

    • decipherzone.com
    csv
    Updated Sep 22, 2025
    Cite
    Decipher Zone (2025). Global Fintech Market Dataset [Dataset]. https://www.decipherzone.com/blog-detail/fintech-software-development
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 22, 2025
    Dataset authored and provided by
    Decipher Zone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of fintech market growth showing $44.7B funding in H1 2025, projected to reach USD 394.88B in 2025 and USD 1,126.64B by 2032 at a CAGR of 16.2%.
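
    The quoted growth rate can be checked against the two market-size figures with the standard compound annual growth rate formula, (end / start) ** (1 / years) - 1. The sketch below assumes the rate applies over the seven years from 2025 to 2032; the description does not state the period explicitly. Under that assumption the implied rate comes out close to the stated 16.2%.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.16 means 16%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description, in USD billions.
start_2025 = 394.88
end_2032 = 1126.64

# Assumption: the growth period runs from 2025 to 2032 (7 years).
rate = cagr(start_2025, end_2032, years=2032 - 2025)
print(f"Implied CAGR: {rate:.1%}")
```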

  7. App Developer Data | Engineering Professionals Worldwide Contact Data |...

    • datarade.ai
    Updated Oct 27, 2021
    Cite
    Success.ai (2021). App Developer Data | Engineering Professionals Worldwide Contact Data | Verified Contact Data for Engineers & IT Managers | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/app-developer-data-engineering-professionals-worldwide-cont-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Oct 27, 2021
    Dataset provided by
    Area covered
    Grenada, Tuvalu, Uganda, Norway, Bangladesh, Turkmenistan, Suriname, Poland, Liberia, Burkina Faso
    Description

    Success.ai’s B2B Contact Data and App Developer Data for Engineering Professionals Worldwide is a trusted resource for connecting with engineers and technical managers across industries and regions. This dataset draws from over 170 million verified professional profiles, ensuring you have access to high-quality contact data tailored to your business needs. From sales outreach to recruitment, Success.ai enables you to build meaningful relationships with engineering professionals at every level.

    Why Choose Success.ai’s Engineering Professionals Data?

    1. Accurate and Comprehensive Contact Information:
       • Access work emails, direct phone numbers, and LinkedIn profiles of engineers and technical managers globally.
       • Data is AI-validated, ensuring 99% accuracy for your campaigns.

    2. Global Engineering Coverage:
       • Includes engineers and technical managers from sectors like manufacturing, IT, construction, aerospace, automotive, and more.
       • Regions covered include North America, Europe, Asia-Pacific, South America, and the Middle East.

    3. Real-Time Updates:
       • Continuous updates ensure you stay connected to current roles and decision-makers in engineering.

    4. Compliance and Security:
       • Fully adheres to GDPR, CCPA, and other global data privacy standards, ensuring legal and ethical use.

    Data Highlights:
    • 170M+ Verified Professional Profiles: Comprehensive data from various industries, including engineering.
    • 50M Work Emails: Accurate and AI-validated for reliable communication.
    • 30M Company Profiles: Detailed insights to support targeted outreach.
    • 700M Global Professional Profiles: A rich dataset designed to meet diverse business needs.

    Key Features of the Dataset:
    • Extensive Engineer Profiles: Covers various roles, including mechanical, software, civil, and electrical engineers, as well as engineering managers and directors.
    • Customizable Filters: Segment profiles by location, industry, job title, and company size for precise targeting.
    • AI-Powered Insights: Enriches profiles with contextual details to support personalization.

    Strategic Use Cases:

    1. Sales and Business Development:
       • Engage directly with engineering professionals to present tailored solutions.
       • Reach technical decision-makers to accelerate your sales cycles.

    2. Recruitment and Talent Acquisition:
       • Source skilled engineers and managers for specialized roles.
       • Use updated profiles to connect with potential candidates effectively.

    3. Targeted Marketing Campaigns:
       • Launch precision-driven marketing campaigns aimed at engineers and engineering teams.
       • Personalize outreach with accurate and detailed contact data.

    4. Engineering Services and Solutions:
       • Pitch your engineering tools, software, or consulting services to professionals who can benefit the most.
       • Establish connections with managers who influence procurement decisions.

    Why Success.ai Stands Out:

    1. Best Price Guarantee: Gain access to high-quality datasets at competitive prices.

    2. Flexible Integration Options: Choose between API access or downloadable formats for seamless integration into your systems.

    3. High Accuracy and Coverage: Benefit from AI-validated contact data for impactful results.

    4. Customizable Datasets: Filter and refine datasets to focus on specific engineering roles, industries, or regions.

    APIs for Enhanced Functionality:

    1. Data Enrichment API: Enhance your CRM with verified engineering contact details.
    2. Lead Generation API: Seamlessly integrate new engineering leads into your existing workflow.

    Empower your business with B2B Contact Data for Engineering Professionals Worldwide from Success.ai. With verified work emails, phone numbers, and decision-maker profiles, you can confidently target engineers and managers in any sector.

    Experience the Best Price Guarantee and unlock the potential of precise, AI-validated datasets. Contact us today and start connecting with engineering leaders worldwide!

    No one beats us on price. Period.

  8. WebAutomation Employee Data | Github Developer Profiles | Global 40M+...

    • datarade.ai
    .json, .csv
    Updated Dec 5, 2022
    Cite
    Webautomation (2022). WebAutomation Employee Data | Github Developer Profiles | Global 40M+ Developer Records | Explore Developer Repositories, Contributions and more [Dataset]. https://datarade.ai/data-products/webautomation-github-developer-profiles-dataset-global-webautomation
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Dec 5, 2022
    Dataset authored and provided by
    Webautomation
    Area covered
    Greenland, Estonia, Montserrat, Uruguay, Canada, Guadeloupe, Suriname, Falkland Islands (Malvinas), Ukraine, Paraguay
    Description

    Extensive Developer Coverage: Our employee dataset includes a diverse range of developer profiles from GitHub, spanning various skill levels, industries, and expertise. Access information on developers from all corners of the software development world.

    Developer Profiles: Explore detailed developer profiles, including user bios, locations, company affiliations, and skills. Understand developer backgrounds, experiences, and areas of expertise.

    Repositories and Contributions: Access information about the repositories created by developers and their contributions to open-source projects. Analyze the projects they've worked on, their coding activity, and the impact they've made on the developer community.

    Programming Languages: Gain insights into the programming languages that developers are proficient in. Identify skilled developers in specific programming languages that align with your project needs.

    Customizable Data Delivery: The dataset is available in flexible formats, such as CSV, JSON, or API integration, allowing seamless integration with your existing data infrastructure. Customize the data to meet your specific research and analysis requirements.

  9. Software Market Analysis, Size, and Forecast 2025-2029: North America (US,...

    • technavio.com
    pdf
    Updated Feb 21, 2025
    Cite
    Technavio (2025). Software Market Analysis, Size, and Forecast 2025-2029: North America (US, Canada, and Mexico), Europe (France, Germany, Italy, and UK), Middle East and Africa (UAE), APAC (China, India, and Japan), South America (Brazil), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/software-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2025 - 2029
    Area covered
    Germany, United States, Canada
    Description

    Software Market Size 2025-2029

    The software market size is forecast to increase by USD 30.7 billion, at a CAGR of 8.2% between 2024 and 2029.

    The market is experiencing significant growth, driven primarily by the increasing volume of enterprise data and the shift towards cloud computing. Businesses are recognizing the value of leveraging data to gain insights and make informed decisions, leading to a surge in demand for software solutions that can manage and analyze large data sets. Additionally, cloud computing is becoming the preferred deployment model for software, as it offers cost savings, flexibility, and scalability. However, the market also faces challenges that require careful navigation. High costs of licensing and support continue to be a significant obstacle for many organizations, particularly smaller businesses and startups. These costs can limit their ability to implement and maintain the software solutions they need to remain competitive. Furthermore, ensuring data security and privacy in a cloud environment is a major concern, as sensitive information is increasingly being stored and processed digitally. Companies must address these challenges effectively to capitalize on the opportunities presented by the market's growth and remain competitive in the evolving software landscape.

    What will be the Size of the Software Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.

    The market continues to evolve, with dynamic market activities unfolding across various sectors. Entities such as version control systems, software quality assurance, software licensing, API integration, software maintenance, data warehousing, unit testing, project management, database management, cost optimization, and others, are seamlessly integrated into the software development lifecycle. Cloud computing is transforming the way software is deployed and accessed, while user experience remains a key focus for developers. Agile methodologies and the waterfall methodology coexist, with the former gaining popularity for its flexibility and the latter for its structured approach. Data mining and data analytics are increasingly being used to gain insights from vast amounts of data, while software security and bug tracking are essential components of any development process. Machine learning and artificial intelligence are also making their mark, enhancing software functionality and improving user experience. Proprietary software and open source software each have their unique advantages, with CI/CD and DevOps streamlining the development process. Requirements gathering and user acceptance testing are crucial steps in ensuring software meets user needs, while code review and integration testing help maintain software quality. Technical support and software updates are ongoing requirements, with risk management and cost optimization essential for businesses to effectively manage their software investments. Business intelligence and software architecture are critical for making informed decisions and building scalable systems.

    How is this Software Industry segmented?

    The software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    • Type: Subscriptions; Identity and access management; Endpoint/network/messaging/web security; Risk management
    • Deployment: Cloud-based; On-premises
    • Sector: Large enterprises; Small and medium enterprises
    • Application: CRM; ERP; Cybersecurity; Collaboration Tools
    • Geography:
      • North America: US, Canada, Mexico
      • Europe: France, Germany, Italy, UK
      • Middle East and Africa: UAE
      • APAC: China, India, Japan
      • South America: Brazil
      • Rest of World (ROW)

    By Type Insights

    The subscriptions segment is estimated to witness significant growth during the forecast period.

    In the ever-evolving market, subscription-based models are gaining significant traction as a key growth driver. This shift is driven by the increasing recognition of the benefits offered by these models, enabling businesses to adapt to their evolving needs. Subscription models provide flexibility, allowing companies to scale their software usage efficiently, adapting to expanding operations or streamlined processes. Additionally, these models promote cost optimization, enabling businesses to spread their software expenses over time, making it a more viable option for organizations of all sizes. The software development lifecycle is undergoing a transformation, with both waterfall and agile methodologies being adopted. Waterfall methodology, with its linear approach, is ideal for projects with well-defined requirements. In contrast, agile methodologies, with their iterative and collaborative nature, are more suitable for projects with evolving requirements. C

  10. SWE-Bench Coding Tasks Dataset

    • kaggle.com
    zip
    Updated Oct 3, 2025
    + more versions
    Cite
    Unidata (2025). SWE-Bench Coding Tasks Dataset [Dataset]. https://www.kaggle.com/datasets/unidpro/fermatix-swe-bench
    Explore at:
    Available download formats: zip (146556 bytes)
    Dataset updated
    Oct 3, 2025
    Authors
    Unidata
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    SWE-Bench Dataset

    The dataset comprises 8,712 files across 6 programming languages, featuring verified tasks and benchmarks for evaluating coding agents and language models. It introduces new benchmarks with real-world coding tasks, providing datasets for software engineering problems and tests. It builds upon the original swe-bench by evaluating repository-level challenges and scoring performances.

    By utilizing this dataset with its multi-language test sets and golden patches, researchers and developers can advance their understanding of large language models and developer tools, comparing their performances on real software engineering challenges.

    Specifically engineered for evaluating advanced coding and software development, SWE-Bench Dataset supports research in code generation, automated patching, and fixing GitHub issues.

    💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

    Example of the data

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27063537%2F6876a1091e5e4e12d330177c6ec3a0e6%2F1.PNG?generation=1759494538704549&alt=media

    The dataset provides a robust foundation for achieving higher accuracy in code generation and advancing automated software development tools, which are essential for improving developer productivity and software quality.

    🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects

  11. Most popular database management systems worldwide 2024

    • statista.com
    Updated Jun 15, 2024
    Cite
    Statista (2024). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
    Explore at:
    Dataset updated
    Jun 15, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    Worldwide
    Description

    As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry contains some of the largest companies in the tech industry, such as Microsoft, Oracle and IBM, a number of free and open-source DBMSs such as PostgreSQL and MariaDB remain competitive.

    Database Management Systems

    As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL programming languages has become an important asset for software developers around the world, and database management skills are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  12. codereview-dataset

    • huggingface.co
    Updated Jun 15, 2025
    Cite
    Nutanix (2025). codereview-dataset [Dataset]. https://huggingface.co/datasets/Nutanix/codereview-dataset
    Explore at:
    Dataset updated
    Jun 15, 2025
    Dataset authored and provided by
    Nutanix (https://nutanix.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Code Review Execution Dataset

    This dataset contains comprehensive code review data including pull requests, AI-generated code suggestions, human feedback, and static analysis results. It represents real-world software development workflows and code quality processes.

    Dataset Details

    Dataset Description

    This dataset captures the complete lifecycle of code review processes in software development, including:

    Pull request metadata and context… See the full description on the dataset page: https://huggingface.co/datasets/Nutanix/codereview-dataset.

  13. CodeChat

    • huggingface.co
    Updated Dec 23, 2023
    Cite
    Suzhen Zhong (2023). CodeChat [Dataset]. https://huggingface.co/datasets/Suzhen/CodeChat
    Explore at:
    Dataset updated
    Dec 23, 2023
    Authors
    Suzhen Zhong
    License

    https://choosealicense.com/licenses/odc-by/

    Description

    CodeChat: Developer–LLM Conversations Dataset

    Paper: https://arxiv.org/abs/2509.10402
    GitHub: https://github.com/Software-Evolution-Analytics-Lab-SEAL/CodeChat

    CodeChat is a large-scale dataset comprising 82,845 real-world developer–LLM conversations, containing 368,506 code snippets generated across more than 20 programming languages. It is derived from WildChat, a dataset of general human–LLM conversations. The dataset enables empirical analysis of how developers interact… See the full description on the dataset page: https://huggingface.co/datasets/Suzhen/CodeChat.

  14. Data from: VibeCoding

    • huggingface.co
    Updated Oct 31, 2025
    Cite
    Quixi AI (2025). VibeCoding [Dataset]. https://huggingface.co/datasets/QuixiAI/VibeCoding
    Explore at:
    Dataset updated
    Oct 31, 2025
    Dataset authored and provided by
    Quixi AI
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    🪩 VibeCoding Dataset Project

    Collecting the vibes of coding — one log at a time.

    📢 Call for Volunteers

    We’re building an open dataset to capture real-world coding interactions between developers and AI coding assistants — and we need your help! This dataset will help researchers and developers better understand how humans and code models interact across different tools, and improve the future of AI-assisted software development.

    🎯 Project Overview

    The… See the full description on the dataset page: https://huggingface.co/datasets/QuixiAI/VibeCoding.

  15. Database management system market size worldwide 2017-2021

    • statista.com
    Updated Nov 7, 2025
    Cite
    Statista (2025). Database management system market size worldwide 2017-2021 [Dataset]. https://www.statista.com/statistics/724611/worldwide-database-market/
    Explore at:
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global database management system (DBMS) market revenue grew to ** billion U.S. dollars in 2020. Cloud DBMS accounted for the majority of the overall market growth, as database systems are migrating to cloud platforms.

    Database market

    The database market consists of paid database software such as Oracle and Microsoft SQL Server, as well as free, open-source options like PostgreSQL and MongoDB. Database management systems (DBMSs) provide a platform through which developers can organize, update, and control large databases, with products like Oracle, MySQL, and Microsoft SQL Server being the most widely used in the market.

    Database management software

    Knowledge of the programming languages related to these databases is becoming an increasingly important asset for software developers around the world, and skills with database technologies such as MongoDB and Elasticsearch are seen as highly desirable. In addition to providing developers with the tools needed to operate databases, DBMSs are also integral to the way that consumers access information through applications, which further illustrates the importance of the software.

  16. Global Open Source Software Market Data

    • decipherzone.com
    csv
    Updated Dec 23, 2024
    Cite
    Decipher Zone (2024). Global Open Source Software Market Data [Dataset]. https://www.decipherzone.com/blog-detail/benefits-of-open-source-software-development
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 23, 2024
    Dataset authored and provided by
    Decipher Zone
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Market research dataset covering growth of the global open-source software market, including benefits, adoption, and enterprise usage in 2025.

  17. FOSER - Future of Software Engineering Research

    • datasets.ai
    • data.amerigeoss.org
    • +1more
    33
    Updated Nov 11, 2020
    + more versions
    Cite
    Networking and Information Technology Research and Development, Executive Office of the President (2020). FOSER - Future of Software Engineering Research [Dataset]. https://datasets.ai/datasets/foser-future-of-software-engineering-research
    Explore at:
    33 available download formats
    Dataset updated
    Nov 11, 2020
    Authors
    Networking and Information Technology Research and Development, Executive Office of the President
    Description

    The 2010 report of the President's Council of Advisors on Science and Technology (PCAST), entitled "Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology," documents the transformation of our society driven by advances in networking and information technology, catalyzed by our nation's past investments in research. Our world today relies to an astonishing degree on systems, tools, and services that belong to a vast and still growing domain known as Networking and Information Technology (NIT)...

  18. Global Open-Source Database Software Market Size By Product, By Application,...

    • verifiedmarketresearch.com
    Updated Mar 21, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Open-Source Database Software Market Size By Product, By Application, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/open-source-database-software-market/
    Explore at:
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Open-Source Database Software Market size was valued at USD 10.00 Billion in 2024 and is projected to reach USD 35.83 Billion by 2032, growing at a CAGR of 20% during the forecast period 2026-2032.
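The growth figures above can be sanity-checked with the standard compound-growth formula, end = start × (1 + rate)^periods. The sketch below is only an arithmetic check: the function names are illustrative, and the number of compounding periods is an assumption, since the description does not state which years the 20% rate is compounded over.

```python
def project_value(start_value, annual_rate, periods):
    """Compound start_value at annual_rate for the given number of periods."""
    return start_value * (1 + annual_rate) ** periods


def implied_cagr(start_value, end_value, periods):
    """Return the constant annual growth rate that turns start into end."""
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the description (USD billion); the period counts tried
# here are assumptions, because the description does not state them.
start, end, rate = 10.00, 35.83, 0.20

for periods in (6, 7, 8):
    projected = project_value(start, rate, periods)
    cagr = implied_cagr(start, end, periods)
    print(f"{periods} periods: projected {projected:.2f}, implied CAGR {cagr:.2%}")
```

Under these assumptions, the quoted start value, end value, and rate agree with each other when the growth is compounded over seven periods.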

    Global Open-Source Database Software Market Drivers

    The market drivers for the Open-Source Database Software Market can be influenced by various factors. These may include:

    Cost-Effectiveness: Compared to proprietary systems, open-source databases frequently have lower initial expenses, which attracts organizations, especially startups and small to medium-sized enterprises (SMEs) with tight budgets.
    Flexibility and Customisation: Open-source databases provide more possibilities for customization and flexibility, enabling businesses to modify the database to suit their unique needs and grow as necessary.
    Collaboration and Community Support: Active developer communities that share best practices, support, and contribute to the continued development of open-source databases are beneficial. This cooperative setting can promote quicker problem solving and innovation.
    Performance and Scalability: A lot of open-source databases are made to scale horizontally across several nodes, which helps businesses manage expanding data volumes and keep up performance levels as their requirements change.
    Data Security and Sovereignty: Open-source databases provide businesses more control over their data and allow them to decide where to store and use it, which helps to allay worries about compliance and data sovereignty. Furthermore, open-source code openness can improve security by making it simpler to find and fix problems.
    Compatibility with Contemporary Technologies: Open-source databases are well-suited for contemporary application development and deployment techniques like microservices, containers, and cloud-native architectures since they frequently support a broad range of programming languages, frameworks, and platforms.
    Growing Cloud Computing Adoption: Open-source databases offer a flexible and affordable solution for managing data in cloud environments, whether through self-managed deployments or via managed database services provided by cloud providers. This is because more and more organizations are moving their workloads to the cloud.
    Escalating Need for Real-Time Insights and Analytics: Organizations are increasingly adopting open-source databases with integrated analytics capabilities, like NoSQL and NewSQL databases, as a means of instantly obtaining actionable insights from their data.

  19. Data from: CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java...

    • zenodo.org
    application/gzip, bin
    Updated Apr 28, 2025
    Cite
    Kaihang Jiang; Jin Bihui; Nie Pengyu (2025). CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories [Dataset]. http://doi.org/10.5281/zenodo.15293313
    Explore at:
    Available download formats: bin, application/gzip
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kaihang Jiang; Jin Bihui; Nie Pengyu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Modern programming languages are constantly evolving, introducing new language features and APIs to enhance software development practices. Software developers often face the tedious task of upgrading their codebase to new programming language versions. Recently, large language models (LLMs) have demonstrated potential in automating various code generation and editing tasks, suggesting their applicability in automating code upgrade. However, there exists no benchmark for evaluating the code upgrade ability of LLMs, as distilling code changes related to programming language evolution from real-world software repositories’ commit histories is a complex challenge.
    In this work, we introduce CoUpJava, the first large-scale dataset for code upgrade, focusing on the code changes related to the evolution of Java. CoUpJava comprises 10,697 code upgrade samples, distilled from the commit histories of 1,379 open-source Java repositories and covering Java versions 7–23. The dataset is divided into two subsets: CoUpJava-Fine, which captures fine-grained method-level refactorings towards new language features; and CoUpJava-Coarse, which includes coarse-grained repository-level changes encompassing new language features, standard library APIs, and build configurations. Our proposed dataset provides high-quality samples by filtering irrelevant and noisy changes and verifying the compilability of upgraded code. Moreover, CoUpJava reveals diversity in code upgrade scenarios, ranging from small, fine-grained refactorings to large-scale repository modifications.

  20. Flame-Waterfall-React-Single-Image

    • huggingface.co
    Updated Feb 17, 2025
    + more versions
    Cite
    Flame-Code-VLM (2025). Flame-Waterfall-React-Single-Image [Dataset]. https://huggingface.co/datasets/Flame-Code-VLM/Flame-Waterfall-React-Single-Image
    Explore at:
    Dataset updated
    Feb 17, 2025
    Authors
    Flame-Code-VLM
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Flame-Waterfall-React: A Structured Data Synthesis Dataset for Multimodal React Code Generation

    Flame-Waterfall-React is a dataset synthesized using the Waterfall-Model-Based Synthesis method described in "Advancing Vision-Language Models in Front-End Development via Data Synthesis". This dataset is designed to train vision-language models (VLMs) for React code generation from UI design mockups and specifications. The Waterfall synthesis approach mimics real-world software development by… See the full description on the dataset page: https://huggingface.co/datasets/Flame-Code-VLM/Flame-Waterfall-React-Single-Image.

Cite
Palvinder (2024). Stack Overflow Developer Survey Dataset [Dataset]. https://www.kaggle.com/datasets/palvinder2006/stackoverflow

Stack Overflow Developer Survey Dataset

Data from the world's largest and most trusted community of software developers.

Explore at:
Available download formats: zip (9459089 bytes)
Dataset updated
Jan 8, 2024
Authors
Palvinder
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Overview

The Stack Overflow Developer Survey Dataset represents one of the most trusted and comprehensive sources of information about the global developer community. Collected by Stack Overflow through its annual survey, the dataset provides insights into the demographics, preferences, habits, and career paths of developers.

This dataset is frequently used for:
- Analyzing trends in programming languages, tools, and technologies.
- Understanding developer job satisfaction, compensation, and work environments.
- Studying global and regional differences in developer demographics and experience.

The data consists of two CSV files: "survey_results_public", which contains the survey responses, and "survey_results_schema", which describes each column in detail.

Data Dictionary: All the details are in "survey_results_schema.csv"

Features of the Stack Overflow Developer Survey Dataset

Demographic & Background Information
- Respondent: A unique identifier for each survey participant.
- MainBranch: Describes whether the respondent is a professional developer, student, hobbyist, etc.
- Country: The country where the respondent lives.
- Age: The respondent's age.
- Gender: The gender identity of the respondent.
- Ethnicity: Ethnic background (when available).
- EdLevel: The highest level of formal education completed.
- UndergradMajor: The respondent's undergraduate major.
- Hobbyist: Indicates whether the person codes as a hobby (Yes/No).

Employment & Professional Experience
- Employment: Employment status (full-time, part-time, unemployed, student, etc.).
- DevType: Types of developer roles the respondent identifies with (e.g., Web Developer, Data Scientist).
- YearsCode: Number of years the respondent has been coding.
- YearsCodePro: Number of years coding professionally.
- JobSat: Job satisfaction level.
- CareerSat: Career satisfaction level.
- WorkWeekHrs: Approximate hours worked per week.
- RemoteWork: Whether the respondent works remotely and how frequently.

Compensation
- CompTotal: Total compensation in USD (including salary, bonuses, etc.).
- CompFreq: Frequency of compensation (e.g., yearly, monthly).

Learning & Education
- LearnCode: How the respondent first learned to code (e.g., online courses, university).
- LearnCodeOnline: Online resources used (e.g., YouTube, freeCodeCamp).
- LearnCodeCoursesCert: Whether the respondent has taken online courses or earned certifications.

Technology & Tools
- LanguageHaveWorkedWith: Programming languages the respondent has used.
- LanguageWantToWorkWith: Languages the respondent is interested in learning or using more.
- DatabaseHaveWorkedWith: Databases the respondent has experience with.
- PlatformHaveWorkedWith: Platforms used (e.g., Linux, AWS, Android).
- OpSys: The operating system used most often.
- NEWCollabToolsHaveWorkedWith: Collaboration tools used (e.g., Slack, Teams, Zoom).
- NEWStuck: How often the respondent feels stuck when coding.
- ToolsTechHaveWorkedWith: Frameworks and technologies respondents have worked with.

Online Presence & Community
- SOAccount: Whether the respondent has a Stack Overflow account.
- SOPartFreq: How often the respondent participates on Stack Overflow.
- SOVisitFreq: Frequency of visiting Stack Overflow.
- SOComm: Whether the respondent feels welcome in the Stack Overflow community.
- OpenSourcer: Level of involvement in open-source contributions.

Opinions & Preferences
- WorkChallenge: Challenges faced at work (e.g., unclear requirements, unrealistic expectations).
- JobFactors: Important job factors (e.g., salary, work-life balance, technologies used).
- MentalHealth: Questions on how mental health affects or is affected by their job.
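
Columns such as LanguageHaveWorkedWith hold several answers per respondent, so a typical first step is to split them before counting. The sketch below is a minimal illustration using only the Python standard library. It assumes that multi-select answers are separated by semicolons and that missing answers appear as "NA" or an empty string; neither is stated in the description above, so check both against the actual file. The sample rows are invented stand-ins for "survey_results_public.csv", and the helper name count_multiselect is hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for survey_results_public.csv.
# Assumption: multi-select answers are joined with ";" and missing
# answers are written as "NA".
SAMPLE_CSV = """Respondent,Country,LanguageHaveWorkedWith
1,India,Python;SQL
2,Germany,Python;JavaScript;SQL
3,Brazil,NA
"""


def count_multiselect(rows, column, sep=";", missing=("", "NA")):
    """Count how often each option appears in a multi-select column."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value in missing:
            continue  # skip respondents who gave no answer
        counts.update(part.strip() for part in value.split(sep) if part.strip())
    return counts


# For the real file, replace io.StringIO(SAMPLE_CSV) with
# open("survey_results_public.csv", newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
language_counts = count_multiselect(rows, "LanguageHaveWorkedWith")

for language, n in language_counts.most_common():
    print(language, n)
```

The same helper should apply to the other multi-select columns listed above, such as DatabaseHaveWorkedWith or PlatformHaveWorkedWith, provided they use the same separator.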
