100+ datasets found
  1. GitHub Programming Languages Data

    • kaggle.com
    zip
    Updated Jan 2, 2022
    Cite
    Isaac Wen (2022). GitHub Programming Languages Data [Dataset]. https://www.kaggle.com/datasets/isaacwen/github-programming-languages-data
    Explore at:
    Available download formats: zip (41198 bytes)
    Dataset updated
    Jan 2, 2022
    Authors
    Isaac Wen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    A common question among both newcomers and experienced practitioners in computer science and software engineering is which programming language is the best or the most popular. It is very difficult to give a definitive answer, as a seemingly unlimited number of metrics could define the 'best' or 'most popular' programming language.

    One metric that can be used to define a 'popular' programming language is the number of projects and files written in it. As GitHub is the most popular public collaboration and file-sharing platform, analyzing the languages used for repositories, PRs, and issues on GitHub can be a good indicator of a language's popularity.

    Content

    This dataset contains statistics about the programming languages used for repositories, PRs, and issues on GitHub. The data is from 2011 to 2021.

    Source

    This data was queried and aggregated from BigQuery's public github_repos and githubarchive datasets.

    Limitations

    Only public GitHub repositories, and their corresponding PRs and issues, have publicly available data. Thus, this dataset is based only on public repositories, which may not be fully representative of all repositories on GitHub.

  2. Python Code Instruction

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). Python Code Instruction [Dataset]. https://www.kaggle.com/datasets/thedevastator/python-code-instruction-dataset
    Explore at:
    Available download formats: zip (4069935 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Python Code Instruction

    Training Data with Instruction, Input, Output, and Prompt Columns

    By Tarun Bisht (From Huggingface) [source]

    About this dataset

    The python_code_instructions_18k_alpaca dataset is a comprehensive training dataset specifically curated for researchers and developers involved in the analysis and comprehension of Python code instructions. It contains a vast collection of Python code snippets along with their corresponding instruction, input, output, and prompt information. By utilizing this dataset, users can gain valuable insights into various Python programming concepts and techniques.

    The dataset is organized into columns to facilitate easy access to the required information. The instruction column holds the specific task or instruction that the Python code snippet is designed to perform. This allows users to understand the purpose or requirement of each code snippet at a glance.

    The input column contains all necessary input data or parameters that are required for executing the Python code snippet accurately. These inputs provide context and enable users to comprehend how different variables or values impact the overall functioning of each code snippet.

    Likewise, the output column presents expected results or outcomes that should be produced when executing each Python code snippet with its specified input values. This allows for validation and verification purposes, ensuring that each code snippet performs as intended.

    In addition to instruction, input, and output details, this dataset also includes prompts. The prompt column provides additional context or information intended to assist users in better understanding the purpose or requirements of each particular Python code snippet.

    By working with the python_code_instructions_18k_alpaca training dataset, researchers and developers can study numerous real-world examples of Python programming challenges, improving their coding skills and learning effective implementation techniques across various domains.

    Research Ideas

    • Code Instruction Analysis: This dataset can be used to analyze different types of Python code instructions and identify patterns or common practices. Researchers or developers can use this dataset to gain insights into effective ways of writing code instructions.
    • Code Output Prediction: With the given input and instruction, this dataset can be used to train models for predicting the expected output of a Python code snippet. This can be useful in automating the testing process or verifying the correctness of the code.
    • Prompt Generation: Developers often struggle to provide clear and concise prompts for their code snippets. This dataset can serve as a resource for generating prompts by analyzing existing examples and extracting key information or requirements from them.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | instruction | Specific tasks or instructions assigned to each Python code snippet. (Text) |
    | input | The input data or parameters required for executing the code instruction. (Text) |
    | output | The expected result or output that should be produced when executing the code instruction. (Text) |
    | prompt | Additional information or context to help understand the purpose or requirements of each code instruction. (Text) |
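    As a rough illustration of the column layout described above, the following sketch parses CSV text with the four listed columns using Python's standard csv module. The two sample rows are hypothetical stand-ins written for this example, not records from train.csv, and the helper name load_rows is an assumption made here.

```python
import csv
import io

# Hypothetical two-row sample standing in for train.csv;
# these rows are invented for illustration and are not from the dataset.
SAMPLE_CSV = """instruction,input,output,prompt
Add two numbers.,"2, 3",print(2 + 3),Write Python code that adds two numbers.
Reverse a string.,abc,print('abc'[::-1]),Write Python code that reverses a string.
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE_CSV)
for row in rows:
    # Each row exposes the four columns named in the table above.
    print(row["instruction"], "->", row["output"])
```

    With the real file, the same function could presumably be fed the contents of train.csv read from disk, assuming the header row matches the column names listed above.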

    Acknowledgements

    If you use this dataset in your research, please credit Tarun Bisht (from Hugging Face).

  3. Programming languages used for software development worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Programming languages used for software development worldwide 2024 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most popular programming language used in the past 12 months by software developers worldwide is JavaScript as of 2024, according to ** percent of the software developers surveyed. This is followed by Python at ** percent of the respondents surveyed.

  4. Coding Questions with Solutions

    • kaggle.com
    zip
    Updated Nov 27, 2023
    Cite
    The Devastator (2023). Coding Questions with Solutions [Dataset]. https://www.kaggle.com/datasets/thedevastator/coding-questions-with-solutions
    Explore at:
    Available download formats: zip (452781832 bytes)
    Dataset updated
    Nov 27, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Coding Questions with Solutions

    Introductory, Interview and Competition Levels

    By Huggingface Hub [source]

    About this dataset

    Codeparrot's Apps dataset is a collection of programming questions accompanied by solutions, input/output test cases, and related information, all written in Python. The questions are posed in natural language and paired with their Python solutions, which makes the dataset a reasonable starting point for coders of any level who want to learn and practice Python.

    How to use the dataset

    Using this dataset is fairly simple: unless otherwise stated, its columns are organized in a single table in CSV format, so reading through them should be straightforward even with minimal coding experience. Find a question that matches your desired difficulty rating or topic, together with its accompanying information (including any starter code provided), and copy it into your local environment. Then run the supplied test cases against the provided input and output values to verify that everything works before making your own modifications or additions. Working through questions this way is also useful practice ahead of official competitive situations.
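    The step of running supplied test cases against input and output values could look roughly like the sketch below. It assumes each test case is a pair of input string and expected output string, which the description does not confirm; the function run_cases, the example solution add_numbers, and the two cases are all hypothetical and not taken from the dataset.

```python
def run_cases(solution, cases):
    """Return the cases where solution(input) differs from the expected output."""
    failures = []
    for given, expected in cases:
        actual = solution(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


# Hypothetical solution for a made-up question: sum the integers on one line.
def add_numbers(line):
    return str(sum(int(token) for token in line.split()))


# Hypothetical (input, expected output) pairs; not taken from the dataset.
cases = [("1 2", "3"), ("10 -4", "6")]

failures = run_cases(add_numbers, cases)
print("all passed" if not failures else failures)
```

    Collecting failures instead of stopping at the first mismatch makes it easier to see which inputs a candidate solution gets wrong.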

    Research Ideas

    • As a teaching and learning tool to help beginners get comfortable with programming in Python.
    • As an interview prep tool for experienced coders, since it contains level-specific code examples and test cases for each question.
    • As a competition resource, wherein contestants can try out different solutions and compare them against each other to identify the most efficient one within the given data set.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | question | Natural language question related to coding. (String) |
    | solutions | Set of solutions written in Python for each given question. (String) |
    | **inp...

  5. Most popular programming languages worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most popular programming languages worldwide 2024 [Dataset]. https://www.statista.com/statistics/1292294/popular-it-skills-worldwide/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    JavaScript and Java were some of the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing this language in 2024, respectively. Nevertheless, the tech skill developers wanted to learn the most in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest growing IT skills among DevSkiller customers were C/C++ and data science, while cybersecurity ranked third.

    Software skills

    When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn how to code between 11 and 17 years old, with some of them writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned how to program by accessing online resources such as videos and blogs.

    Software skills pay

    In 2024, the average annual software developer’s salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany, it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.

  6. programming-languages-genealogy

    • huggingface.co
    • kaggle.com
    Updated May 6, 2025
    Cite
    Christopher Akiki (2025). programming-languages-genealogy [Dataset]. https://huggingface.co/datasets/christopher/programming-languages-genealogy
    Explore at:
    Dataset updated
    May 6, 2025
    Authors
    Christopher Akiki
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    https://en.wikipedia.org/wiki/Generational_list_of_programming_languages

    This is a "genealogy" of programming languages. Languages are categorized under the ancestor language with the strongest influence. Those ancestor languages are listed in alphabetic order. Any such categorization has a large arbitrary element, since programming languages often incorporate major ideas from multiple sources.

  7. Coursera | Programming For Designers Specialization

    • academictorrents.com
    bittorrent
    Updated May 3, 2025
    Cite
    None (2025). Coursera | Programming For Designers Specialization [Dataset]. https://academictorrents.com/details/f58a74aad31ae9ac4c14f3e30414d93fe8c1ce8d
    Explore at:
    Available download formats: bittorrent (6125120437)
    Dataset updated
    May 3, 2025
    Authors
    None
    License

    https://academictorrents.com/nolicensespecified

    Description

    Coursera - Programming for Designers Specialization

    Course details: Develop a foundation in Computational Design. Explore Creative Coding with Python.

    What you'll learn

    • Learn the fundamentals of Python programming, including essential coding techniques
    • Engage in computational design thinking to approach design problems with a mindset that leverages computational strategy and problem-solving
    • Understand how to develop custom algorithms that can generate a range of design solutions against complex requirements, constraints, and objectives
    • Demonstrate the application of computational methods in design-related disciplines using a variety of computational tools

    Specialization - 3 course series

    In Programming for Designers, you will explore Python programming within a creative context, equipping you with essential computational design skills. Beginning with fundamental programming principles, y

  8. programming-jokes-dataset

    • huggingface.co
    Updated May 4, 2024
    Cite
    Asfandyar Azhar (2024). programming-jokes-dataset [Dataset]. https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 4, 2024
    Authors
    Asfandyar Azhar
    Description

    Programming Jokes Dataset

    Dataset Summary

    This dataset contains programming-related jokes scraped from the website Punny Funny. The jokes are organized into different categories based on the structure of the original webpage. The dataset is intended for use in natural language processing tasks, such as fine-tuning language models to generate humor or analyze textual content in the programming domain.

    Number of Jokes: [220]

    Usage

    This dataset is suitable for… See the full description on the dataset page: https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset.

  9. EMIP: The eye movements in programming dataset

    • osf.io
    Updated Sep 3, 2021
    Cite
    Roman Bednarik (2021). EMIP: The eye movements in programming dataset [Dataset]. https://osf.io/53kts
    Explore at:
    Dataset updated
    Sep 3, 2021
    Dataset provided by
    Center For Open Science
    Authors
    Roman Bednarik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large dataset that contains the eye movements of N=216 programmers of different experience levels captured during two code comprehension tasks is presented. Data are grouped in terms of programming expertise (from none to high) and other demographic descriptors. Data were collected through an international collaborative effort that involved eleven research teams across eight countries on four continents. The same eye tracking apparatus and software was used for the data collection. The Eye Movements in Programming (EMIP) dataset is freely available for download. The varied metadata in the EMIP dataset provides fertile ground for the analysis of gaze behavior and may be used to make novel insights about code comprehension.

    Bednarik, Roman, et al. "EMIP: The eye movements in programming dataset." Science of Computer Programming 198 (2020): 102520.

  10. AI-Coding-Models

    • huggingface.co
    Updated May 24, 2026
    + more versions
    Cite
    joy larkin (2026). AI-Coding-Models [Dataset]. https://huggingface.co/datasets/joylarkin/AI-Coding-Models
    Explore at:
    Dataset updated
    May 24, 2026
    Authors
    joy larkin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for 2026 AI Coding Models

    Last Updated: 24 May 2026
    Curated By: Joy Larkin
    Language(s) (NLP): English
    License: MIT
    Repository: https://github.com/joylarkin/AI-Coding-Landscape
    Blog: https://cleverhack.com/ai-coding-landscape

    Dataset Description

    CSV file of AI Coding Models released in 2026 & 2025.

  11. World's Most Influential Programming Languages

    • kaggle.com
    zip
    Updated Jan 11, 2026
    Cite
    Hammad Farooq (2026). World's Most Influential Programming Languages [Dataset]. https://www.kaggle.com/datasets/hammadfarooq470/worlds-most-influential-programming-languages
    Explore at:
    Available download formats: zip (2160 bytes)
    Dataset updated
    Jan 11, 2026
    Authors
    Hammad Farooq
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    Top 100 Programming Languages: Founders & Founding Years

    This dataset contains a curated list of approximately the 100 most influential and widely-used programming languages from 1957 to 2025, ranked roughly by historical impact, current popularity, and overall usage.

    Columns

    | Column | Description | Type |
    |:-------|:------------|:-----|
    | Language | Name of the programming language | string |
    | Founder/Creator | Main creator(s) or lead designer(s) of the language | string |
    | Year | Approximate year of first appearance/public release | integer |

    Key Features

    • Time span: 1957 (Fortran) – 2025 (emerging languages)
    • Includes classics, modern mainstream, systems, web, mobile, data science, blockchain, and experimental languages
    • Founder information is simplified to the most commonly recognized individual or small team
    • Approximate ranking based on historical significance + modern usage trends (as of early 2026)

    Usage Examples

    • Study the evolution of programming languages
    • Visualize timeline of language creation
    • Analyze trends in programming paradigms
    • Educational material for computer science history courses

    Total records: ~100
    Last updated: January 2026

  12. intensive-programming

    • huggingface.co
    Updated Mar 16, 2025
    Cite
    NIONGOLO Chrys Fé-Marty (2025). intensive-programming [Dataset]. https://huggingface.co/datasets/Svngoku/intensive-programming
    Explore at:
    Dataset updated
    Mar 16, 2025
    Authors
    NIONGOLO Chrys Fé-Marty
    Description

    Svngoku/intensive-programming dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Gitee Code Dataset

    • academictorrents.com
    bittorrent
    Updated Jan 9, 2026
    + more versions
    Cite
    nyuuzyou (2026). Gitee Code Dataset [Dataset]. https://academictorrents.com/details/e572ddd8459e96ed50ba40f1ee991734805f2259
    Explore at:
    Available download formats: bittorrent (574748274837)
    Dataset updated
    Jan 9, 2026
    Dataset authored and provided by
    nyuuzyou
    License

    https://academictorrents.com/nolicensespecified

    Description

    Gitee Code Dataset

    Dataset Description

    This dataset was compiled from code repositories hosted on Gitee, China's largest code hosting platform and a leading alternative to GitHub in the Chinese developer community. Gitee is widely used by Chinese developers, enterprises, and open-source projects, making this dataset particularly valuable for training code models with strong Chinese language understanding and Chinese coding conventions.

    Dataset Summary

    | Statistic | Value |
    |:----------|:------|
    | Total Files | 819,472,785 |
    | Total Repositories | 3,105,923 |
    | Total Size | 536 GB (compressed Parquet) |
    | Programming Languages | 554 |
    | File Format | Parquet with Zstd compression (468 files) |

    Key Features

    • Large-scale Chinese code corpus: Contains code from over 3 million repositories, many featuring Chinese comments, documentation, and variable names
    • Diverse language coverage: Span

  14. TIF District Programming - 2020-2024

    • data.cityofchicago.org
    • catalog.data.gov
    • +1more
    csv, xlsx, xml
    Updated Nov 12, 2020
    + more versions
    Cite
    City of Chicago (2020). TIF District Programming - 2020-2024 [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/TIF-District-Programming-2020-2024/9up3-ycip
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    Nov 12, 2020
    Dataset authored and provided by
    City of Chicago
    Description

    This dataset contains amounts shown in the TIF District Programming 2020-2024 reports as well as from 2025 through the planned expiration of each TIF district. The report and corresponding dataset show estimated fund and project balances through the end of FY 2019. Amounts shown in columns 2020 and later reflect known obligations and proposed projects as well as estimates of revenue based on the current equalized assessed value data for each TIF district produced by the Cook County Clerk.

    Versions of this dataset are produced periodically for the then-relevant time periods, which will partially overlap, with newer versions having updated numbers. The immediately previous and next (if available) datasets are shown in the Featured Content cards for this dataset. All versions of this dataset can be found at https://data.cityofchicago.org/browse?tags=tif+district+programming.

  15. AI Coding Pair‑Programming Tutors Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 3, 2025
    Cite
    Growth Market Reports (2025). AI Coding Pair‑Programming Tutors Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-coding-pairprogramming-tutors-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI Coding Pair‑Programming Tutors Market Outlook



    According to our latest research, the global AI Coding Pair-Programming Tutors market size reached USD 1.42 billion in 2024, demonstrating robust momentum driven by the rapid adoption of artificial intelligence in software development and education sectors. The market is projected to grow at a CAGR of 28.6% from 2025 to 2033, reaching an estimated USD 13.93 billion by 2033. This remarkable growth is primarily fueled by the increasing demand for efficient, scalable, and personalized coding education solutions across educational institutions, enterprises, and individual learners worldwide.
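    The growth figures quoted above can be sanity-checked with the standard compound-growth formula. The sketch below is an illustration only: it assumes nine compounding periods from the 2024 base value to 2033, whereas the report states its CAGR over 2025 to 2033 without giving the 2025 value, so the results need not match the quoted numbers exactly. The function names are chosen for this example.

```python
def project(start_value, rate, years):
    """Compound start_value at a constant annual rate for the given number of years."""
    return start_value * (1 + rate) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description (USD billions).
size_2024 = 1.42
size_2033 = 13.93
stated_cagr = 0.286

# Assumption: nine compounding periods, 2024 -> 2033.
years = 2033 - 2024

print(f"Projected 2033 size at the stated CAGR: {project(size_2024, stated_cagr, years):.2f}")
print(f"CAGR implied by the two endpoints: {implied_cagr(size_2024, size_2033, years):.3%}")
```

    Any gap between the two printed values and the report's figures would most likely come from the choice of base year or from rounding in the published numbers.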




    One of the primary growth factors for the AI Coding Pair-Programming Tutors market is the accelerating digital transformation across industries, which has created an unprecedented demand for skilled software developers and programmers. Organizations are seeking innovative ways to upskill their workforce and bridge the talent gap in coding and software engineering. AI-powered pair-programming tutors offer a scalable solution, providing real-time feedback, personalized learning paths, and adaptive problem-solving exercises. These tools leverage advanced natural language processing and machine learning algorithms to simulate collaborative coding experiences, thereby enhancing both the efficiency and effectiveness of learning. As a result, enterprises and educational institutions are increasingly integrating these AI tutors into their training programs to foster continuous learning and improve code quality.




    Another significant driver of market growth is the proliferation of remote and hybrid learning models, especially in the wake of global events that have reshaped educational delivery methods. The flexibility and accessibility offered by AI Coding Pair-Programming Tutors make them ideal for students, professionals, and self-learners who require guidance outside traditional classroom settings. These platforms can assess individual skill levels, recommend targeted exercises, and provide instant feedback, enabling learners to progress at their own pace. The integration of AI tutors with popular learning management systems and coding platforms further enhances their utility, making them a preferred choice for both formal education providers and coding bootcamps. This trend is expected to continue as digital literacy becomes a core requirement across various sectors.




    Technological advancements in artificial intelligence, particularly in natural language understanding and code generation, have significantly improved the capabilities of AI Coding Pair-Programming Tutors. Modern solutions can not only detect syntax and logical errors but also engage in context-aware discussions, suggest alternative approaches, and explain complex programming concepts in simple terms. The rising adoption of cloud-based deployment models has made these tools accessible to a global audience, reducing infrastructure costs and enabling seamless updates. Furthermore, the increasing investment from venture capitalists and technology giants in AI-driven education technology is accelerating innovation, leading to the continuous evolution of more sophisticated and user-friendly AI coding tutors.




    From a regional perspective, North America currently dominates the AI Coding Pair-Programming Tutors market, supported by the presence of leading technology companies, extensive research and development activities, and a strong emphasis on STEM education. Europe and Asia Pacific are also witnessing substantial growth, driven by government initiatives to promote digital skills and the rapid expansion of the IT sector. Emerging markets in Latin America and the Middle East & Africa are gradually adopting AI-based educational technologies, albeit at a slower pace due to infrastructural and economic constraints. However, the growing internet penetration and increasing focus on digital transformation in these regions are expected to unlock new opportunities for market players over the forecast period.






  16. PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker

    • huggingface.co
    Cite
    Jason Yan, PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker [Dataset]. https://huggingface.co/datasets/JasonYan777/PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker
    Explore at:
    Authors
    Jason Yan
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset card for PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker

    This dataset was made with Curator.

      Dataset details
    

    A sample from the dataset: { "dimension_name": "programming_expertise", "dimension_values": [ "Novice", "Intermediate", "Advanced" ], "dimension_description": "Represents the user's practical fluency in software engineering. It shapes how they decompose problems, choose abstractions, weigh… See the full description on the dataset page: https://huggingface.co/datasets/JasonYan777/PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker.

  17. Programming Languages peer-reviewed articles by year

    • exaly.com
    csv, json
    Updated Apr 29, 2026
    Cite
    (2026). Programming Languages peer-reviewed articles by year [Dataset]. https://exaly.com/discipline/3035/programming-languages
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Apr 29, 2026
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This chart shows the number of peer-reviewed articles published each year in the discipline of Programming Languages.

  18. Python (programming language) — Brazil

    • rascasse.com
    html, json
    Updated May 11, 2026
    Cite
    Rascasse (2026). Python (programming language) — Brazil [Dataset]. https://rascasse.com/explore/br/python-programming-language-18120
    Explore at:
    Available download formats: html, json
    Dataset updated
    May 11, 2026
    Dataset authored and provided by
    Rascasse
    License

    https://rascasse.com/terms/

    Time period covered
    2026
    Area covered
    Brazil
    Variables measured
    Male share, Average age, Female share, Audience size
    Measurement technique
    Search-behavior signal aggregation
    Description

    Python (programming language) audience profile for Brazil.

  19. Data from: On the Transferability of Pre-trained Language Models for...

    • frdr-dfdr.ca
    Updated Mar 23, 2022
    Cite
    Chen, Fuxiang (2022). On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages [Dataset]. http://doi.org/10.20383/102.0563
    Explore at:
    Dataset updated
    Mar 23, 2022
    Dataset provided by
    Federated Research Data Repository / dépôt fédéré de données de recherche
    Authors
    Chen, Fuxiang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Pre-trained Language Models (PLMs) such as CodeBERT and GraphCodeBERT, when trained on a large corpus of code, have recently displayed promising results in Software Engineering (SE) downstream tasks. A PLM is most useful if it can be leveraged to improve performance on code corpora written in low-resource programming languages, where training data is limited. In this work, our focus is on studying the impact of PLMs on a low-resource programming language corpus; specifically, we choose Ruby as the study subject. A recent study by Ahmed and Devanbu reported that fine-tuning multilingual PLMs on a corpus of code written in multiple programming languages achieves higher performance than fine-tuning on a corpus of code written in just one programming language. However, no analysis was made with respect to monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular SE tasks: Code Summarization and Code Search, 2) the strategy (for selecting programming languages) that works well when fine-tuning multilingual PLMs for Ruby, and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths (here, we bin the Ruby code based on its number of tokens). Understanding the performance on different code lengths will enable developers to make more informed decisions on the use of PLMs based on their code.

    This dataset, containing the PLMs and their fine-tuned models (there are over a hundred trained and fine-tuned models), was generated by researchers at the University of British Columbia, Singapore Management University and JetBrains.

  20. Coding Assignment Submissions and Feedback Data for .NET Web Programming

    • figshare.com
    zip
    Updated Sep 25, 2025
    Cite
    Son Ngo (2025). Coding Assignment Submissions and Feedback Data for .NET Web Programming [Dataset]. http://doi.org/10.6084/m9.figshare.30209056.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 25, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Son Ngo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains programming assignment submissions in .NET Web development from students, along with the corresponding feedback information. The data was collected as part of a study evaluating an automated feedback pipeline that integrates large language models, retrieval-augmented generation, and static code analysis to generate concept-linked feedback for learners.

    The dataset is organized into several files and subfolders (example structure – adjust according to your actual files):

    File/Folder | Description | Format | Size / Records | Notes
    submissions.csv | Metadata of student submissions (student ID, assignment ID, timestamp, filename, code reference) | CSV | ~ N records | One row per submission
    feedbacks.csv | Concept-linked feedback generated by the system (student_id, assignment_id, feedback_text, feedback_type, …) | CSV | ~ M records | Feedback type: syntax, logic, performance, etc.
    static_analysis_results.csv | Results of static code analysis (errors, warnings, code complexity) | CSV / JSON | – | Includes: submission_id, warning_type, line_number, message
    retrieval_matches.csv | Snippets or examples retrieved to support feedback generation | CSV | – | Fields: feedback_id, example_snippet_id, similarity_score
    raw_code/ | Folder with raw source code of submissions | .cs / .zip | – | Named as studentID_assignmentID_version.cs
    metadata.json | General metadata about the dataset (field descriptions, collection dates, version) | JSON | 1 file | Explains all fields and schema
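As a rough illustration of how the file layout described above might be used, the sketch below groups feedback rows under their submissions by `student_id` and `assignment_id`. The column names follow the field lists in the description, but the description itself calls this an example structure, so the exact headers, the sample rows, and the helper names `read_rows` and `feedback_by_submission` are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real files may use different column names.
SUBMISSIONS_CSV = """student_id,assignment_id,timestamp,filename
s01,a1,2025-03-01T10:00:00,s01_a1_v1.cs
s02,a1,2025-03-01T11:30:00,s02_a1_v1.cs
"""

FEEDBACKS_CSV = """student_id,assignment_id,feedback_text,feedback_type
s01,a1,Missing null check,logic
s01,a1,Unclosed brace,syntax
s02,a1,Query runs inside a loop,performance
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def feedback_by_submission(submissions, feedbacks):
    """Group feedback rows under each submission's (student_id, assignment_id) key."""
    grouped = defaultdict(list)
    for fb in feedbacks:
        grouped[(fb["student_id"], fb["assignment_id"])].append(fb)
    result = {}
    for sub in submissions:
        key = (sub["student_id"], sub["assignment_id"])
        # Submissions without any feedback map to an empty list.
        result[key] = grouped.get(key, [])
    return result


joined = feedback_by_submission(read_rows(SUBMISSIONS_CSV), read_rows(FEEDBACKS_CSV))
for key, items in joined.items():
    print(key, [fb["feedback_type"] for fb in items])
```

To run this against the downloaded files instead of the inline samples, the text could be read with `open("submissions.csv", newline="")` and passed to `csv.DictReader` directly, after checking the actual headers in `metadata.json`.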


GitHub Programming Languages Data

Statistics for Programming Languages used on GitHub

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
