Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A common question among newcomers to, and practitioners of, computer science and software engineering is which programming language is the best and/or most popular. It is very difficult to give a definitive answer, as there is a seemingly endless number of metrics that can define the 'best' or 'most popular' programming language.
One such metric for a 'popular' programming language is the number of projects and files written in that language. As GitHub is the most popular public collaboration and file-sharing platform, the languages used for repositories, PRs, and issues on GitHub can be a good indicator of a language's popularity.
This dataset contains statistics about the programming languages used for repositories, PRs, and issues on GitHub. The data is from 2011 to 2021.
This data was queried and aggregated from BigQuery's public github_repos and githubarchive datasets.
Only public GitHub repositories, and their corresponding PRs/issues, have publicly available data. Thus, this dataset is based only on public repositories, which may not be fully representative of all repositories on GitHub.
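As a minimal sketch of the repository-count metric described above, the snippet below computes each language's share of repositories per year. The record layout (`year`, `language`, `num_repos`) and the sample values are hypothetical placeholders, not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows: (year, language, number of repositories).
# The column layout and values are illustrative placeholders only.
ROWS = [
    (2020, "LangA", 300),
    (2020, "LangB", 100),
    (2021, "LangA", 250),
    (2021, "LangB", 250),
]


def language_share_by_year(rows):
    """Return {year: {language: share of that year's repositories}}."""
    totals = defaultdict(int)
    counts = defaultdict(dict)
    for year, language, num_repos in rows:
        totals[year] += num_repos
        counts[year][language] = counts[year].get(language, 0) + num_repos
    return {
        year: {lang: n / totals[year] for lang, n in langs.items()}
        for year, langs in counts.items()
    }


shares = language_share_by_year(ROWS)
for year in sorted(shares):
    # Rank languages by share, highest first.
    ranked = sorted(shares[year].items(), key=lambda kv: kv[1], reverse=True)
    print(year, ranked)
```

Using a share rather than a raw count keeps years comparable even if overall activity on the platform grows over time.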
https://creativecommons.org/publicdomain/zero/1.0/
By Tarun Bisht (From Huggingface) [source]
The python_code_instructions_18k_alpaca dataset is a training dataset curated for researchers and developers working on the analysis and comprehension of Python code instructions. It contains a large collection of Python code snippets along with their corresponding instruction, input, output, and prompt fields, covering a range of Python programming concepts and techniques.
The dataset is organized into columns for easy access. The instruction column holds the specific task that the Python code snippet is designed to perform, so the purpose of each snippet is clear at a glance.
The input column contains the input data or parameters required to execute the code snippet. These inputs provide context and show how different variables or values affect each snippet's behavior.
Likewise, the output column presents the expected result of executing each code snippet with its specified input values. This supports validation and verification, confirming that each snippet performs as intended.
In addition to the instruction, input, and output fields, the dataset includes a prompt column, which provides additional context to help users understand the purpose or requirements of each code snippet.
By working through this dataset, researchers and developers can study many real-world examples of Python programming problems, improving their coding skills and learning effective implementation techniques across various domains.
- Code Instruction Analysis: This dataset can be used to analyze different types of Python code instructions and identify patterns or common practices. Researchers or developers can use this dataset to gain insights into effective ways of writing code instructions.
- Code Output Prediction: With the given input and instruction, this dataset can be used to train models for predicting the expected output of a Python code snippet. This can be useful in automating the testing process or verifying the correctness of the code.
- Prompt Generation: Writing clear and concise prompts for code snippets can be difficult. This dataset can serve as a resource for generating prompts by analyzing existing examples and extracting their key information or requirements.
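For the code output prediction use case, one simple way to score a model is exact-match accuracy against the output column. The sketch below assumes predictions and references are plain strings and ignores leading and trailing whitespace around each output as well as trailing whitespace on each line; this metric is an illustrative choice, not one prescribed by the dataset.

```python
def normalize(text: str) -> str:
    """Strip surrounding whitespace and trailing whitespace on each line."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical model predictions and reference outputs.
preds = ["15\n", "hello world", "[1, 2]"]
refs = ["15", "hello  world", "[1, 2]"]
print(exact_match_accuracy(preds, refs))
```

Exact match is strict (the second pair fails on a doubled space), so it may understate quality for free-form outputs.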
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: train.csv

| Column name | Description |
|:------------|:------------|
| instruction | Specific tasks or instructions assigned to each Python code snippet. (Text) |
| input | The input data or parameters required for executing the code instruction. (Text) |
| output | The expected result or output that should be produced when executing the code instruction. (Text) |
| prompt | Additional information or context to help understand the purpose or requirements of each code instruction. (Text) |
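A minimal sketch for loading train.csv with the standard library and checking that the four documented columns are present. The in-memory sample row is a hypothetical stand-in so the example runs without the real file.

```python
import csv
import io

EXPECTED_COLUMNS = {"instruction", "input", "output", "prompt"}


def load_rows(handle):
    """Read rows from a CSV file handle and check the documented columns exist."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# With the real file: open("train.csv", newline="", encoding="utf-8")
# Here a tiny hypothetical sample keeps the example self-contained.
SAMPLE = (
    "instruction,input,output,prompt\n"
    '"Add two numbers","2, 3","5","Write a function that adds two numbers."\n'
)
rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["instruction"])
```

Opening the real file with `newline=""` lets the `csv` module correctly handle quoted fields that span several lines, which is common when cells contain code.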
If you use this dataset in your research, please credit Tarun Bisht (from Hugging Face).
As of 2024, JavaScript was the programming language most used by software developers worldwide in the past 12 months, chosen by ** percent of surveyed developers. It was followed by Python, at ** percent of respondents.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
Codeparrot's Apps dataset is a collection of programming questions written in natural language, each accompanied by detailed Python solutions, input/output test cases, and related information. It gives coders of all levels a way to learn and understand Python by working through problems and checking their work against known solutions, making it a good starting point for anyone beginning to program in Python.
Using this dataset is fairly simple: all of its columns are organized in a single table (CSV file), unless otherwise stated, so reading through them requires little coding experience. Find a question that matches your desired difficulty or topic, along with its accompanying information, including any provided starter code. Copy these into your local environment and run the supplied test cases against the provided input and output values to verify that everything works. Then make your own modifications or write your own solution according to the problem instructions, and submit it for review or feedback if needed. Practicing this way can also help you prepare for official competitive situations, and building a solid grasp of the basics early saves time and confusion later.
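The verify-against-test-cases step above can be sketched as follows. It assumes each test case is a pair of stdin and expected-stdout strings (for example, parsed from a JSON field holding parallel lists of inputs and outputs); this layout, the helper name `run_case`, and the sample solution are illustrative assumptions rather than the dataset's documented schema.

```python
import subprocess
import sys

# Illustrative solution: reads two integers from stdin and prints their sum.
SOLUTION = "a, b = map(int, input().split())\nprint(a + b)\n"

# Hypothetical test cases as (stdin, expected stdout) pairs.
TEST_CASES = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]


def run_case(source: str, stdin: str, expected: str, timeout: float = 5.0) -> bool:
    """Run `source` with the current Python interpreter and compare its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        input=stdin,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Compare while ignoring trailing whitespace, a common judging convention.
    return result.returncode == 0 and result.stdout.rstrip() == expected.rstrip()


results = [run_case(SOLUTION, i, o) for i, o in TEST_CASES]
print(f"passed {sum(results)}/{len(results)}")
```

Running each solution in a separate process with a timeout keeps a crashing or looping solution from affecting the checker; solutions you did not write yourself are best run in a sandbox.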
- As a teaching and learning tool to help beginners get comfortable with programming in Python.
- As an interview prep tool for experienced coders, since it contains level-specific code examples and test cases for each question.
- As a competition resource, where contestants can try out different solutions and compare them against each other to identify the most efficient one.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
File: train.csv

| Column name | Description |
|:------------|:------------|
| question | Natural language question related to coding. (String) |
| solutions | Set of solutions written in Python for each given question. (String) |

**inp...
JavaScript and Java were among the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing these languages in 2024, respectively. Nevertheless, the tech skill developers most wanted to learn in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest-growing IT skills among DevSkiller customers were C/C++ and data science, with cybersecurity ranking third.
Software skills
When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn to code between the ages of 11 and 17, with some writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned to program through online resources such as videos and blogs.
Software skills pay
In 2024, the average annual software developer salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
https://en.wikipedia.org/wiki/Generational_list_of_programming_languages
This is a "genealogy" of programming languages. Languages are categorized under the ancestor language with the strongest influence. Those ancestor languages are listed in alphabetic order. Any such categorization has a large arbitrary element, since programming languages often incorporate major ideas from multiple sources.
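The categorization rule described above (each language placed under a single strongest-influence ancestor, ancestors sorted alphabetically) maps naturally onto a dictionary. The entries below are a small illustrative sample, not copied from the Wikipedia list.

```python
from collections import defaultdict

# Illustrative entries: language -> ancestor with the strongest influence.
# These pairs are examples only, not taken from the Wikipedia list.
STRONGEST_ANCESTOR = {
    "C++": "C",
    "Objective-C": "C",
    "TypeScript": "JavaScript",
}


def genealogy(mapping):
    """Group languages under their ancestor, with ancestors in alphabetic order."""
    groups = defaultdict(list)
    for language, ancestor in mapping.items():
        groups[ancestor].append(language)
    return {ancestor: sorted(groups[ancestor]) for ancestor in sorted(groups)}


result = genealogy(STRONGEST_ANCESTOR)
for ancestor, descendants in result.items():
    print(ancestor)
    for language in descendants:
        print("  " + language)
```

Because a dictionary key holds exactly one value, every language must be assigned a single ancestor; this is where the arbitrary element mentioned above enters, since real languages often draw on several sources.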
https://academictorrents.com/nolicensespecified
Coursera - Programming for Designers Specialization
Course details: Develop a foundation in Computational Design. Explore Creative Coding with Python.
What you'll learn
- Learn the fundamentals of Python programming, including essential coding techniques
- Engage in computational design thinking to approach design problems with a mindset that leverages computational strategy and problem-solving
- Understand how to develop custom algorithms that can generate a range of design solutions against complex requirements, constraints, and objectives
- Demonstrate the application of computational methods in design-related disciplines using a variety of computational tools
Specialization - 3 course series
In Programming for Designers, you will explore Python programming within a creative context, equipping you with essential computational design skills. Beginning with fundamental programming principles, y
Programming Jokes Dataset
Dataset Summary
This dataset contains programming-related jokes scraped from the website Punny Funny. The jokes are organized into categories based on the structure of the original webpage. The dataset is intended for natural language processing tasks, such as fine-tuning language models to generate humor or analyzing textual content in the programming domain. Number of jokes: 220
Usage
This dataset is suitable for… See the full description on the dataset page: https://huggingface.co/datasets/asfandyarazhar/programming-jokes-dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A large dataset containing the eye movements of N=216 programmers of different experience levels, captured during two code comprehension tasks, is presented. Data are grouped by programming expertise (from none to high) and other demographic descriptors. Data were collected through an international collaborative effort involving eleven research teams across eight countries on four continents. The same eye-tracking apparatus and software were used for all data collection. The Eye Movements in Programming (EMIP) dataset is freely available for download. Its varied metadata provides fertile ground for the analysis of gaze behavior and may yield novel insights into code comprehension.
Bednarik, Roman, et al. "EMIP: The eye movements in programming dataset." Science of Computer Programming 198 (2020): 102520.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for 2026 AI Coding Models
- Last Updated: 24 May 2026
- Curated By: Joy Larkin
- Language(s) (NLP): English
- License: MIT
- Repository: https://github.com/joylarkin/AI-Coding-Landscape
- Blog: https://cleverhack.com/ai-coding-landscape
Dataset Description
CSV file of AI Coding Models released in 2026 & 2025.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Top 100 Programming Languages: Founders & Founding Years
This dataset contains a curated list of approximately 100 of the most influential and widely used programming languages from 1957 to 2025, ranked roughly by historical impact, current popularity, and overall usage.
| Column | Description | Type |
|---|---|---|
| Language | Name of the programming language | string |
| Founder/Creator | Main creator(s) or lead designer(s) of the language | string |
| Year | Approximate year of first appearance/public release | integer |
Total records: ~100
Last updated: January 2026
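As a sketch of how the Year column can be used, the snippet below counts languages by decade of first appearance. The column names follow the table above; the file contents are replaced by a small illustrative sample of well-known languages, not rows from the dataset, and it assumes Year parses as a plain integer.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for the dataset's CSV (column names from the table above).
SAMPLE = """Language,Founder/Creator,Year
Fortran,John Backus,1957
C,Dennis Ritchie,1972
Python,Guido van Rossum,1991
Java,James Gosling,1995
"""


def languages_per_decade(handle):
    """Count languages by decade of first appearance (Year column)."""
    counts = Counter()
    for row in csv.DictReader(handle):
        year = int(row["Year"])
        counts[year // 10 * 10] += 1
    return dict(sorted(counts.items()))


print(languages_per_decade(io.StringIO(SAMPLE)))
```

Since the Year values are approximate, decade-level grouping is a reasonable granularity for trend analysis.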
The Svngoku/intensive-programming dataset is hosted on Hugging Face and contributed by the HF Datasets community.
https://academictorrents.com/nolicensespecified
This dataset contains the amounts shown in the TIF District Programming 2020-2024 reports, as well as amounts from 2025 through the planned expiration of each TIF district. The report and corresponding dataset show estimated fund and project balances through the end of FY 2019. Amounts in columns for 2020 and later reflect known obligations and proposed projects, as well as revenue estimates based on the current equalized assessed value data for each TIF district produced by the Cook County Clerk.
Versions of this dataset are produced periodically for the then-relevant time periods, which will partially overlap, with newer versions having updated numbers. The immediately previous and next (if available) datasets are shown in the Featured Content cards for this dataset. All versions of this dataset can be found at https://data.cityofchicago.org/browse?tags=tif+district+programming.
According to our latest research, the global AI Coding Pair-Programming Tutors market size reached USD 1.42 billion in 2024, demonstrating robust momentum driven by the rapid adoption of artificial intelligence in software development and education sectors. The market is projected to grow at a CAGR of 28.6% from 2025 to 2033, reaching an estimated USD 13.93 billion by 2033. This remarkable growth is primarily fueled by the increasing demand for efficient, scalable, and personalized coding education solutions across educational institutions, enterprises, and individual learners worldwide.
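The projection above follows the compound annual growth rate (CAGR) formula, end = start × (1 + r)^n. The sketch below checks the reported figures under one stated assumption: that 2024 is the base year, so 2025 through 2033 spans nine compounding periods. The report's own base year and period count are not given here.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual growth `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` periods."""
    return (end / start) ** (1 / years) - 1


# Assumption: 2024 is the base year, so 2025-2033 is nine compounding periods.
base_2024 = 1.42  # USD billion, as reported
print(round(project(base_2024, 0.286, 9), 2))
print(round(implied_cagr(base_2024, 13.93, 9), 4))
```

Under that assumption, compounding gives roughly USD 13.66 billion, slightly below the reported USD 13.93 billion, and the implied rate is about 28.9%; the small gap may come from rounding or from a different base year in the report's own model.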
One of the primary growth factors for the AI Coding Pair-Programming Tutors market is the accelerating digital transformation across industries, which has created an unprecedented demand for skilled software developers and programmers. Organizations are seeking innovative ways to upskill their workforce and bridge the talent gap in coding and software engineering. AI-powered pair-programming tutors offer a scalable solution, providing real-time feedback, personalized learning paths, and adaptive problem-solving exercises. These tools leverage advanced natural language processing and machine learning algorithms to simulate collaborative coding experiences, thereby enhancing both the efficiency and effectiveness of learning. As a result, enterprises and educational institutions are increasingly integrating these AI tutors into their training programs to foster continuous learning and improve code quality.
Another significant driver of market growth is the proliferation of remote and hybrid learning models, especially in the wake of global events that have reshaped educational delivery methods. The flexibility and accessibility offered by AI Coding Pair-Programming Tutors make them ideal for students, professionals, and self-learners who require guidance outside traditional classroom settings. These platforms can assess individual skill levels, recommend targeted exercises, and provide instant feedback, enabling learners to progress at their own pace. The integration of AI tutors with popular learning management systems and coding platforms further enhances their utility, making them a preferred choice for both formal education providers and coding bootcamps. This trend is expected to continue as digital literacy becomes a core requirement across various sectors.
Technological advancements in artificial intelligence, particularly in natural language understanding and code generation, have significantly improved the capabilities of AI Coding Pair-Programming Tutors. Modern solutions can not only detect syntax and logical errors but also engage in context-aware discussions, suggest alternative approaches, and explain complex programming concepts in simple terms. The rising adoption of cloud-based deployment models has made these tools accessible to a global audience, reducing infrastructure costs and enabling seamless updates. Furthermore, the increasing investment from venture capitalists and technology giants in AI-driven education technology is accelerating innovation, leading to the continuous evolution of more sophisticated and user-friendly AI coding tutors.
From a regional perspective, North America currently dominates the AI Coding Pair-Programming Tutors market, supported by the presence of leading technology companies, extensive research and development activities, and a strong emphasis on STEM education. Europe and Asia Pacific are also witnessing substantial growth, driven by government initiatives to promote digital skills and the rapid expansion of the IT sector. Emerging markets in Latin America and the Middle East & Africa are gradually adopting AI-based educational technologies, albeit at a slower pace due to infrastructural and economic constraints. However, the growing internet penetration and increasing focus on digital transformation in these regions are expected to unlock new opportunities for market players over the forecast period.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset card for PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker
This dataset was made with Curator.
Dataset details
A sample from the dataset: { "dimension_name": "programming_expertise", "dimension_values": [ "Novice", "Intermediate", "Advanced" ], "dimension_description": "Represents the user's practical fluency in software engineering. It shapes how they decompose problems, choose abstractions, weigh… See the full description on the dataset page: https://huggingface.co/datasets/JasonYan777/PersonaSignal-PerceivabilityTest-Programming-Expertise-DPO-Tinker.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This chart shows the number of peer-reviewed articles published each year in the discipline of Programming Languages.
https://rascasse.com/terms/
Python (programming language) audience profile for Brazil.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Pre-trained Language Models (PLMs) such as CodeBERT and GraphCodeBERT, when trained on a large corpus of code, have recently shown promising results on downstream Software Engineering (SE) tasks. A PLM is most useful if it can improve performance on code corpora written in low-resource programming languages, where training data is limited. In this work, we study the impact of PLMs on a low-resource programming language corpus, choosing Ruby as the study subject. A recent study by Ahmed and Devanbu reported that fine-tuning multilingual PLMs on a multilingual code corpus achieves higher performance than fine-tuning on code written in just one programming language. However, monolingual PLMs were not analyzed. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular SE tasks, Code Summarization and Code Search; 2) the strategy for selecting programming languages that works well when fine-tuning multilingual PLMs for Ruby; and 3) the performance of the fine-tuned PLMs on Ruby code of different lengths. For the last question, we bin the Ruby code by its number of tokens; understanding performance across code lengths will enable developers to make more informed decisions about using PLMs on their code.
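Binning code by token count, as in the third research question above, can be sketched as follows. The bin edges and the whitespace tokenizer are hypothetical placeholders; the study's actual bins and tokenizer are not specified here, and a real analysis would count tokens with the PLM's own tokenizer.

```python
from bisect import bisect_right

# Hypothetical bin edges (in tokens); the study's actual bins are not given here.
BIN_EDGES = [64, 128, 256]
BIN_LABELS = ["<64", "64-127", "128-255", ">=256"]


def count_tokens(code: str) -> int:
    """Crude whitespace tokenization, standing in for a model tokenizer."""
    return len(code.split())


def token_bin(code: str) -> str:
    """Return the label of the length bin that `code` falls into."""
    return BIN_LABELS[bisect_right(BIN_EDGES, count_tokens(code))]


snippet = "def add(a, b)\n  a + b\nend"  # a small Ruby method
print(count_tokens(snippet), token_bin(snippet))
```

`bisect_right` places a count equal to an edge into the bin that starts at that edge, so 64 tokens lands in "64-127".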
This dataset, containing the PLMs and their fine-tuned models (there are over a hundred trained and fine-tuned models), was generated by the researchers at the University of British Columbia, Singapore Management University and JetBrains.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains programming assignment submissions in .NET Web development from students, along with the corresponding feedback information. The data was collected as part of a study evaluating an automated feedback pipeline that integrates large language models, retrieval-augmented generation, and static code analysis to generate concept-linked feedback for learners.
The dataset is organized into several files and subfolders (example structure; adjust according to your actual files):

| File/Folder | Description | Format | Size / Records | Notes |
|---|---|---|---|---|
| submissions.csv | Metadata of student submissions (student ID, assignment ID, timestamp, filename, code reference) | CSV | ~ N records | One row per submission |
| feedbacks.csv | Concept-linked feedback generated by the system (student_id, assignment_id, feedback_text, feedback_type, …) | CSV | ~ M records | Feedback type: syntax, logic, performance, etc. |
| static_analysis_results.csv | Results of static code analysis (errors, warnings, code complexity) | CSV / JSON | – | Includes: submission_id, warning_type, line_number, message |
| retrieval_matches.csv | Snippets or examples retrieved to support feedback generation | CSV | – | Fields: feedback_id, example_snippet_id, similarity_score |
| raw_code/ | Folder with raw source code of submissions | .cs / .zip | – | Named as studentID_assignmentID_version.cs |
| metadata.json | General metadata about the dataset (field descriptions, collection dates, version) | JSON | 1 file | Explains all fields and schema |
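The raw_code/ naming convention (studentID_assignmentID_version.cs) can be parsed so each source file can be matched to its submission metadata. This is a minimal sketch that assumes the IDs themselves contain no underscores; the example file name and the `SubmissionFile` type are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class SubmissionFile(NamedTuple):
    student_id: str
    assignment_id: str
    version: str


def parse_submission_filename(path: str) -> SubmissionFile:
    """Split a name like studentID_assignmentID_version.cs into its parts.

    Assumes the IDs themselves contain no underscores.
    """
    stem = Path(path).stem  # drops the directory and the .cs extension
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {path!r}")
    return SubmissionFile(*parts)


print(parse_submission_filename("raw_code/S042_A3_2.cs"))
```

The version is kept as a string to avoid assuming it is always numeric; once the actual column names in submissions.csv are confirmed, the parsed IDs can serve as join keys.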