100+ datasets found
  1. CodeContests Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Dec 25, 2023
    Cite
    (2023). CodeContests Dataset [Dataset]. https://paperswithcode.com/dataset/codecontests
    Description

    CodeContests is a competitive programming dataset for machine learning. It was used to train AlphaCode.

    It consists of programming problems from a variety of sources.

    Problems include test cases in the form of paired inputs and outputs, as well as both correct and incorrect human solutions in a variety of languages.
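    Because each problem ships with paired inputs and outputs, checking a candidate program amounts to running it on every input and comparing what it prints with the expected output. The sketch below illustrates that idea for Python solutions; it does not use the CodeContests file format or field names, and the example problem, the whitespace-trimming comparison, and the timeout are assumptions (real judges may use custom output checkers).

```python
import subprocess
import sys


def passes_all_tests(source_code: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> bool:
    """Run a Python solution on each test input and compare its stdout to the expected output."""
    for test_input, expected_output in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", source_code],  # fresh interpreter per test
                input=test_input,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # treat a timeout as a failed test
        if result.returncode != 0:
            return False  # runtime error
        # Compare ignoring surrounding whitespace (an assumed, common convention).
        if result.stdout.strip() != expected_output.strip():
            return False
    return True


# Hypothetical problem: read two integers and print their sum.
tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
correct = "a, b = map(int, input().split())\nprint(a + b)"
incorrect = "a, b = map(int, input().split())\nprint(a - b)"
print(passes_all_tests(correct, tests))    # True
print(passes_all_tests(incorrect, tests))  # False
```

    Running each solution in a separate interpreter process keeps a crashing or looping candidate from taking down the checker; untrusted code would still need proper sandboxing.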

  2. Most widely utilized programming languages among developers worldwide 2023

    • statista.com
    • stelinmart.com
    Cite
    Statista, Most widely utilized programming languages among developers worldwide 2023 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 8, 2023 - May 19, 2023
    Area covered
    Worldwide
    Description

    As of 2023, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 63.6 percent of respondents stating that they used JavaScript and around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages around the world.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers continually seek new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill in various industries throughout the business world. Job seekers with Python, R, and SQL skills will find these among the most sought-after data science skills, which is likely to help their search for employment.

  3. Python Programming Dataset

    • pub.uni-bielefeld.de
    • commons.datacite.org
    Updated Feb 10, 2020
    Cite
    Benjamin Paaßen (2020). Python Programming Dataset [Dataset]. https://pub.uni-bielefeld.de/record/2941052
    Authors
    Benjamin Paaßen
    Description

    This repository contains programming data collected from 15 students during November and December of 2019 at Bielefeld University. Students were asked to implement gradient descent. Note that this data set contains only source code snapshots and neither timestamps nor personal information. All students programmed in a web environment, which is also contained in this repository.
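    The dataset contains only the students' code snapshots, not a reference solution. For readers unfamiliar with the assignment, a minimal gradient descent loop might look like the following; it is not taken from the dataset or its web environment, and the objective, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, x0: float, learning_rate: float = 0.1, steps: int = 100) -> float:
    """Minimize a 1-D function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # move opposite to the slope
    return x


# Example objective f(x) = (x - 3)**2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```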

  4. APPS Dataset

    • paperswithcode.com
    Updated Mar 30, 2023
    Cite
    Dan Hendrycks; Steven Basart; Saurav Kadavath; Mantas Mazeika; Akul Arora; Ethan Guo; Collin Burns; Samir Puranik; Horace He; Dawn Song; Jacob Steinhardt (2023). APPS Dataset [Dataset]. https://paperswithcode.com/dataset/apps
    Authors
    Dan Hendrycks; Steven Basart; Saurav Kadavath; Mantas Mazeika; Akul Arora; Ethan Guo; Collin Burns; Samir Puranik; Horace He; Dawn Song; Jacob Steinhardt
    Description

    The APPS dataset consists of problems collected from different open-access coding websites such as Codeforces, Kattis, and more. The APPS benchmark attempts to mirror how human programmers are evaluated by posing coding problems in unrestricted natural language and evaluating the correctness of solutions. The problems range in difficulty from introductory to collegiate competition level and measure coding ability as well as problem-solving.

    The Automated Programming Progress Standard, abbreviated APPS, consists of 10,000 coding problems in total, with 131,836 test cases for checking solutions and 232,444 ground-truth solutions written by humans. Problems can be complicated, as the average length of a problem is 293.2 words. The data are split evenly into training and test sets, with 5,000 problems each. In the test set, every problem has multiple test cases, and the average number of test cases is 21.2. Each test case is specifically designed for the corresponding problem, enabling us to rigorously evaluate program functionality.
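    Because every test problem comes with its own test cases, pass/fail outcomes can be aggregated into two simple scores: an average test-case pass rate and a strict rate counting only problems whose solution passes every test. The APPS paper reports metrics along these lines, but the function below is an illustrative sketch, not the benchmark's official evaluation code, and its input format is an assumption.

```python
def score_results(results: list[list[bool]]) -> tuple[float, float]:
    """Aggregate per-test-case pass/fail results over a set of problems.

    results[i][j] is True if the solution for problem i passed test case j.
    Returns (average test-case pass rate, strict accuracy).
    """
    if not results:
        raise ValueError("need at least one problem")
    # Average over problems of the fraction of that problem's test cases passed.
    pass_rates = [sum(r) / len(r) if r else 0.0 for r in results]
    average_pass_rate = sum(pass_rates) / len(results)
    # Fraction of problems whose solution passed every test case.
    strict = sum(1 for r in results if r and all(r)) / len(results)
    return average_pass_rate, strict


# Hypothetical results for three problems.
avg, strict = score_results([[True, True, True], [True, False], [False, False, False, False]])
print(round(avg, 3), round(strict, 3))  # 0.5 0.333
```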

  5. code_contests

    • huggingface.co
    Updated Sep 17, 2022
    Cite
    Deepmind (2022). code_contests [Dataset]. https://huggingface.co/datasets/deepmind/code_contests
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Authors
    Deepmind
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for CodeContests

    Dataset Summary

    CodeContests is a competitive programming dataset for machine learning. It was used to train AlphaCode. It consists of programming problems from a variety of sources:

    Site         URL                          Source
    Aizu         https://judge.u-aizu.ac.jp   CodeNet
    AtCoder      https://atcoder.jp           CodeNet
    CodeChef     https://www.codechef.com     description2code
    Codeforces   https://codeforces.com       description2code and Codeforces

    HackerEarth… See the full description on the dataset page: https://huggingface.co/datasets/deepmind/code_contests.

  6. Programming Languages

    • figshare.com
    Updated Jun 1, 2023
    Cite
    Paul A. Gagniuc (2023). Programming Languages [Dataset]. http://doi.org/10.6084/m9.figshare.22579246.v2
    Available download formats: zip
    Dataset provided by
    figshare
    Authors
    Paul A. Gagniuc
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files accompany the book An Introduction to Programming Languages: Simultaneous Learning in Multiple Coding Environments. This work is an introductory textbook covering several computer languages. It describes the best-known and most popular programming environments: C#, C++, Java, JavaScript, PERL, PHP, Python, Ruby, and Visual Basic (VB) or Visual Basic for Applications (VBA). The main objective of this guide is therefore to provide code examples mirrored across these nine computer languages, so that readers can understand the connection and universality between the syntax of different environments and become adept at translating code. This learning experience can be ideal for upper-undergraduate introductory courses, researchers, doctoral students, and sociologists or engineers charged with implementing data analysis. Graphical illustrations explain the technical details of the computation examples to aid an in-depth understanding of their inner workings. Moreover, the book contains original material that has been class-tested by the author, and numerous cases are examined. Readers will also benefit from the inclusion of: a) historical and philosophical perspectives on the past, present, and future of computer languages; b) a total of 448 additional files freely available online, of which 44 are poster presentations (i.e. PowerPoint and PDF files); c) a total of 404 code examples mirrored in nine computer languages, namely C#, C++, Java, JavaScript, PERL, PHP, Python, Ruby, and VB.

    This work begins with a general introduction to history and presents the natural, inevitable pathway from mechanical automatons to present-day electronic computers. Following this historical introduction, it takes an in-depth look at philosophical questions, implementations, entropy, and life. Younger generations are often genuinely amazed by the advancement of computer technology. Historical events that led to the development of technologies have been distilled down to their essence. However, the essence of any story is made with a massive loss of detailed information, and the essence of essences even more so. Over time, the lack of detail leads to a collective amnesia that can prevent us from understanding the naturalness with which technology has evolved. New constructs are always built upon older constructs to fit the evolutionary chain of technological progress, which boils down to the same fundamental rules as biological evolution. In the first part, this book discusses the natural path of programming constructs, starting from time immemorial and ending with examples from the present day. In the end, naturally driven constructs of all kinds also drive our society today. In the second part, the emphasis is on the technical side, where nine computer languages are used simultaneously for mirrored examples. Simultaneous learning of multiple computer languages can be regarded as an asset in the world of science and technology, as it lets the reader become familiar with the majority of well-known programming and scripting languages. Moreover, a basic knowledge of software implementation in several computer languages, even at an introductory level, improves the reader's versatility and adaptability to new situations that may arise in industry, education, or research. This work is thus meant to bring a more concrete understanding of the similarities and differences between computer languages.

    Paul A. Gagniuc. An Introduction to Programming Languages: Simultaneous Learning in Multiple Coding Environments. Synthesis Lectures on Computer Science. Springer International Publishing, 2023, pp. 1-280.

  7. Data from: On the Transferability of Pre-trained Language Models for...

    • frdr-dfdr.ca
    Updated Mar 23, 2022
    Cite
    Chen, Fuxiang (2022). On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages [Dataset]. http://doi.org/10.20383/102.0563
    Dataset provided by
    Federated Research Data Repository / dépôt fédéré de données de recherche
    Authors
    Chen, Fuxiang
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Pre-trained Language Models (PLMs) such as CodeBERT and GraphCodeBERT, when trained on a large corpus of code, have recently displayed promising results in Software Engineering (SE) downstream tasks. A PLM is most useful if it can be leveraged to improve performance on code corpora written in low-resource programming languages, where training data is limited. In this work, our focus is on studying the impact of PLMs on a low-resource programming language corpus; specifically, we choose Ruby as the study subject. A recent study by Ahmed and Devanbu reported that using a corpus of code written in multiple languages to fine-tune multilingual PLMs achieves higher performance than using a corpus of code written in just one programming language. However, no analysis was made with respect to monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular SE tasks, Code Summarization and Code Search; 2) the strategy for selecting programming languages that works well when fine-tuning multilingual PLMs for Ruby; and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths. Here, we bin the Ruby code based on its number of tokens; understanding the performance on different code lengths will enable developers to make more informed decisions on the use of PLMs based on their code.

    This dataset, containing the PLMs and their fine-tuned models (there are over a hundred trained and fine-tuned models), was generated by the researchers at the University of British Columbia, Singapore Management University and JetBrains.
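    The description mentions binning Ruby code by its number of tokens but does not specify the tokenizer or the bin boundaries. The sketch below assumes simple whitespace tokenization and hypothetical bin edges; a real study would more likely count tokens with the model's own tokenizer.

```python
from collections import defaultdict


def bin_by_token_count(snippets: list[str], edges: list[int]) -> dict[str, list[str]]:
    """Group code snippets into length bins by token count.

    `edges` are sorted inclusive upper bounds; a snippet with n tokens goes into
    the first bin with n <= edge, or into an overflow bin past the last edge.
    """
    if not edges:
        raise ValueError("edges must be non-empty")
    bins: dict[str, list[str]] = defaultdict(list)
    for code in snippets:
        n_tokens = len(code.split())  # crude whitespace tokenization (assumption)
        lower = 0
        for edge in edges:
            if n_tokens <= edge:
                bins[f"{lower}-{edge}"].append(code)
                break
            lower = edge + 1
        else:
            # No edge was large enough: the snippet is longer than every bin.
            bins[f">{edges[-1]}"].append(code)
    return dict(bins)


# Hypothetical Ruby snippets and bin edges.
ruby_snippets = [
    "def add(a, b) a + b end",
    "puts 'hi'",
    'def greet(name) puts "Hello, #{name}" end',
]
print(bin_by_token_count(ruby_snippets, edges=[3, 8]))
```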

  8. Most Popular Programming Languages Statistics

    • enterpriseappstoday.com
    Updated Jan 5, 2023
    Cite
    EnterpriseAppsToday (2023). Most Popular Programming Languages Statistics [Dataset]. https://www.enterpriseappstoday.com/stats/programming-languages-statistics.html
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    Programming languages statistics: The tech market, which is booming along with digital marketing, can be a good source of income. The tech market spans many areas, including programming languages. Programming languages are the basis for building websites, games, software, mobile applications, and more. There are nearly 9,000 programming languages around the world, each with its own features. In this article on the most popular programming languages statistics, we look at statistical information and general knowledge about the various programming languages available worldwide.

    Programming Languages Statistics (Editor’s Choice)

    • There are 8,945 programming languages, according to the most popular programming languages statistics.
    • As of 2022, JavaScript is one of the most popular programming languages, with around 47.86% of recruiters demanding JavaScript skills.
    • A basic Python developer earns between $70,000 and $100,000 a year.
    • According to the most popular programming languages statistics, Python ranks number 1 in the United States of America, India, Germany, France, and the United Kingdom.

  9. Replication Kit: "Skill Models for Programming Language Concepts"

    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Ella Albrecht; Jens Grabowski (2020). Replication Kit: "Skill Models for Programming Language Concepts" [Dataset]. http://doi.org/10.5281/zenodo.2224248
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ella Albrecht; Jens Grabowski
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Structure

    • data: contains the data we used for our case study
      • pfa: data sets generated from the raw data in the database
      • raw: raw data collected in SmartAPE [1] containing the source code of the students as well as the assessment results of the system
      • similarity: calculated similarities between solutions for each level and each exercise
    • results: contains the complete results of our case study
      • krms: Knowledge Requirements Models for each exercise and each KC level in .graphml format. You can use yEd [2] to visualize them.
      • pfa_metrics: Results of AUC, Gmean and MCC for each of our PFA model configurations in .csv and .Rda format
      • similarities: plotly [3] graphics of our similarity results in .html format
    • calculation scripts:
      • similarities.R: script used to generate box plots of similarities. Uses data from data/similarities as input
      • pfa_trainer.R: script to fit different configurations of PFA models and test them using different performance metrics. Uses data/pfa as input
      • comparison.R: script that performs statistical tests to compare different PFA configurations. Uses results/pfa_metrics/results.Rda as input

    References

    [1] Albrecht, Ella et al. “Experiences in Introducing Blended Learning in an Introductory Programming Course.” ECSEE (2018).

    [2] yEd - Graph editor. https://www.yworks.com/products/yed

    [3] plotly. https://plot.ly
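    The pfa_metrics results report AUC, Gmean, and MCC. As an illustration of two of these metrics (not the repository's R code in pfa_trainer.R), Gmean and MCC can be computed from binary predictions as follows; the labels are hypothetical.

```python
import math


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true: list[int], y_pred: list[int]) -> float:
    """Geometric mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical outcomes: 1 = student solved the exercise, 0 = did not.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(round(g_mean(y_true, y_pred), 3), round(mcc(y_true, y_pred), 3))  # 0.73 0.467
```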

  10. Programming-Language-Database

    • kaggle.com
    Updated Mar 7, 2023
    Cite
    (2023). Programming-Language-Database [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/programming-language-database
    Description

    The dataset contains information on over 4,000 programming languages, including facts about each language such as the year it was created, its rank, and other parameters that you will discover as you explore the dataset.

    Credits: https://github.com/breck7/pldb

  11. Python Programming Puzzles (P3) Dataset

    • paperswithcode.com
    Updated Jun 9, 2021
    Cite
    Tal Schuster; Ashwin Kalyan; Oleksandr Polozov; Adam Tauman Kalai (2021). Python Programming Puzzles (P3) Dataset [Dataset]. https://paperswithcode.com/dataset/python-programming-puzzles-p3
    Authors
    Tal Schuster; Ashwin Kalyan; Oleksandr Polozov; Adam Tauman Kalai
    Description

    Python Programming Puzzles (P3) is an open-source dataset where each puzzle is defined by a short Python program, and the goal is to find an input that makes the program output True. The puzzles are objective in that each one is specified entirely by the source code of its verifier, so evaluating the verifier is all that is needed to test a candidate solution. They do not require an answer key or input/output examples, nor do they depend on natural language understanding.

    The dataset is comprehensive in that it spans problems across a range of difficulties and domains, from trivial string manipulation problems that are immediately obvious to human programmers (but not necessarily to AI), to classic programming puzzles (e.g., Towers of Hanoi), to interview/competitive-programming problems (e.g., dynamic programming), to longstanding open problems in algorithms and mathematics (e.g., factoring). The objective nature of P3 readily supports self-supervised bootstrapping.
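    To make the puzzle format concrete, here is a hypothetical puzzle written in the spirit of P3 (it is not claimed to be taken from the dataset): the verifier is an ordinary Python function, and a candidate answer counts as a solution exactly when the verifier returns True. The names sat and verify are illustrative.

```python
def sat(s: str) -> bool:
    """Hypothetical puzzle: a string with exactly 1000 'o's and no two 'o's in a row."""
    return s.count("o") == 1000 and s.count("oo") == 0


def verify(puzzle, candidate) -> bool:
    """A candidate solves the puzzle exactly when the verifier returns True."""
    try:
        return puzzle(candidate) is True
    except Exception:
        return False  # a verifier that raises on the candidate counts as unsolved


candidate = "ho" * 1000  # each 'o' is preceded by 'h', so "oo" never appears
print(verify(sat, candidate))  # True
print(verify(sat, "oo"))       # False
```

    No answer key is consulted: the only ground truth is the verifier itself, which is what lets such puzzles be checked automatically.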

  12. Programming languages used for software development worldwide 2022

    • statista.com
    Updated Feb 20, 2023
    Cite
    Statista (2023). Programming languages used for software development worldwide 2022 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Worldwide
    Description

    As of 2022, JavaScript was the programming language most used by software developers worldwide over the previous 12 months, with 65 percent of surveyed developers reporting that they used it. Four percent of software developers were also planning to adopt or migrate to JavaScript.

  13. Start Coding without Hesitation: Programming Languages Showdown

    • explore.openaire.eu
    Updated Jan 1, 2019
    Cite
    Dr Weisi Chen; Aidan Wilson; Dr Anastasios Papaioannou (2019). Start Coding without Hesitation: Programming Languages Showdown [Dataset]. http://doi.org/10.5281/zenodo.6423515
    Authors
    Dr Weisi Chen; Aidan Wilson; Dr Anastasios Papaioannou
    Description

    About this webinar

    Programming is becoming more and more popular, with many researchers using programming to perform data cleaning, data manipulation, and data analytics, as well as to create publication-quality plots. Programming can be really beneficial for automating processes and workflows. In this webinar, we explore four of the most popular programming languages widely used in academia, namely Python, R, MATLAB, and Julia.

    Webinar Topics

    • Why use programming
    • An overview of Python, R, MATLAB, and Julia
    • Code comparison of the four programming languages
    • Popularity and job opportunities
    • Intersect’s comparison
    • General guidelines on how to choose the best programming language for your research

    Licence

    Copyright © 2021 Intersect Australia Ltd. All rights reserved.

  14. Codeforces-Competitive-Programming-Dataset

    • kaggle.com
    Updated Jun 9, 2023
    Cite
    (2023). Codeforces-Competitive-Programming-Dataset [Dataset]. https://www.kaggle.com/datasets/dinuiongeorge/codeforces-competitive-programming-dataset
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Competitive programming is a challenging task that demands proficiency in computer science concepts and strong problem-solving skills.

    A significant limitation in the field of competitive programming, in the context of machine learning, is the lack of available datasets that include the problem statement, the editorial, and the source code for research purposes. This limitation hinders the development of new algorithms and techniques to improve the efficiency and accuracy of selecting or creating suitable editorials for given problems.

    To address this problem, we have introduced a comprehensive series of 1550 competitive programming problems that encompass both editorial solutions and source code.

  15. Data from: A Decision Model for Programming Language Ecosystem Selection:...

    • data.mendeley.com
    • commons.datacite.org
    Updated Nov 1, 2020
    Cite
    Siamak Farshidi (2020). A Decision Model for Programming Language Ecosystem Selection: Seven Industry Case Studies [Dataset]. http://doi.org/10.17632/5tc6v6zkzf.1
    Authors
    Siamak Farshidi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Software development is a continuous decision-making process that mainly relies on the software engineer's experience and intuition. One of the essential decisions in the early stages of the process is selecting the best-fitting programming language based on the project requirements. A large number of criteria, such as developer availability and consistent documentation, together with the many potential programming languages on the market, make this a challenging decision-making process. A decision model is required to analyze the selection problem through systematic identification and evaluation of potential alternatives for a development project.

    Method: Recently, we introduced a framework to build decision models for technology selection problems in software production. Furthermore, we designed and implemented a decision support system that uses such decision models to support software engineers with their decision-making problems. This study presents a decision model based on the framework for the programming language selection problem.

    Results: The decision model has been evaluated through seven real-world case studies at seven software development companies. The case study participants declared that the approach provides significantly more insight into the programming language selection process and decreases the time and cost of decision-making.

    Conclusion: With the knowledge available through the decision model, software engineers can evaluate programming languages more rapidly. Having this knowledge readily available supports software engineers in making more efficient and effective decisions that meet their requirements and priorities.

  16. EMIP: The eye movements in programming dataset

    • osf.io
    Updated Sep 3, 2021
    Cite
    Roman Bednarik (2021). EMIP: The eye movements in programming dataset [Dataset]. https://osf.io/53kts
    Dataset provided by
    Center For Open Science
    Authors
    Roman Bednarik
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large dataset is presented that contains the eye movements of N=216 programmers of different experience levels, captured during two code comprehension tasks. Data are grouped in terms of programming expertise (from none to high) and other demographic descriptors. Data were collected through an international collaborative effort that involved eleven research teams across eight countries on four continents. The same eye tracking apparatus and software were used for the data collection. The Eye Movements in Programming (EMIP) dataset is freely available for download. The varied metadata in the EMIP dataset provides fertile ground for the analysis of gaze behavior and may be used to gain novel insights about code comprehension.

    Bednarik, Roman, et al. "EMIP: The eye movements in programming dataset." Science of Computer Programming 198 (2020): 102520.

  17. Lost in the Code?

    • explore.openaire.eu
    Updated Mar 13, 2023
    Cite
    Luis Eduardo Muñoz (2023). Lost in the Code? [Dataset]. http://doi.org/10.5281/zenodo.7589898
    Authors
    Luis Eduardo Muñoz
    Description

    The community behind R is built by inspired scientists who share their tools and knowledge freely to encourage equal access for all aspiring researchers and to champion academic integrity. The tools available through R aid in every step of data analysis, including creating experiments, cataloging and organizing data, analyzing the results, and visualizing findings, all in one software environment. The power of programming also increases the flexibility and automation of these tasks, saving an abundance of time and ensuring each step can be accurately reproduced. Often, courses that use the R software to demonstrate statistical concepts face the dual challenge of introducing two distinct and equally intricate topics at once: programming and statistics. In most cases, constraints on time and breadth force the focus away from programming, to the potential confusion and dismay of novice statistics learners (for example, at the repeated appearance of error messages). This workshop aims to provide a solid foundation of programming concepts so that attendees can confidently approach more advanced statistical courses or independently improve their statistical skills. Many of the ideas covered apply to many different programming languages, even though R is the main tool. Online recordings: Part 1: https://youtu.be/3zUkPvYTePo, Part 2: https://youtu.be/Knjbu6JwNI0

  18. CodeQA Dataset

    • paperswithcode.com
    Updated Sep 16, 2021
    Cite
    Chenxiao Liu; Xiaojun Wan (2021). CodeQA Dataset [Dataset]. https://paperswithcode.com/dataset/codeqa
    Authors
    Chenxiao Liu; Xiaojun Wan
    Description

    CodeQA is a free-form question answering dataset for source code comprehension: given a code snippet and a question, a textual answer must be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs.

    Description from: CodeQA: A Question Answering Dataset for Source Code Comprehension

  19. Global Programming Software Market Size By Product Type (Cloud Based,...

    • verifiedmarketresearch.com
    Updated Sep 3, 2020
    Cite
    VERIFIED MARKET RESEARCH (2020). Global Programming Software Market Size By Product Type (Cloud Based, On-Premise), By Application (Large Enterprise, SMEs), By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/programming-software-market/
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Programming Software Market size was valued at USD 30.9 Billion in 2024 and is projected to reach USD 147.8 Billion by 2031, growing at a CAGR of 23.4% during the forecast period 2024-2031.
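    The compound annual growth rate (CAGR) links a start value, an end value, and the number of years between them. The sketch below applies the standard CAGR formula to the figures quoted above, assuming the forecast runs over the seven years from 2024 to 2031; the helper names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached by compounding start_value at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted above: USD 30.9 billion in 2024 and USD 147.8 billion in 2031 (7 years apart).
print(f"implied CAGR: {cagr(30.9, 147.8, 7):.1%}")
print(f"2031 value at a 23.4% CAGR: USD {project(30.9, 0.234, 7):.1f} billion")
```

    With these inputs the implied rate comes out at roughly 25 percent, and compounding USD 30.9 billion at 23.4 percent for seven years gives roughly USD 135 billion rather than 147.8 billion, so the quoted endpoints and CAGR do not quite agree; the report may use a different base year or period convention.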

    Global Programming Software Market Drivers

    Technological Innovation: The market for programming software is primarily driven by technological advancements. The landscape is always changing due to advancements in programming languages, frameworks, and tools, which empower developers to produce increasingly complex and effective software solutions. The need for programming software that supports cutting-edge technologies like cloud computing, AI, machine learning, and the Internet of Things (IoT) is increasing.

    Growing Need for Customized Solutions: Companies in a variety of sectors increasingly depend on software solutions made to meet their unique requirements. This drives the need for programming tools that make it possible for developers to create highly customized apps quickly and effectively. The market is becoming increasingly competitive, which is driving up demand for programming tools that are both versatile and scalable.

    Move Toward Open-Source Software: Thanks to its affordability, adaptability, and collaborative nature, open-source software has grown sharply in popularity in recent years. Many developers and organizations prefer open-source programming software for its accessibility and active community support. As a result, open-source tools and frameworks are becoming more popular in the programming software market.

    Adoption of DevOps Practices: DevOps principles, which emphasize collaboration between development and operations teams to speed up software delivery, are increasingly widely used. Enterprises seeking greater efficiency and agility are embracing these practices, which drives strong demand for programming software that supports smooth integration, automation, and continuous delivery within the DevOps pipeline.

    A Growing Focus on Security: With the rise in cyberattacks and data breaches, security has become a top priority for businesses building software. This increases demand for programming tools that support secure coding practices and offer strong security features. Security-focused frameworks and tools help developers address vulnerabilities and protect the integrity of software.

    Transition to No-Code/Low-Code Development: Low-code/no-code platforms let users with varying levels of technical expertise build applications quickly, which is democratizing software development. The trend is driven by demand for greater agility, lower development costs, and faster time to market. Consequently, low-code/no-code tools are gaining popularity in the programming software market alongside conventional programming languages and frameworks.

    Industry-Specific Requirements: The choice of programming software is influenced by the particular requirements and regulatory norms of each industry. Sectors with strict compliance requirements, such as finance, healthcare, and automotive, need programming tools that make it easier to meet industry-specific standards and regulations.

    Global Economic Variables: The programming software market is also affected by economic variables such as GDP growth, investment trends, and geopolitical developments. Economic expansion can increase investment in software development, while downturns may shrink IT budgets and slow the adoption of new technology.

  20. GitHub Code Snippets

    • kaggle.com
    zip
    Updated Mar 3, 2021
    zomglings (2021). GitHub Code Snippets [Dataset]. https://www.kaggle.com/simiotic/github-code-snippets
    Available download formats: zip (7,532,656,956 bytes)
    Dataset updated
    Mar 3, 2021
    Authors
    zomglings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains over 97,000,000 snippets of code from various GitHub repositories with more than 10,000 stars.

    The repositories included in this dataset were the results of searching for repositories with greater than 10,000 stars for each of the following languages:

    Bash
    C
    C++
    CSV
    DOTFILE
    Go
    HTML
    JSON
    Java
    JavaScript
    Jupyter
    Markdown
    PowerShell
    Python
    Ruby
    Rust
    Shell
    TSV
    Text
    UNKNOWN
    YAML
    

    For each repository, we created snippets from the default branch by going through each text file and extracting 5-line chunks of text every 5 lines.
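    The chunking step described above amounts to splitting each file into consecutive, non-overlapping blocks of five lines. The sketch below is a minimal illustration, not the dataset's actual pipeline: it assumes files are read as UTF-8 text and that a final partial chunk (fewer than five lines) is kept, and the function names are hypothetical.

```python
# Minimal sketch of non-overlapping 5-line chunking.
# Assumptions (not from the dataset authors): UTF-8 decoding with replacement,
# and a trailing chunk shorter than 5 lines is kept rather than dropped.
from pathlib import Path
from typing import Iterator

CHUNK_LINES = 5

def chunk_lines(lines: list[str], size: int = CHUNK_LINES) -> Iterator[str]:
    """Yield consecutive, non-overlapping blocks of `size` lines."""
    for start in range(0, len(lines), size):
        yield "".join(lines[start:start + size])

def snippets_from_file(path: Path) -> list[str]:
    """Split one text file into 5-line snippets, preserving line endings."""
    text = path.read_text(encoding="utf-8", errors="replace")
    return list(chunk_lines(text.splitlines(keepends=True)))
```

    For example, a 12-line file yields three snippets: lines 1 to 5, lines 6 to 10, and a final two-line snippet.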

    We used file extensions to associate snippets with the programming language they most likely represent. For snippets whose language we could not infer from the file extension, we used the value UNKNOWN in the language column.
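    Extension-based language inference can be sketched as a lookup table with an UNKNOWN fallback. The table below is hypothetical: the description does not publish the actual mapping, so the chosen extensions, the treatment of ambiguous cases such as `.h` and `.sh`, and the dotfile rule are all illustrative assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical mapping; the dataset's real extension table is not published here.
EXTENSION_TO_LANGUAGE = {
    ".bash": "Bash",
    ".c": "C",
    ".h": "C",  # ambiguous: .h headers are also common in C++ projects
    ".cc": "C++", ".cpp": "C++", ".hpp": "C++",
    ".csv": "CSV",
    ".go": "Go",
    ".htm": "HTML", ".html": "HTML",
    ".json": "JSON",
    ".java": "Java",
    ".js": "JavaScript",
    ".ipynb": "Jupyter",
    ".md": "Markdown",
    ".ps1": "PowerShell",
    ".py": "Python",
    ".rb": "Ruby",
    ".rs": "Rust",
    ".sh": "Shell",
    ".tsv": "TSV",
    ".txt": "Text",
    ".yaml": "YAML", ".yml": "YAML",
}

def infer_language(path: str) -> str:
    """Guess a snippet's language from its file name, falling back to UNKNOWN."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if suffix in EXTENSION_TO_LANGUAGE:
        return EXTENSION_TO_LANGUAGE[suffix]
    # Hypothetical rule: names like ".bashrc" have no suffix and count as dotfiles.
    if p.name.startswith(".") and suffix == "":
        return "DOTFILE"
    return "UNKNOWN"
```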

    This dataset does not contain code from any GitHub repository without a license. A snippet can be associated with any of the following licenses: AGPL-3.0, Apache-2.0, BSD-2-Clause, BSD-3-Clause, BSL-1.0, CC-BY-4.0, CC-BY-SA-4.0, CC0-1.0, GPL-2.0, GPL-3.0, ISC, LGPL-2.1, LGPL-3.0, MIT, MPL-2.0, MS-PL, NOASSERTION, OFL-1.1, Unlicense, WTFPL, Zlib.

    These are SPDX License Identifiers.

    Note that Unlicense refers to the Unlicense. It does not mean that the snippet is unlicensed.
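    Because each snippet carries an SPDX identifier, downstream users can filter by license. The sketch below assumes a hypothetical `license` field on each row and a hypothetical allow-list of permissive licenses; which licenses are acceptable is a project-specific decision. It keeps `Unlicense` (a real license, per the note above) and excludes `NOASSERTION`, which signals that no specific SPDX identifier was asserted.

```python
# Hypothetical allow-list of permissive SPDX identifiers from the list above.
PERMISSIVE_SPDX = {
    "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "BSL-1.0", "CC0-1.0",
    "ISC", "MIT", "Unlicense", "WTFPL", "Zlib",
}

def is_allowed(license_id: str) -> bool:
    """Return True if the snippet's SPDX identifier is on the allow-list.

    "Unlicense" names an actual license, so it is allowed.
    "NOASSERTION" means no specific license was asserted, so it is excluded.
    """
    return license_id in PERMISSIVE_SPDX

# Hypothetical rows; the "license" field name is an assumption.
rows = [
    {"license": "MIT"},
    {"license": "GPL-3.0"},
    {"license": "Unlicense"},
    {"license": "NOASSERTION"},
]
kept = [row for row in rows if is_allowed(row["license"])]
```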

    Issues and requests

    This dataset is built and maintained by Bugout.dev. To report an issue with the data or to request changes in future versions of the dataset, please open a discussion thread.

    Development dataset

    Because this dataset can be difficult to work with in Kaggle notebooks, we have also made a smaller version available: GitHub Code Snippets - Development sample.

(2023). CodeContests Dataset [Dataset]. https://paperswithcode.com/dataset/codecontests

CodeContests Dataset

97 scholarly articles cite this dataset.
Dataset updated
Dec 25, 2023
Description

CodeContests is a competitive programming dataset for machine learning. This dataset was used when training AlphaCode.

It consists of programming problems from a variety of sources.

Problems include test cases in the form of paired inputs and outputs, as well as both correct and incorrect human solutions in a variety of languages.
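Paired input/output test cases make it possible to label a solution as correct or incorrect by running it on each input and comparing its output with the expected one. The following is a minimal sketch, not CodeContests' own checker: it assumes tests are (stdin, expected stdout) string pairs, that solutions are Python programs, and that outputs match after trimming trailing whitespace. The function names are hypothetical, and a real judge should run untrusted code in a sandbox.

```python
import subprocess
import sys

def normalize(text: str) -> list[str]:
    """Strip trailing whitespace on each line and drop trailing blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return lines

def run_tests(source: str, tests: list[tuple[str, str]], timeout: float = 5.0) -> bool:
    """Return True if the Python program passes every (stdin, expected) pair."""
    for stdin_text, expected in tests:
        try:
            # Runs the code unsandboxed; only use with trusted solutions.
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0 or normalize(result.stdout) != normalize(expected):
            return False
    return True

correct = "a, b = map(int, input().split())\nprint(a + b)\n"
incorrect = "a, b = map(int, input().split())\nprint(a - b)\n"
tests = [("1 2\n", "3\n"), ("5 5\n", "10\n")]

print(run_tests(correct, tests))    # True
print(run_tests(incorrect, tests))  # False
```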
