70 datasets found
  1. Most used programming languages among developers worldwide 2024

    • statista.com
    Updated Feb 6, 2025
    Cite
    Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 19, 2024 - Jun 20, 2024
    Area covered
    Worldwide
    Description

    As of 2024, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 62 percent of respondents stating that they used JavaScript and around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill within various industries throughout the business world. Job seekers skilled in Python, R, and SQL will find that these are among the most highly desirable data science skills and are likely to help in their search for employment.

  2. Programming languages used for software development worldwide 2024

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Programming languages used for software development worldwide 2024 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most popular programming language used in the past 12 months by software developers worldwide is JavaScript as of 2024, according to ** percent of the software developers surveyed. This is followed by Python at ** percent of the respondents surveyed.

  3. Most popular programming languages worldwide 2024

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Most popular programming languages worldwide 2024 [Dataset]. https://www.statista.com/statistics/1292294/popular-it-skills-worldwide/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    JavaScript and Java were some of the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing these languages in 2024, respectively. Nevertheless, the tech skill developers wanted to learn the most in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest growing IT skills among DevSkiller customers were C/C++ and data science, while cybersecurity ranked third.

    Software skills

    When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn how to code between 11 and 17 years old, with some of them writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned how to program by accessing online resources such as videos and blogs.

    Software skills pay

    In 2024, the average annual software developer’s salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany, it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.

  4. Programming Language Ecosystem Project TU Wien

    • test.researchdata.tuwien.ac.at
    csv, text/markdown
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Valentin Futterer (2024). Programming Language Ecosystem Project TU Wien [Dataset]. http://doi.org/10.70124/gnbse-ts649
    Available download formats: text/markdown, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Valentin Futterer
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Time period covered
    Dec 12, 2023
    Area covered
    Vienna
    Description

    About Dataset

    This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.

    The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.

    About Data collection methodology

    The dataset was created using the GitHub repository above. As input data, three public datasets were used.

    github_metadata

    Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.

    PYPL_survey_2004-2023

    Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. For each month from 2004 to 2023, it shows each language's share of programming-related Google searches.

    stack_overflow_developer_survey

    Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows the results of the yearly Stack Overflow developer survey from 2011 to 2023.

    All these datasets were downloaded on 12.12.2023. They are all included in the GitHub repository above.

    Description of the data

    The dataset contains a column for the year, followed by one column per language giving its usage in percent. Additionally, vertical bar charts and pie charts for each year, plus a line graph for each language over the whole timespan, are provided as PNG files.
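
    As a rough illustration of the layout described above, the following Python sketch reshapes a year-by-language table into (year, language, percent) records and picks the highest share per year. The header name "Year", the helper names, and every value in the sample are placeholders chosen for illustration; they are not taken from usage_of_programming_languages_2011-2023.csv.

```python
import csv
import io

# Placeholder rows in the described layout: a year column, then one column
# per language. The header "Year" and all numbers are illustrative
# assumptions, not values from the real dataset.
SAMPLE_CSV = """Year,Python,Java,JavaScript
2011,10.0,30.0,20.0
2012,12.5,28.0,21.5
"""


def wide_to_long(csv_text, year_column="Year"):
    """Turn one row per year into (year, language, percent) records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row[year_column])
        for language, value in row.items():
            if language == year_column or value == "":
                continue  # skip the year column itself and empty cells
            records.append((year, language, float(value)))
    return records


def top_language_per_year(records):
    """Return {year: (language, percent)} for the highest share in each year."""
    best = {}
    for year, language, percent in records:
        if year not in best or percent > best[year][1]:
            best[year] = (language, percent)
    return best


records = wide_to_long(SAMPLE_CSV)
print(top_language_per_year(records))
```

    To try this on the real file, the text of the CSV could be passed to wide_to_long in place of SAMPLE_CSV, after checking what the year column is actually called.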

    The languages that are going to be considered for the project can be seen here:

    - Python

    - C

    - C++

    - Java

    - C#

    - JavaScript

    - PHP

    - SQL

    - Assembly

    - Scratch

    - Fortran

    - Go

    - Kotlin

    - Delphi

    - Swift

    - Rust

    - Ruby

    - R

    - COBOL

    - F#

    - Perl

    - TypeScript

    - Haskell

    - Scala

    License

    This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/.

    TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.

    Acknowledgments

    Thanks go out to

    - stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.

    - the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.

    - Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.

  5. Most Popular Programming Languages Since 2004

    • kaggle.com
    zip
    Updated Aug 15, 2020
    Cite
    Muhammad Khalid (2020). Most Popular Programming Languages Since 2004 [Dataset]. https://www.kaggle.com/muhammadkhalid/most-popular-programming-languages-since-2004
    Available download formats: zip (11311 bytes)
    Dataset updated
    Aug 15, 2020
    Authors
    Muhammad Khalid
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Context

    I was looking for a Most Popular Programming Languages dataset for a video on my YouTube channel and couldn't find anything decent, so I collected the data for my own use and am sharing it.

    Content

    This dataset contains data about the Most Popular Programming Languages from 2004 to 2020. All programming language values are percentages (out of 100%).

    Acknowledgements

    The data was pulled from https://pypl.github.io

    If this dataset is useful for you then don't forget to upvote.

  6. Programming languages most used in software companies in Russia 2024

    • statista.com
    Updated Jun 19, 2025
    Cite
    Statista (2025). Programming languages most used in software companies in Russia 2024 [Dataset]. https://www.statista.com/statistics/1196588/programming-languages-most-used-russia/
    Dataset updated
    Jun 19, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Russia
    Description

    JavaScript was the most frequently used coding language in Russia, used by around ********** of the surveyed software companies in 2024. Furthermore, over ******** of the companies reported using Python and Java.

  7. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    figshare
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers, and explains how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both textbook and manual for R statistics, particularly in academic research, data analytics, and computer programming, and is intended to inform and guide the work of R users and statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for the use of each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and a step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. These range from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. The result is a congruence of statistics and computer programming for research.

  8. Programing language & Games

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Han, Qi (2020). Programing language & Games [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3549143
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Han, Qi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    With the development of science and technology, there are more and more electronic games on the market, and the types of electronic games have become more diversified. Many programming languages on the market can now be used to develop games. For beginners in game development, it is difficult to choose an appropriate programming language for a specific type of game. We therefore investigated some famous games and the programming languages they use.

  9. Leading coding languages used in AR and VR worldwide in 2022

    • statista.com
    Updated May 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Leading coding languages used in AR and VR worldwide in 2022 [Dataset]. https://www.statista.com/statistics/1343292/coding-languages-used-in-ar-vr-worldwide/
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2021 - Feb 2022
    Area covered
    Worldwide
    Description

    A survey conducted between late 2021 and early 2022 found that JavaScript was the leading coding language used by software developers in augmented reality (AR) and virtual reality (VR) projects, followed closely by Python.

  10. libs-github-api: add summary stats

    • zenodo.org
    zip
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Eric Phetteplace (2020). libs-github-api: add summary stats [Dataset]. http://doi.org/10.5281/zenodo.17790
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eric Phetteplace
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The new lib/summary-stats.js script compiles a sorted CSV of all languages used, with data on the number of repos the language appears in, the number of repos for which it is recorded as the primary language, and the total lines of code in the language across all repos.
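
    The kind of aggregation described here can be sketched in Python as follows. This is not the original JavaScript from lib/summary-stats.js; the input shape (one record per repository with a primary language and per-language line counts), the sample values, the column names, and the sort order (most repos first, then by name) are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical input: one record per repository, with its primary language
# and a mapping of language -> lines of code. Values are made up.
repos = [
    {"primary": "Python", "languages": {"Python": 1200, "Shell": 40}},
    {"primary": "JavaScript", "languages": {"JavaScript": 800, "Python": 100}},
    {"primary": "Python", "languages": {"Python": 300}},
]


def summarize(repo_list):
    """Aggregate repo count, primary count, and total lines per language."""
    stats = {}
    for repo in repo_list:
        for language, lines in repo["languages"].items():
            entry = stats.setdefault(
                language, {"repos": 0, "primary": 0, "lines": 0}
            )
            entry["repos"] += 1
            entry["lines"] += lines
            if repo["primary"] == language:
                entry["primary"] += 1
    rows = [
        (language, s["repos"], s["primary"], s["lines"])
        for language, s in stats.items()
    ]
    # Assumed ordering: most widely used first, ties broken alphabetically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["language", "repos", "primary", "lines"])
    writer.writerows(rows)
    return buffer.getvalue()


summary = summarize(repos)
print(to_csv(summary))
```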

  11. Replication Package of the paper "Large Language Models for Multilingual...

    • zenodo.org
    zip
    Updated Mar 14, 2025
    Cite
    Anonymous (2025). Replication Package of the paper "Large Language Models for Multilingual Code Generation: A Benchmark and a Study on Code Quality" [Dataset]. http://doi.org/10.5281/zenodo.15028641
    Available download formats: zip
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Large Language Models for Multilingual Code Generation: A Benchmark and a Study on Code Quality

    Abstract

    Having been trained in the wild, Large Language Models (LLMs) may suffer from different types of bias. As shown in previous studies outside software engineering, this includes a language bias, i.e., these models perform differently depending on the language used for the query/prompt. However, so far the impact of language bias on source code generation has not been thoroughly investigated. Therefore, in this paper, we study the influence of the language adopted in the prompt on the quality of the source code generated by three LLMs, specifically GPT, Claude, and DeepSeek. We consider 230 coding tasks for Python and 230 for Java, and translate their related prompts into four languages: Chinese, Hindi, Spanish, and Italian. After generating the code, we measure code quality in terms of passed tests, code metrics, warnings generated by static analysis tools, and language used for the identifiers. Results indicate that (i) source code generated from the English queries is not necessarily better in terms of passed tests and quality metrics, (ii) the quality for different languages varies depending on the programming language and LLM being used, and (iii) the generated code tends to contain mixes of comments and literals written in English and the language used to formulate the prompt.

    Replication Package

    This replication package is organized into two main directories: data and scripts. The data directory contains all the data used in the analysis, including prompts and final results. The scripts directory contains all the Python scripts used for code generation and analysis.

    Data

    The data directory contains six subdirectories, each corresponding to a stage in the analysis pipeline. These are enumerated to reflect the order of the process:

    1. prompt_translation: Contains files with manually translated prompts for each language. Each file is associated with both Python and Java. The structure of each file is as follows:

      • id: The ID of the query in the CoderEval benchmark.
      • prompt: The original English prompt.
      • summary: The original summary.
      • code: The original code.
      • translation: The translation generated by GPT.
      • correction: The manual correction of the GPT-generated translation.
      • correction_tag: A list of tags indicating the corrections made to the translation.
      • generated_code: This column is initially empty and will contain the code generated from the translated prompt.
    2. generation: Contains the code generated by the three LLMs for each programming language and natural language. Each subdirectory (e.g., java_chinese_claude) contains the following:

      • files: The files with the generated code (named by the query ID).
      • report: Reports generated by static analysis tools.
      • A CSV file (e.g., java_chinese_claude.csv) containing the generated code in the corresponding column.
    3. tests: Contains input files for the testing process and the results of the tests. Files in the input_files directory are formatted according to the CoderEval benchmark requirements. The results directory holds the output of the testing process.

    4. quantitative_analysis: Contains all the CSV reports of the static analysis tools and test output for all languages and models. These files are the inputs for the statistical analysis. The directory stats contains all the output tables for the statistical analysis, which are shown in the paper's tables.

    5. qualitative_analysis: Contains files used for the qualitative analysis:

      • CohenKappaagreement.csv: A file containing the subset used to compute Cohen's kappa metrics for manual analysis.
      • files: Contains all files for the qualitative analysis. Each file has the following columns:
        • id: The ID of the query in the CoderEval benchmark.
        • generated_code: The code generated by the model.
        • comments: The language used for comments.
        • identifiers: The language used for identifiers.
        • literals: The language used for literals.
        • notes: Additional notes.
    6. ablation_study: Contains files for the ablation study. Each file has the following columns:

      • id: The ID of the query in the CoderEval benchmark.
      • prompt: The prompt used for code generation.
      • generated_code, comments, identifiers, and literals: Same as in the qualitative analysis.
      • results.pdf: This file shows the table containing all the percentages of comments, identifiers, and literals extracted from the CSV files of the ablation study.

      Files prefixed with italian contain prompts with signatures and docstrings translated into Italian. The system prompt used is the same as the initial one (see the paper). Files with the english prefix have prompts with the original signature (in English) and the docstring in Italian. The system prompt differs as follows:

    You are an AI that only responds with Python code. You will be given a function signature and its docstring by the user. Write your full implementation (restate the function signature).
    Use a Python code block to write your response.
    Comments and identifiers must be in Italian. 
    For example:
    ```python
    print("Hello World!")
    ```

    Scripts

    The scripts directory contains all the scripts used to perform the generation and analysis. All files are commented. Here is a brief description of each file:

    • code_generation.py: This script automates code generation using AI models (GPT, DeepSeek, and Claude) for different programming and natural languages. It reads prompts from CSV files, generates code based on the prompts, and saves the results in structured directories. It logs the process, handles errors, and stores the generated code in separate files for each iteration.

    • computeallanalysis.py: This script performs static code analysis on generated code files for different models, natural languages, and programming languages. It runs various analyses (Flake8, Pylint, Lizard) depending on the programming language: for Python, it runs all three analyses, while for Java, only Lizard is executed. The results are stored in dedicated report directories for each iteration. The script ensures the creation of necessary directories and handles any errors that occur during the analysis process.

    • createtestjava.py: This script processes Java code generated by different models and languages, extracting methods using a JavaParser server. It iterates through multiple iterations of generated code, extracts the relevant method code (or uses the full code if no method is found), and stores the results in a JSONL file for each language and model combination.

    • deepseek_model.py: This function sends a request to the DeepSeek API, passing a system and user prompt, and extracts the generated code snippet based on the specified programming language. It prints the extracted code in blue to the console, and if any errors occur during the request or extraction, it prints an error message in red. If successful, it returns the extracted code snippet; otherwise, it returns None.

    • extractpmdreport.py: This script processes PMD analysis reports in SARIF format and converts them into CSV files. It extracts the contents of ZIP files containing the PMD reports, parses the SARIF file to gather analysis results, and saves the findings in a CSV file. The output includes details such as file names, rules, messages, and the count of issues found. The script iterates through multiple languages, models, and iterations, ensuring that PMD reports are properly processed and saved for each combination.

    • flake_analysis.py: The flake_analysis function runs Flake8 to analyze Python files for errors and generates a CSV report summarizing the results. It processes the output, extracting error details such as filenames, error codes, and messages. The errors are grouped by file and saved in a CSV file for easy review.

    • generatepredictionclaude_java.py: The generatecodefrom_prompt function processes a JSON file containing prompts, generates Java code using the Claude API, and saves the generated code to a new JSON file. It validates each prompt, ensures it's JSON-serializable, and sends it to the Claude API for code generation. If the generation is successful, the code is stored in a structured format, and the output is saved to a JSON file for further use.

    • generatepredictionclaude_python.py: This code defines a function generatecodefrom_prompt that processes a JSON file containing prompts, generates Python code using the Claude API, and saves the generated code to a new JSON file. It handles invalid values and ensures all prompts are

  12. Top programming languages demanded by recruiters worldwide 2025

    • statista.com
    Updated Jul 1, 2025
    Cite
    Statista (2025). Top programming languages demanded by recruiters worldwide 2025 [Dataset]. https://www.statista.com/statistics/1296727/programming-languages-demanded-by-recruiters/
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most demanded programming languages by recruiters in 2025 were Python, JavaScript, and Java, with around ** percent of recruiters looking to hire people with these programming skills.

  13. Development Economics Data Group - Proportion of youth and adults who have...

    • gimi9.com
    Updated Dec 12, 2020
    Cite
    (2020). Development Economics Data Group - Proportion of youth and adults who have wrote a computer program using a specialised programming language, both sexes (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/worldbank_wb_edstats_uis_ictskillproglang/
    Dataset updated
    Dec 12, 2020
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The proportion of youth and adults with information and communications technology (ICT) skills, by type of skill, is defined as the percentage of individuals who have undertaken certain ICT-related activities in the last 3 months. The lack of ICT skills continues to be one of the key barriers keeping people from fully benefitting from the potential of ICT. These data may be used to inform targeted policies to improve ICT skills, and thus contribute to an inclusive information society. The data compiler for this indicator is the International Telecommunication Union (ITU). Eurostat collects data annually for 32 European countries, while the ITU is responsible for setting up the standards and collecting this information from the remaining countries.

  14. Data from: Code4Bench: A Multidimensional Benchmark of Codeforces Data for...

    • data.niaid.nih.gov
    • search.datacite.org
    Updated Jan 24, 2020
    Cite
    Vahidi-Asl Mojtaba (2020). Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2582967
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Majd Amirabbas
    Zamani Bahman
    Baraani-Dastjerdi Ahmad
    Khalilian Alireza
    Vahidi-Asl Mojtaba
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Reproducible research relies on well-designed benchmarks. However, evaluation on a single benchmark increases the risk of overfitting; that is, an optimization to reach a certain performance. In recent years several well-designed benchmarks have been constructed for different subfields of program analysis. However, they often involve real-world industrial projects in a few languages such as C or Java. We provide Code4Bench, a benchmark comprising 3,421,357 programs totaling 306,053,105 lines of code in 41 versions of 28 programming languages such as C/C++, Java, Python, and Kotlin. We have constructed this benchmark from Codeforces, a famous programming competition website, which is widely used by international programmers. Code4Bench advances the state-of-the-art in conducting reproducible and comparative experiments. It helps mitigate the bias and increase the generality and conclusiveness of the results. We present our methodology in construction of Code4Bench and give various descriptive statistics. We have also conducted an online survey of the users of the Codeforces website whose code is included in the benchmark. The survey concerns the users’ demographic information and programming habits, and its results are also provided in the benchmark. Finally, we leveraged an automatic process by which we localized faults within the faulty versions and categorized them according to a coarse-grained classification. In addition to its usage in empirical studies, Code4Bench can be used to teach programming and evolve algorithmic problems. We release Code4Bench in database format to allow researchers to extract other data of the benchmark by arbitrary queries.

    Code4Bench version 1.0.0 is publicly available at https://zenodo.org/record/2582968, with DOI 10.5281/zenodo.2582968, thereby providing long-term storage and versioning. It is released under the terms of Creative Commons Attribution 4.0 International license. Code4Bench is also publicly available at: https://github.com/code4bench/Code4Bench, in which we have provided some additional information and script examples.

  15. Data from: T3PS v1.0: Tool for Parallel Processing in Parameter Scans

    • data.mendeley.com
    Updated Jan 1, 2016
    Cite
    Vinzenz Maurer (2016). T3PS v1.0: Tool for Parallel Processing in Parameter Scans [Dataset]. http://doi.org/10.17632/7cd59f5dhh.1
    Dataset updated
    Jan 1, 2016
    Authors
    Vinzenz Maurer
    License

    http://www.gnu.org/licenses/gpl-3.0.en.html

    Description

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2018)

    Abstract: T3PS is a program that can be used to quickly design and perform parameter scans while easily taking advantage of the multi-core architecture of current processors. It takes an easy-to-read-and-write parameter scan definition file format as input. Based on the parameter ranges and other options contained therein, it distributes the calculation of the parameter space over multiple processes and possibly computers. The derived data is saved in a plain text file format readable by most plotting ...

    Title of program: T3PS
    Catalogue Id: AEXZ_v1_0

    Nature of problem: While current processor architecture firmly goes the way of parallelization even on desktop computers, programs commonly used for parameter scans in physics often lack the capability to take advantage of this. While it is possible to change the source code of some programs, it may not be feasible for every program still in use. Fortunately, current operating systems already routinely make use of multiple processor cores if multiple processes are running at the same time. The easiest way to make ...

    Versions of this program held in the CPC repository in Mendeley Data: AEXZ_v1_0; T3PS; 10.1016/j.cpc.2015.08.032
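
The idea described in the abstract above, expanding parameter ranges into a grid and spreading the evaluations over several processes, can be illustrated with a minimal Python sketch. This is not T3PS code and does not use its scan definition file format; the names `build_grid`, `model`, and `run_scan` are hypothetical, and `model` is only a stand-in for an expensive calculation.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def build_grid(ranges):
    """Return every combination of parameter values as a list of dicts."""
    names = list(ranges)
    value_lists = [ranges[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def model(point):
    """Hypothetical stand-in for an expensive calculation at one grid point."""
    return point["x"] ** 2 + point["y"]


def run_scan(func, grid, processes=1):
    """Evaluate func at every grid point, optionally across worker processes."""
    if processes <= 1:
        # Serial fallback: useful for debugging and for very small scans.
        return [func(point) for point in grid]
    with ProcessPoolExecutor(max_workers=processes) as pool:
        # pool.map preserves the order of the grid in its results.
        return list(pool.map(func, grid))
```

A call such as `run_scan(model, build_grid({"x": [0, 1, 2], "y": [10, 20]}), processes=2)` would distribute the six grid points over two worker processes. On platforms that start workers by spawning a fresh interpreter, such a call should sit under an `if __name__ == "__main__":` guard.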

  16. Replication Data for: Compositio Prompto: An Architecture to Employ Large...

    • darus.uni-stuttgart.de
    Updated Oct 7, 2024
    Cite
    Robin D. Pesl; Carolin Mombrey; Kevin Klein; Ilche Georgievski; Steffen Becker; Georg Herzwurm; Marco Aiello (2024). Replication Data for: Compositio Prompto: An Architecture to Employ Large Language Models in Automated Service Computing [Dataset]. http://doi.org/10.18419/DARUS-4497
    Dataset updated
    Oct 7, 2024
    Dataset provided by
    DaRUS
    Authors
    Robin D. Pesl; Carolin Mombrey; Kevin Klein; Ilche Georgievski; Steffen Becker; Georg Herzwurm; Marco Aiello
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4497

    Dataset funded by
    BMWK
    MWK
    Description

    A classic, central Service-Oriented Computing (SOC) challenge is the service composition problem. It concerns solving a user-defined task by selecting a suitable set of services, possibly found at runtime, determining an invocation order, and handling request and response parameters. The solutions proposed in the past two decades mostly resort to additional formal modeling of the services, leading to extra effort, scalability issues, and overall brittleness. With the rise of Large Language Models (LLMs), it has become feasible to process semi-structured information like state-of-practice OpenAPI documentation containing formal parts like endpoints and free-form elements like descriptions. We propose Compositio Prompto to generate service compositions based on those semi-structured documents. Compositio Prompto acts as an encapsulation of the prompt creation and the model invocation such that the user only has to provide the service specifications, the task, and which input and output format they expect, eliminating any manual and laborious annotation or modeling task by relying on already existing documentation. To validate our approach, we implement a fully operational prototype, which operates on a set of OpenAPIs, a plain text task, and an input and output JSON schema as input and returns the generated service composition as executable Python code. We measure the effectiveness of our approach on a parking spot booking case study. Our experiments show that models, especially those above 70B parameters, can solve several tasks, but none can fulfill all tasks. Furthermore, compared with manually created sample solutions, the ones generated by LLMs appear to be close approximations.

    Methodology (summarized): We perform an automated service composition for parking spot booking using LLMs for the study. There are six parking services and two payment services. The six parking services are duplicated with different distances and prices to create distinct sets 1 and 2. We define eight prompts and perform the composition using 14 different LLMs. We use a best-of-three-shot approach to reduce the influence of randomness. Finally, we assess functionality manually and apply code similarity metrics against a manually crafted sample solution. All experiments are described in detail in the full paper.

    Content:

    • code/*: Code to perform the experiments. For details, see "code/README.md".
    • code/evaluation/prompt_generation.py: Services and prompt generation.
    • code/evaluation/sample_solution/*: Sample solution and similarity evaluation.
    • results/*: Results for the runs with the LLMs. Filename structure: "results/{model_name}-{run}/prompt_{prompt_number}_set_{set_number}_{artifact}". The artifact can be "code_0.py" for the generated code, "code_1.py" if tasked to improve the code, or "prompt.txt" for the used prompt. For the best run, the code metrics are in "comparison.json".
    • prompt_template.txt: Pseudo code for the prompt template. Implemented in code/evaluation/prompt_generation.py.

    Note: Please use the tree view to access the files.

    License: License for the "code/*": MIT. License for the "results/*": CC BY 4.0.
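
The filename structure given for the results files can be split into its parts with a regular expression. The following Python sketch is only an illustration and is not part of the dataset's code. It assumes that prompt and set numbers are decimal integers and that the run identifier contains no hyphen, so everything before the last hyphen in the directory name is the model name. The function name `parse_result_path` and the example path are hypothetical.

```python
import re

# Pattern for "results/{model_name}-{run}/prompt_{prompt_number}_set_{set_number}_{artifact}".
# Assumption: the run identifier has no hyphen, so the last hyphen separates it from the model name.
RESULT_PATH = re.compile(
    r"^results/(?P<model_name>[^/]+)-(?P<run>[^-/]+)/"
    r"prompt_(?P<prompt_number>\d+)_set_(?P<set_number>\d+)_"
    r"(?P<artifact>code_0\.py|code_1\.py|prompt\.txt)$"
)


def parse_result_path(path):
    """Return the components of a result path as a dict, or None if it does not match."""
    match = RESULT_PATH.match(path)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric components so they can be sorted and compared as numbers.
    info["prompt_number"] = int(info["prompt_number"])
    info["set_number"] = int(info["set_number"])
    return info
```

For a made-up path such as `results/example-model-1/prompt_3_set_2_code_0.py`, the function yields the model name `example-model`, run `1`, prompt number 3, set number 2, and artifact `code_0.py`.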

  17. Monthly Stack Overflow Questions

    • kaggle.com
    Updated Feb 29, 2024
    Cite
    ComputingVictor (2024). Monthly Stack Overflow Questions [Dataset]. https://www.kaggle.com/datasets/computingvictor/monthly-trends-in-stack-overflow-questions
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ComputingVictor
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains information on the popularity and interest in various programming languages over time, as observed through the total number of questions asked on Stack Overflow from 2008 to 2024. The dataset provides insights into the evolving trends and dynamics within the programming landscape, reflecting the changing interests and preferences of developers worldwide.

    Attributes:

    • Month: The month & year in which the data was recorded.
    • Programming Language: The name of the programming language.
    • Total Questions: The total number of questions asked on Stack Overflow related to the specific programming language during the given month.

    Potential Uses of the Dataset:

    This dataset enables researchers, analysts, and enthusiasts to explore the historical trajectory of programming language popularity, identify emerging trends, and gain valuable insights into the factors influencing developers' preferences and choices over time.

  18. Proficiency of computer programming in China 2024

    • statista.com
    Updated Nov 29, 2024
    Cite
    Statista (2024). Proficiency of computer programming in China 2024 [Dataset]. https://www.statista.com/statistics/1537789/china-proficiency-with-programming-languages-and-coding/
    Dataset updated
    Nov 29, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2024
    Area covered
    China
    Description

    Driven by a rapid digitization process and a tech-savvy culture, a considerable share of China's internet population possessed advanced computer skills. A 2024 survey revealed that about one in every five internet users in the country had basic coding knowledge in programming languages.

  19. Waltham Public Schools Dual Language Program

    • publicschoolreview.com
    json, xml
    Updated Jun 22, 2025
    Cite
    Public School Review (2025). Waltham Public Schools Dual Language Program [Dataset]. https://www.publicschoolreview.com/waltham-public-schools-dual-language-program-profile
    Available download formats: xml, json
    Dataset updated
    Jun 22, 2025
    Dataset authored and provided by
    Public School Review
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2017 - Dec 31, 2025
    Area covered
    Waltham School District, Waltham
    Description

    The Historical Dataset of Waltham Public Schools Dual Language Program is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (2017-2023), Total Classroom Teachers Trends Over Years (2017-2023), Distribution of Students By Grade Trends, Student-Teacher Ratio Comparison Over Years (2017-2023), Asian Student Percentage Comparison Over Years (2017-2020), Hispanic Student Percentage Comparison Over Years (2017-2023), Black Student Percentage Comparison Over Years (2017-2023), White Student Percentage Comparison Over Years (2017-2023), Two or More Races Student Percentage Comparison Over Years (2017-2023), Diversity Score Comparison Over Years (2017-2023), Reading and Language Arts Proficiency Comparison Over Years (2021-2022), Math Proficiency Comparison Over Years (2021-2023), Overall School Rank Trends Over Years (2021-2023).

  20. F# Data: Making structured data first-class

    • figshare.com
    bin
    Updated Jan 19, 2016
    Cite
    Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
    Available download formats: bin
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tomas Petricek
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.
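
The core idea named in the description above, inferring a type for each sample and then taking a common supertype, can be sketched in a few lines of Python. This is a deliberately simplified illustration over JSON-like values, not the F# Data algorithm. The type representation (strings such as "int", tuples such as ("optional", t)), the rule that int and float join to float, and the function names `infer`, `join`, and `infer_from_samples` are all assumptions made for this example.

```python
def infer(value):
    """Infer a simple type description for one JSON-like sample value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        element = "bottom"  # "bottom" is the type of an empty collection's elements
        for item in value:
            element = join(element, infer(item))
        return ("list", element)
    if isinstance(value, dict):
        return ("record", {key: infer(item) for key, item in value.items()})
    raise TypeError(f"unsupported sample value: {value!r}")


def _split_optional(t):
    """Return (is_optional, core_type) for a type description."""
    if t == "null":
        return True, "bottom"
    if isinstance(t, tuple) and t[0] == "optional":
        return True, t[1]
    return False, t


def _make_optional(core):
    """Wrap a core type as optional, avoiding double wrapping."""
    if core == "bottom":
        return "null"
    if core == "any" or (isinstance(core, tuple) and core[0] == "optional"):
        return core
    return ("optional", core)


def join(a, b):
    """Return a common supertype of two type descriptions."""
    if a == b:
        return a
    if a == "bottom":
        return b
    if b == "bottom":
        return a
    a_optional, a_core = _split_optional(a)
    b_optional, b_core = _split_optional(b)
    if a_optional or b_optional:
        return _make_optional(join(a_core, b_core))
    if (a, b) in (("int", "float"), ("float", "int")):
        return "float"
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "list":
        return ("list", join(a[1], b[1]))
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] == "record":
        fields = {}
        for key in list(a[1]) + [k for k in b[1] if k not in a[1]]:
            if key in a[1] and key in b[1]:
                fields[key] = join(a[1][key], b[1][key])
            else:
                # A field missing from one sample becomes optional.
                fields[key] = _make_optional(a[1][key] if key in a[1] else b[1][key])
        return ("record", fields)
    return "any"  # no more specific common supertype in this simplified lattice


def infer_from_samples(samples):
    """Fold join over the inferred types of all samples."""
    result = "bottom"
    for sample in samples:
        result = join(result, infer(sample))
    return result
```

Given two hypothetical samples, one record with an integer `id` and a string `name`, and another with a float `id`, a null `name`, and an extra `tags` list, the sketch produces a record type in which `id` is a float, `name` is an optional string, and `tags` is an optional list of strings.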

Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
Most used programming languages among developers worldwide 2024

85 scholarly articles cite this dataset (View in Google Scholar)