70 datasets found

Most used programming languages among developers worldwide 2024
statista.com
Updated Feb 6, 2025
Cite
Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
Dataset updated
Feb 6, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 19, 2024 - Jun 20, 2024
Area covered
Worldwide
Description
As of 2024, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 62 percent of respondents stating that they used JavaScript and just around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages around the world.

Programming languages

At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to learn to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills and likely assist in their search for employment.
Programming languages used for software development worldwide 2024
statista.com
Updated Jul 1, 2025
Cite
Statista (2025). Programming languages used for software development worldwide 2024 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
Worldwide
Description
As of 2024, JavaScript was the programming language most used by software developers worldwide in the past 12 months, named by ** percent of the developers surveyed. Python followed, at ** percent of respondents.
Most popular programming languages worldwide 2024
statista.com
Updated Jul 1, 2025
Cite
Statista (2025). Most popular programming languages worldwide 2024 [Dataset]. https://www.statista.com/statistics/1292294/popular-it-skills-worldwide/
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 1, 2024 - Jun 30, 2024
Area covered
Worldwide
Description
JavaScript and Java were some of the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing these languages in 2024, respectively. Nevertheless, the tech skill developers wanted to learn the most in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest growing IT skills among DevSkiller customers were C/C++ and data science, while cybersecurity ranked third.

Software skills

When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn how to code between 11 and 17 years old, with some of them writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned how to program by accessing online resources such as videos and blogs.

Software skills pay

In 2024, the average annual software developer’s salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany, it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.
Programming Language Ecosystem Project TU Wien
test.researchdata.tuwien.ac.at
csv, text/markdown
Updated Jun 25, 2024
Cite
Valentin Futterer (2024). Programming Language Ecosystem Project TU Wien [Dataset]. http://doi.org/10.70124/gnbse-ts649
Available download formats: text/markdown, csv
Unique identifier
https://doi.org/10.70124/gnbse-ts649
Dataset updated
Jun 25, 2024
Dataset provided by
TU Wien
Authors
Valentin Futterer
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Time period covered
Dec 12, 2023
Area covered
Vienna
Description
About Dataset
This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.
The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.

About Data collection methodology
The dataset was created using the GitHub repository above. As input data, three public datasets were used.
github_metadata
Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.
PYPL_survey_2004-2023
Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows, for each month from 2004 to 2023, the share of programming-related Google searches per language.
stack_overflow_developer_survey
Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows the results of the yearly Stack Overflow developer survey from 2011 to 2023.
All these datasets were downloaded on 12.12.2023. They are all included in the GitHub repository above.

Description of the data
The dataset contains a column for the year, followed by one column per language giving its usage in percent. Additionally, vertical bar charts and pie charts for each year, plus a line graph for each language over the whole timespan, are provided as PNG files.
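
As a rough illustration of how a file with this layout could be read, the sketch below picks the most-used languages for one year. The column name `year`, the helper `top_languages`, and the sample values are assumptions made up for the example; they are not taken from usage_of_programming_languages_2011-2023.csv.

```python
import csv
import io

# Invented sample in the layout described above: one "year" column,
# then one column per language holding its usage in percent.
SAMPLE_CSV = """year,Python,Java,C
2022,28.0,17.5,6.5
2023,29.5,16.0,6.0
"""


def top_languages(csv_text, year, n=3, year_column="year"):
    """Return the n most-used languages for `year` as (name, percent) pairs."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row[year_column]) == year:
            shares = [
                (name, float(value))
                for name, value in row.items()
                if name != year_column and value not in ("", None)
            ]
            # Highest percentage first.
            return sorted(shares, key=lambda item: item[1], reverse=True)[:n]
    raise KeyError(f"year {year} not found")


print(top_languages(SAMPLE_CSV, 2023, n=2))
```

With the real file, the text passed in would come from reading the CSV from disk, and `year_column` may need adjusting to the actual header name.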

The languages that are going to be considered for the project can be seen here:
- Python
- C
- C++
- Java
- C#
- JavaScript
- PHP
- SQL
- Assembly
- Scratch
- Fortran
- Go
- Kotlin
- Delphi
- Swift
- Rust
- Ruby
- R
- COBOL
- F#
- Perl
- TypeScript
- Haskell
- Scala

License
This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/.
TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.

Acknowledgments
Thanks go out to
- stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
- the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
- Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.
Most Popular Programming Languages Since 2004
kaggle.com
zip
Updated Aug 15, 2020
Cite
Muhammad Khalid (2020). Most Popular Programming Languages Since 2004 [Dataset]. https://www.kaggle.com/muhammadkhalid/most-popular-programming-languages-since-2004
Available download formats: zip (11311 bytes)
Dataset updated
Aug 15, 2020
Authors
Muhammad Khalid
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Context

Well, I was looking for a Most Popular Programming Languages dataset for my YouTube channel video and couldn't find anything decent. So I collected it for my own use and am sharing it.

Content

This dataset contains data about the Most Popular Programming Languages from 2004 to 2020. All programming language values are percentages out of 100%.

Acknowledgements

The data was pulled from https://pypl.github.io

If this dataset is useful for you then don't forget to upvote.
Programming languages most used in software companies in Russia 2024
statista.com
Updated Jun 19, 2025
Cite
Statista (2025). Programming languages most used in software companies in Russia 2024 [Dataset]. https://www.statista.com/statistics/1196588/programming-languages-most-used-russia/
Dataset updated
Jun 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
Russia
Description
JavaScript was the most frequently used coding language in Russia, used by around ********** of the surveyed software companies in 2024. Furthermore, over ******** of the companies reported using Python and Java.
Collection of example datasets used for the book - R Programming -...
figshare.com
txt
Updated Dec 4, 2023
Cite
Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24728073.v1
Dataset updated
Dec 4, 2023
Dataset provided by
figshare
Authors
Kingsley Okoye; Samira Hosseini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.

For all intents and purposes, this book serves as both textbook and manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.
It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. It ranges from how to import and store datasets in R as objects, how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. The book thus brings together statistics and computer programming for research.
Programing language & Games
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Han, Qi (2020). Programing language & Games [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3549143
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Han, Qi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the development of science and technology, there are more and more electronic games on the market. The types of electronic games have also become more diversified. At present, there are many programming languages on the market that can be used to develop games. As beginners in game development, we find it difficult to choose an appropriate programming language for developing specific types of games. So we investigated some famous games and the programming languages they use.
Leading coding languages used in AR and VR worldwide in 2022
statista.com
Updated May 23, 2025
Cite
Statista (2025). Leading coding languages used in AR and VR worldwide in 2022 [Dataset]. https://www.statista.com/statistics/1343292/coding-languages-used-in-ar-vr-worldwide/
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2021 - Feb 2022
Area covered
Worldwide
Description
A survey conducted between late 2021 and early 2022 found that JavaScript was the leading coding language used by software developers in augmented reality (AR) and virtual reality (VR) projects, followed closely by Python.
libs-github-api: add summary stats
zenodo.org
zip
Updated Jan 24, 2020
Cite
Eric Phetteplace (2020). libs-github-api: add summary stats [Dataset]. http://doi.org/10.5281/zenodo.17790
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.17790
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Eric Phetteplace
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The new lib/summary-stats.js script compiles a sorted CSV of all languages used, with data on the number of repos the language appears in, the number for which it is recorded as the primary language, and the total lines of code in the language across all repos.
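
The aggregation described here can be sketched in a few lines. The Python below is only an illustration of the idea, not the lib/summary-stats.js script itself; the input shape (a list of dicts with `primary` and `languages` keys), the column names, and the sort key (repo count, then name) are all assumptions, and the sample data is invented.

```python
import csv
import io
from collections import defaultdict


def summarize(repos):
    """Aggregate per-language statistics across repositories.

    `repos` is an assumed shape: a list of dicts with a "primary" language
    name and a "languages" mapping of language -> lines of code.
    """
    stats = defaultdict(lambda: {"repos": 0, "primary": 0, "lines": 0})
    for repo in repos:
        for language, lines in repo["languages"].items():
            stats[language]["repos"] += 1
            stats[language]["lines"] += lines
        if repo.get("primary"):
            stats[repo["primary"]]["primary"] += 1
    # Assumed ordering: most repos first, ties broken alphabetically.
    return sorted(stats.items(), key=lambda kv: (-kv[1]["repos"], kv[0]))


def to_csv(rows):
    """Render the sorted rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["language", "repos", "primary", "lines"])
    for language, s in rows:
        writer.writerow([language, s["repos"], s["primary"], s["lines"]])
    return out.getvalue()


# Invented example input.
REPOS = [
    {"primary": "JavaScript", "languages": {"JavaScript": 1200, "CSS": 300}},
    {"primary": "Python", "languages": {"Python": 800, "JavaScript": 50}},
]
print(to_csv(summarize(REPOS)))
```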
Replication Package of the paper "Large Language Models for Multilingual...
zenodo.org
zip
Updated Mar 14, 2025
Cite
Anonymous (2025). Replication Package of the paper "Large Language Models for Multilingual Code Generation: A Benchmark and a Study on Code Quality" [Dataset]. http://doi.org/10.5281/zenodo.15028641
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.15028641
Dataset updated
Mar 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Anonymous
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Large Language Models for Multilingual Code Generation: A Benchmark and a Study on Code Quality

Abstract

Having been trained in the wild, Large Language Models (LLMs) may suffer from different types of bias. As shown in previous studies outside software engineering, this includes a language bias, i.e., these models perform differently depending on the language used for the query/prompt. However, so far the impact of language bias on source code generation has not been thoroughly investigated. Therefore, in this paper, we study the influence of the language adopted in the prompt on the quality of the source code generated by three LLMs, specifically GPT, Claude, and DeepSeek. We consider 230 coding tasks for Python and 230 for Java, and translate their related prompts into four languages: Chinese, Hindi, Spanish, and Italian. After generating the code, we measure code quality in terms of passed tests, code metrics, warnings generated by static analysis tools, and language used for the identifiers. Results indicate that (i) source code generated from the English queries is not necessarily better in terms of passed tests and quality metrics, (ii) the quality for different languages varies depending on the programming language and LLM being used, and (iii) the generated code tends to contain mixes of comments and literals written in English and the language used to formulate the prompt.

Replication Package

This replication package is organized into two main directories: data and scripts. The data directory contains all the data used in the analysis, including prompts and final results. The scripts directory contains all the Python scripts used for code generation and analysis.

Data

The data directory contains five subdirectories, each corresponding to a stage in the analysis pipeline. These are enumerated to reflect the order of the process:

prompt_translation: Contains files with manually translated prompts for each language. Each file is associated with both Python and Java. The structure of each file is as follows:

id: The ID of the query in the CoderEval benchmark.

prompt: The original English prompt.

summary: The original summary.

code: The original code.

translation: The translation generated by GPT.

correction: The manual correction of the GPT-generated translation.

correction_tag: A list of tags indicating the corrections made to the translation.

generated_code: This column is initially empty and will contain the code generated from the translated prompt.

generation: Contains the code generated by the three LLMs for each programming language and natural language. Each subdirectory (e.g., java_chinese_claude) contains the following:

files: The files with the generated code (named by the query ID).

report: Reports generated by static analysis tools.

A CSV file (e.g., java_chinese_claude.csv) containing the generated code in the corresponding column.

tests: Contains input files for the testing process and the results of the tests. Files in the input_files directory are formatted according to the CoderEval benchmark requirements. The results directory holds the output of the testing process.

quantitative_analysis: Contains all the CSV reports of the static analysis tools and test output for all languages and models. These files are the inputs for the statistical analysis. The directory stats contains all the output tables for the statistical analysis, which are shown in the paper's tables.

qualitative_analysis: Contains files used for the qualitative analysis:

CohenKappaagreement.csv: A file containing the subset used to compute Cohen's kappa metrics for manual analysis.

files: Contains all files for the qualitative analysis. Each file has the following columns:

id: The ID of the query in the CoderEval benchmark.

generated_code: The code generated by the model.

comments: The language used for comments.

identifiers: The language used for identifiers.

literals: The language used for literals.

notes: Additional notes.

ablation_study: Contains files for the ablation study. Each file has the following columns:

id: The ID of the query in the CoderEval benchmark.

prompt: The prompt used for code generation.

generated_code, comments, identifiers, and literals: Same as in the qualitative analysis.

results.pdf: This file shows the table containing all the percentages of comments, identifiers, and literals extracted from the CSV files of the ablation study.
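
For readers unfamiliar with the agreement statistic behind CohenKappaagreement.csv: Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below is a generic implementation of that formula, not code from the replication package; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # p_o: fraction of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the language of comments in six generated files.
A = ["English", "English", "Italian", "Mixed", "Italian", "English"]
B = ["English", "Italian", "Italian", "Mixed", "Italian", "English"]
print(round(cohen_kappa(A, B), 4))
```

In this made-up example the raters agree on 5 of 6 items (p_o = 5/6), chance agreement is p_e = 13/36, and kappa works out to 17/23.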

Files prefixed with italian contain prompts with signatures and docstrings translated into Italian. The system prompt used is the same as the initial one (see the paper). Files with the english prefix have prompts with the original signature (in English) and the docstring in Italian. The system prompt differs as follows:

You are an AI that only responds with Python code. You will be given a function signature and its docstring by the user. Write your full implementation (restate the function signature). Use a Python code block to write your response. Comments and identifiers must be in Italian. For example: ```python print("Hello World!")

Scripts

The scripts directory contains all the scripts used to perform the generation and analysis. All files are properly commented. Here is a brief description of each file:

code_generation.py: This script automates code generation using AI models (GPT, DeepSeek, and Claude) for different programming and natural languages. It reads prompts from CSV files, generates code based on the prompts, and saves the results in structured directories. It logs the process, handles errors, and stores the generated code in separate files for each iteration.

computeallanalysis.py: This script performs static code analysis on generated code files using different models, languages, and programming languages. It runs various analyses (Flake8, Pylint, Lizard) depending on the programming language: for Python, it runs all three analyses, while for Java, only Lizard is executed. The results are stored in dedicated report directories for each iteration. The script ensures the creation of necessary directories and handles any errors that occur during the analysis process.

createtestjava.py: This script processes Java code generated by different models and languages, extracting methods using a JavaParser server. It iterates through multiple iterations of generated code, extracts the relevant method code (or uses the full code if no method is found), and stores the results in a JSONL file for each language and model combination.

deepseek_model.py: This function sends a request to the DeepSeek API, passing a system and user prompt, and extracts the generated code snippet based on the specified programming language. It prints the extracted code in blue to the console, and if any errors occur during the request or extraction, it prints an error message in red. If successful, it returns the extracted code snippet; otherwise, it returns None.

extractpmdreport.py: This script processes PMD analysis reports in SARIF format and converts them into CSV files. It extracts the contents of ZIP files containing the PMD reports, parses the SARIF file to gather analysis results, and saves the findings in a CSV file. The output includes details such as file names, rules, messages, and the count of issues found. The script iterates through multiple languages, models, and iterations, ensuring that PMD reports are properly processed and saved for each combination.

flake_analysis.py: The flake_analysis function runs Flake8 to analyze Python files for errors and generates a CSV report summarizing the results. It processes the output, extracting error details such as filenames, error codes, and messages. The errors are grouped by file and saved in a CSV file for easy review.

generatepredictionclaude_java.py: The generatecodefrom_prompt function processes a JSON file containing prompts, generates Java code using the Claude API, and saves the generated code to a new JSON file. It validates each prompt, ensures it's JSON-serializable, and sends it to the Claude API for code generation. If the generation is successful, the code is stored in a structured format, and the output is saved to a JSON file for further use.

generatepredictionclaude_python.py: This code defines a function generatecodefrom_prompt that processes a JSON file containing prompts, generates Python code using the Claude API, and saves the generated code to a new JSON file. It handles invalid values and ensures all prompts are
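
The Flake8 post-processing described for flake_analysis.py above can be illustrated with a small sketch. It assumes Flake8's default output format (`path:row:col: CODE message`) and parses already-captured output rather than invoking Flake8; the function names, CSV columns, and sample lines are hypothetical and not taken from the replication package.

```python
import csv
import io
import re
from collections import defaultdict

# Flake8's default line format: path:row:col: CODE message
LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def group_by_file(flake8_output):
    """Map each file path to its list of (error code, message) pairs."""
    grouped = defaultdict(list)
    for line in flake8_output.splitlines():
        match = LINE.match(line.strip())
        if match:  # silently skip lines that are not error reports
            grouped[match["path"]].append((match["code"], match["text"]))
    return dict(grouped)


def report_csv(grouped):
    """Render one CSV row per file: path, error count, and the codes seen."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "error_count", "codes"])
    for path in sorted(grouped):
        codes = ";".join(code for code, _ in grouped[path])
        writer.writerow([path, len(grouped[path]), codes])
    return out.getvalue()


# Invented sample output for two generated files.
SAMPLE = """\
files/12.py:1:1: F401 'os' imported but unused
files/12.py:7:80: E501 line too long (95 > 79 characters)
files/40.py:3:5: E303 too many blank lines (2)
"""
print(report_csv(group_by_file(SAMPLE)))
```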
Top programming languages demanded by recruiters worldwide 2025
statista.com
Updated Jul 1, 2025
Cite
Statista (2025). Top programming languages demanded by recruiters worldwide 2025 [Dataset]. https://www.statista.com/statistics/1296727/programming-languages-demanded-by-recruiters/
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2024
Area covered
Worldwide
Description
The most demanded programming languages by recruiters in 2025 were Python, JavaScript, and Java, with around ** percent of recruiters looking to hire people with these programming skills.
Development Economics Data Group - Proportion of youth and adults who have...
gimi9.com
Updated Dec 12, 2020
Cite
(2020). Development Economics Data Group - Proportion of youth and adults who have wrote a computer program using a specialised programming language, both sexes (%) | gimi9.com [Dataset]. https://gimi9.com/dataset/worldbank_wb_edstats_uis_ictskillproglang/
Dataset updated
Dec 12, 2020
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The proportion of youth and adults with information and communications technology (ICT) skills, by type of skill, is defined as the percentage of individuals who have undertaken certain ICT-related activities in the last 3 months. The lack of ICT skills continues to be one of the key barriers keeping people from fully benefitting from the potential of ICT. These data may be used to inform targeted policies to improve ICT skills, and thus contribute to an inclusive information society. The data compiler for this indicator is the International Telecommunication Union (ITU). Eurostat collects data annually for 32 European countries, while the ITU is responsible for setting up the standards and collecting this information from the remaining countries.
Data from: Code4Bench: A Multidimensional Benchmark of Codeforces Data for...
data.niaid.nih.gov
search.datacite.org
Updated Jan 24, 2020
Cite
Vahidi-Asl Mojtaba (2020). Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2582967
Dataset updated
Jan 24, 2020
Dataset provided by
Majd Amirabbas
Zamani Bahman
Baraani-Dastjerdi Ahmad
Khalilian Alireza
Vahidi-Asl Mojtaba
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Reproducible research relies on well-designed benchmarks. However, evaluation on a single benchmark increases the risk of overfitting; that is, an optimization to reach a certain performance. In recent years several well-designed benchmarks have been constructed for different subfields of program analysis. However, they often involve real-world industrial projects in a few languages such as C or Java. We provide Code4Bench, a benchmark comprising 3,421,357 programs totaling 306,053,105 lines of code in 41 versions of 28 programming languages such as C/C++, Java, Python, and Kotlin. We have constructed this benchmark from Codeforces, a famous programming competition website, which is widely used by international programmers. Code4Bench advances the state-of-the-art in conducting reproducible and comparative experiments. It helps mitigate the bias and increase the generality and conclusiveness of the results. We present our methodology in construction of Code4Bench and give various descriptive statistics. We have also conducted an online survey of the users of Codeforces’ website whose code is included in the benchmark. The survey covers the users’ demographic information and programming habits, and its results are also provided in the benchmark. Finally, we leveraged an automatic process by which we localized faults within the faulty versions and categorized them according to a coarse-grained classification. In addition to its usage in empirical studies, Code4Bench can be used to teach programming and evolve algorithmic problems. We release Code4Bench in database format to allow researchers to extract other data of the benchmark by arbitrary queries.

Code4Bench version 1.0.0 is publicly available at https://zenodo.org/record/2582968, with DOI 10.5281/zenodo.2582968, thereby providing long-term storage and versioning. It is released under the terms of Creative Commons Attribution 4.0 International license. Code4Bench is also publicly available at: https://github.com/code4bench/Code4Bench, in which we have provided some additional information and script examples.
Data from: T3PS v1.0: Tool for Parallel Processing in Parameter Scans
data.mendeley.com
Updated Jan 1, 2016
Cite
Vinzenz Maurer (2016). T3PS v1.0: Tool for Parallel Processing in Parameter Scans [Dataset]. http://doi.org/10.17632/7cd59f5dhh.1
Unique identifier
https://doi.org/10.17632/7cd59f5dhh.1
Dataset updated
Jan 1, 2016
Authors
Vinzenz Maurer
License
http://www.gnu.org/licenses/gpl-3.0.en.html
Description
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2018)

Abstract

T3PS is a program that can be used to quickly design and perform parameter scans while easily taking advantage of the multi-core architecture of current processors. It takes an easy to read and write parameter scan definition file format as input. Based on the parameter ranges and other options contained therein, it distributes the calculation of the parameter space over multiple processes and possibly computers. The derived data is saved in a plain text file format readable by most plotting ...

Title of program: T3PS
Catalogue Id: AEXZ_v1_0

Nature of problem

While current processor architecture firmly goes the way of parallelization even on desktop computers, programs commonly used for parameter scans in physics often lack the capability to take advantage of this. While it is possible to change the source code of some programs, it may not be feasible for every program still in use. Fortunately, current operating systems routinely make use of multiple processor cores already, if multiple processes are running at the same time. The easiest way to make ...

Versions of this program held in the CPC repository in Mendeley Data

AEXZ_v1_0; T3PS; 10.1016/j.cpc.2015.08.032
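
The general idea in the abstract, expanding parameter ranges into points and spreading their evaluation over several processes, can be sketched as follows. This is not T3PS code and does not use its scan definition file format; `grid`, `model`, `scan`, and the parameter names `m` and `g` are hypothetical stand-ins chosen for the example.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def grid(ranges):
    """Expand {"name": [values, ...]} into a list of parameter dicts."""
    names = list(ranges)
    combos = itertools.product(*(ranges[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def model(point):
    """Stand-in for an expensive calculation at one parameter point."""
    return point["m"] ** 2 + point["g"]


def scan(ranges, func, executor_cls=ProcessPoolExecutor, workers=2):
    """Evaluate func at every grid point using a pool of workers.

    Results come back in grid order. ThreadPoolExecutor can be passed as
    executor_cls when separate processes are not wanted.
    """
    points = grid(ranges)
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(func, points))
    return list(zip(points, results))
```

A call such as `scan({"m": [1, 2], "g": [0.5, 1.5]}, model)` evaluates four points across two worker processes. When the default process pool is used from a script, the call is best placed under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.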
Replication Data for: Compositio Prompto: An Architecture to Employ Large...
darus.uni-stuttgart.de
Updated Oct 7, 2024
Robin D. Pesl; Carolin Mombrey; Kevin Klein; Ilche Georgievski; Steffen Becker; Georg Herzwurm; Marco Aiello (2024). Replication Data for: Compositio Prompto: An Architecture to Employ Large Language Models in Automated Service Computing [Dataset]. http://doi.org/10.18419/DARUS-4497
Unique identifier
https://doi.org/10.18419/DARUS-4497
Dataset updated
Oct 7, 2024
Dataset provided by
DaRUS
Authors
Robin D. Pesl; Carolin Mombrey; Kevin Klein; Ilche Georgievski; Steffen Becker; Georg Herzwurm; Marco Aiello
License
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4497
Dataset funded by
BMWK
MWK
Description
A classic, central Service-Oriented Computing (SOC) challenge is the service composition problem. It concerns solving a user-defined task by selecting a suitable set of services, possibly found at runtime, determining an invocation order, and handling request and response parameters. The solutions proposed in the past two decades mostly resort to additional formal modeling of the services, leading to extra effort, scalability issues, and overall brittleness. With the rise of Large Language Models (LLMs), it has become feasible to process semi-structured information like state-of-practice OpenAPI documentation containing formal parts like endpoints and free-form elements like descriptions. We propose Compositio Prompto to generate service compositions based on those semi-structured documents. Compositio Prompto acts as an encapsulation of the prompt creation and the model invocation such that the user only has to provide the service specifications, the task, and which input and output format they expect, eliminating any manual and laborious annotation or modeling task by relying on already existing documentation. To validate our approach, we implement a fully operational prototype, which operates on a set of OpenAPIs, a plain text task, and an input and output JSON schema as input and returns the generated service composition as executable Python code. We measure the effectiveness of our approach on a parking spot booking case study. Our experiments show that models, especially those above 70B parameters, can solve several tasks, but none can fulfill all tasks. Furthermore, compared with manually created sample solutions, the ones generated by LLMs appear to be close approximations.

Methodology (summarized): We perform an automated service composition for parking spot booking using LLMs for the study. There are six parking services and two payment services. The six parking services are duplicated with different distances and prices to create distinct sets 1 and 2. We define eight prompts and perform the composition using 14 different LLMs. We use a best-of-three-shot approach to reduce the influence of randomness. Finally, we assess functionality manually and apply code similarity metrics to a manually crafted sample solution. All experiments are described in detail in the full paper.

Content:
code/*: Code to perform the experiments. For details, see "code/README.md".
code/evaluation/prompt_generation.py: Services and prompt generation.
code/evaluation/sample_solution/*: Sample solution and similarity evaluation.
results/*: Results for the runs with the LLMs. Filename structure: "results/{model_name}-{run}/prompt_{prompt_number}_set_{set_number}_{artifact}". The artifact can be "code_0.py" for the generated code, "code_1.py" if tasked to improve the code, or "prompt.txt" for the used prompt. For the best run, the code metrics are in "comparison.json".
prompt_template.txt: Pseudo code for the prompt template. Implemented in code/evaluation/prompt_generation.py.

Note: Please use the tree view to access the files.

License: License for the "code/*": MIT. License for the "results/*": CC BY 4.0.
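The filename structure quoted in the description can be split into its fields with a regular expression. The following is a minimal sketch, not part of the replication package: the function name `parse_result_path`, the `ResultPath` tuple, and the example model name are hypothetical. It assumes the run label contains no hyphen or slash, and that prompt and set numbers are decimal integers.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the documented structure:
# results/{model_name}-{run}/prompt_{prompt_number}_set_{set_number}_{artifact}
# Assumption: the run label has no "-" or "/", so the LAST hyphen in the
# directory name separates model_name from run.
_RESULT_PATH = re.compile(
    r"^results/(?P<model_name>[^/]+)-(?P<run>[^/-]+)/"
    r"prompt_(?P<prompt_number>\d+)_set_(?P<set_number>\d+)_(?P<artifact>.+)$"
)


class ResultPath(NamedTuple):
    model_name: str
    run: str
    prompt_number: int
    set_number: int
    artifact: str


def parse_result_path(path: str) -> Optional[ResultPath]:
    """Return the fields of a results path, or None if it does not match."""
    match = _RESULT_PATH.match(path)
    if match is None:
        return None
    return ResultPath(
        match["model_name"],
        match["run"],
        int(match["prompt_number"]),
        int(match["set_number"]),
        match["artifact"],
    )


# Hypothetical path; "example-model" is not a model name from the study.
print(parse_result_path("results/example-model-2/prompt_3_set_1_code_0.py"))
```

Paths that do not follow the pattern, such as a "comparison.json" file, yield `None` rather than an error, so a directory walk can skip them.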
Monthly Stack Overflow Questions
kaggle.com
Updated Feb 29, 2024
ComputingVictor (2024). Monthly Stack Overflow Questions [Dataset]. https://www.kaggle.com/datasets/computingvictor/monthly-trends-in-stack-overflow-questions
Dataset updated
Feb 29, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ComputingVictor
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description

This dataset contains information on the popularity and interest in various programming languages over time, as observed through the total number of questions asked on Stack Overflow from 2008 to 2024. The dataset provides insights into the evolving trends and dynamics within the programming landscape, reflecting the changing interests and preferences of developers worldwide.


Attributes:

Month: The month & year in which the data was recorded.

Programming Language: The name of the programming language.

Total Questions: The total number of questions asked on Stack Overflow related to the specific programming language during the given month.
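
As an illustration of how these three attributes might be used, the sketch below sums the question counts per language. It assumes a long-format CSV whose header matches the attribute names listed above; the actual file layout on Kaggle may differ, the function name is hypothetical, and the sample rows are invented for demonstration only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def total_questions_by_language(csv_text: str) -> Dict[str, int]:
    """Sum the 'Total Questions' column for each 'Programming Language'."""
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Programming Language"]] += int(row["Total Questions"])
    return dict(totals)


# Invented sample rows; these are NOT values from the dataset.
sample = """Month,Programming Language,Total Questions
2008-09,Python,10
2008-09,Java,20
2008-10,Python,5
"""

print(total_questions_by_language(sample))
```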

Potential Uses of the Dataset:

This dataset enables researchers, analysts, and enthusiasts to explore the historical trajectory of programming language popularity, identify emerging trends, and gain valuable insights into the factors influencing developers' preferences and choices over time.
Proficiency of computer programming in China 2024
statista.com
Updated Nov 29, 2024
Statista (2024). Proficiency of computer programming in China 2024 [Dataset]. https://www.statista.com/statistics/1537789/china-proficiency-with-programming-languages-and-coding/
Dataset updated
Nov 29, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2024
Area covered
China
Description
Driven by a rapid digitization process and a tech-savvy culture, a considerable share of China's internet population possessed advanced computer skills. A 2024 survey revealed that about one in every five internet users in the country had basic coding knowledge of programming languages.
Waltham Public Schools Dual Language Program
publicschoolreview.com
json, xml
Updated Jun 22, 2025
Public School Review (2025). Waltham Public Schools Dual Language Program [Dataset]. https://www.publicschoolreview.com/waltham-public-schools-dual-language-program-profile
Available download formats: xml, json
Dataset updated
Jun 22, 2025
Dataset authored and provided by
Public School Review
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jan 1, 2017 - Dec 31, 2025
Area covered
Waltham School District, Waltham
Description
The Historical Dataset of Waltham Public Schools Dual Language Program is provided by PublicSchoolReview and contains statistics on the following metrics: Total Students Trends Over Years (2017-2023), Total Classroom Teachers Trends Over Years (2017-2023), Distribution of Students By Grade Trends, Student-Teacher Ratio Comparison Over Years (2017-2023), Asian Student Percentage Comparison Over Years (2017-2020), Hispanic Student Percentage Comparison Over Years (2017-2023), Black Student Percentage Comparison Over Years (2017-2023), White Student Percentage Comparison Over Years (2017-2023), Two or More Races Student Percentage Comparison Over Years (2017-2023), Diversity Score Comparison Over Years (2017-2023), Reading and Language Arts Proficiency Comparison Over Years (2021-2022), Math Proficiency Comparison Over Years (2021-2023), Overall School Rank Trends Over Years (2021-2023).
F# Data: Making structured data first-class
figshare.com
bin
Updated Jan 19, 2016
Tomas Petricek (2016). F# Data: Making structured data first-class [Dataset]. http://doi.org/10.6084/m9.figshare.1169941.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.1169941.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tomas Petricek
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Accessing data in structured formats such as XML, CSV and JSON in statically typed languages is difficult, because the languages do not understand the structure of the data. Dynamically typed languages make this syntactically easier, but lead to error-prone code. Despite numerous efforts, most of the data available on the web do not come with a schema. The only information available to developers is a set of examples, such as typical server responses. We describe an inference algorithm that infers a type of structured formats including CSV, XML and JSON. The algorithm is based on finding a common supertype of types representing individual samples (or values in collections). We use the algorithm as a basis for an F# type provider that integrates the inference into the F# type system. As a result, users can access CSV, XML and JSON data in a statically-typed fashion just by specifying a representative sample document.
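
The idea of inferring a type as the common supertype of several samples can be illustrated with a much-simplified sketch. This is not the algorithm from the paper and not F# Data's implementation: it is written in Python, the type names ("int", "float", "optional", "any", "bottom") and all function names are assumptions made for illustration, and it handles only JSON-like values.

```python
def infer(value):
    """Infer a simple type description from one JSON-like Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = "bottom"  # type of an empty collection
        for item in value:
            elem = common_supertype(elem, infer(item))
        return ("list", elem)
    if isinstance(value, dict):
        return ("record", {key: infer(val) for key, val in value.items()})
    raise TypeError(f"unsupported value: {value!r}")


def optional(t):
    """Wrap t as optional unless it already admits missing values."""
    if t in ("any", "null") or (isinstance(t, tuple) and t[0] == "optional"):
        return t
    return ("optional", t)


def unwrap(t):
    """Return the inner type of an optional, or t itself otherwise."""
    return t[1] if isinstance(t, tuple) and t[0] == "optional" else t


def common_supertype(a, b):
    """Smallest type in this toy lattice that covers both a and b."""
    if a == b:
        return a
    if a == "bottom":
        return b
    if b == "bottom":
        return a
    if a == "null":
        return optional(b)
    if b == "null":
        return optional(a)
    ua, ub = unwrap(a), unwrap(b)
    if ua != a or ub != b:  # at least one side was optional
        return optional(common_supertype(ua, ub))
    if (a, b) in (("int", "float"), ("float", "int")):
        return "float"  # every int sample is also a valid float
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "list":
            return ("list", common_supertype(a[1], b[1]))
        if a[0] == "record":
            # A field missing from one sample becomes optional.
            keys = a[1].keys() | b[1].keys()
            return ("record", {
                key: common_supertype(a[1].get(key, "null"), b[1].get(key, "null"))
                for key in keys
            })
    return "any"  # assumed fallback for unrelated types


def infer_from_samples(samples):
    """Fold common_supertype over the inferred types of all samples."""
    result = "bottom"
    for sample in samples:
        result = common_supertype(result, infer(sample))
    return result
```

For example, `infer_from_samples([{"name": "a", "age": 1}, {"name": "b", "age": 2.5, "city": "X"}])` yields a record type in which `age` is widened to "float" and `city` becomes optional, because it is absent from the first sample.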

Statista (2025). Most used programming languages among developers worldwide 2024 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/

Most used programming languages among developers worldwide 2024

85 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Feb 6, 2025

Dataset authored and provided by

Statista (http://statista.com/)

Time period covered

May 19, 2024 - Jun 20, 2024

Area covered

Worldwide

Description

As of 2024, JavaScript and HTML/CSS were the most commonly used programming languages among software developers around the world, with more than 62 percent of respondents stating that they used JavaScript and just around 53 percent using HTML/CSS. Python, SQL, and TypeScript rounded out the top five most widely used programming languages around the world.

Programming languages

At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to learn to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills and likely assist in their search for employment.
