82 datasets found
  1. Most used programming languages among developers worldwide 2025

    • statista.com
    Cite
    Statista, Most used programming languages among developers worldwide 2025 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 29, 2025 - Jun 23, 2025
    Area covered
    Worldwide
    Description

    As of 2025, JavaScript and HTML/CSS are the most commonly used programming languages among software developers around the world, with more than 66 percent of respondents stating that they used JavaScript and around 61.9 percent using HTML/CSS. Python, SQL, and Bash/Shell rounded out the top five most widely used programming languages around the world.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills, which is likely to assist in their search for employment.

  2. Most Popular Programming Languages Statistics

    • enterpriseappstoday.com
    Updated Jan 5, 2023
    Cite
    EnterpriseAppsToday (2023). Most Popular Programming Languages Statistics [Dataset]. https://www.enterpriseappstoday.com/stats/programming-languages-statistics.html
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Programming languages statistics: The tech market, which is booming along with digital marketing, is a good source of income. The tech market includes many things, among them programming languages. Programming languages are the basis for building websites, games, software, mobile applications, and so on. There are nearly 9,000 programming languages around the world, each with its own features. In these most popular programming language statistics, we will look at statistical information and general knowledge about the various programming languages available worldwide.

    Programming Languages Statistics (Editor’s Choice)

    • There are 8,945 programming languages, as stated by the most popular programming languages statistics.
    • As of 2022, JavaScript is one of the most popular programming languages, as around 47.86% of recruiters are demanding JavaScript skills.
    • A basic Python developer earns between $70,000 to $1,00,00 a year.
    • As per the most popular programming languages statistics, Python has ranked number 1 in the United States of America, India, Germany, France, and the United Kingdom.

  3. GitHub Programming Languages Data

    • kaggle.com
    zip
    Updated Jan 2, 2022
    Cite
    Isaac Wen (2022). GitHub Programming Languages Data [Dataset]. https://www.kaggle.com/datasets/isaacwen/github-programming-languages-data
    Available download formats: zip (41198 bytes)
    Dataset updated
    Jan 2, 2022
    Authors
    Isaac Wen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    A common question for those new to and familiar with computer science and software engineering is what the best and/or most popular programming language is. It is very difficult to give a definitive answer, as there is a seemingly endless number of metrics that can define the 'best' or 'most popular' programming language.

    One such metric that can be used to define a 'popular' programming language is the number of projects and files that are made using that programming language. As GitHub is the most popular public collaboration and file-sharing platform, analyzing the languages that are used for repositories, PRs, and issues on GitHub can be a good indicator of the popularity of a language.

    Content

    This dataset contains statistics about the programming languages used for repositories, PRs, and issues on GitHub. The data is from 2011 to 2021.

    Source

    This data was queried and aggregated from BigQuery's public github_repos and githubarchive datasets.

    Limitations

    Only public GitHub repositories, and their corresponding PRs/issues, have publicly available data. Thus, this dataset is based only on public repositories, which may not be fully representative of all repositories on GitHub.

  4. Programming languages used for software development worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Programming languages used for software development worldwide 2024 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most popular programming language used in the past 12 months by software developers worldwide is JavaScript as of 2024, according to ** percent of the software developers surveyed. This is followed by Python at ** percent of the respondents surveyed.

  5. Most popular programming languages worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Most popular programming languages worldwide 2024 [Dataset]. https://www.statista.com/statistics/1292294/popular-it-skills-worldwide/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    JavaScript and Java were some of the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing these languages in 2024, respectively. Nevertheless, the tech skill developers wanted to learn the most in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest growing IT skills among DevSkiller customers were C/C++ and data science, while cybersecurity ranked third.

    Software skills

    When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn how to code between 11 and 17 years old, with some of them writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned how to program by accessing online resources such as videos and blogs.

    Software skills pay

    In 2024, the average annual software developer’s salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany, it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.

  6. Programming Language Ecosystem Project TU Wien

    • test.researchdata.tuwien.at
    csv, text/markdown
    Updated Jun 25, 2024
    Cite
    Valentin Futterer (2024). Programming Language Ecosystem Project TU Wien [Dataset]. http://doi.org/10.70124/gnbse-ts649
    Available download formats: text/markdown, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Valentin Futterer
    License

    Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Time period covered
    Dec 12, 2023
    Area covered
    Vienna
    Description

    About Dataset

    This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.

    The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.

    About Data collection methodology

    The dataset was created using the GitHub repository above. As input data, three public datasets were used.

    github_metadata

    Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all GitHub repositories with more than 5 stars.

    PYPL_survey_2004-2023

    Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows, for each month from 2004 to 2023, the share of programming-related Google searches per language.

    stack_overflow_developer_survey

    Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows the results of the yearly Stack Overflow developer survey from 2011 to 2023.

    All these datasets were downloaded on 12.12.2023. The datasets are all in the GitHub repository above.

    Description of the data

    The dataset contains a column for the year, followed by one column per language, denoting its usage in percent. Additionally, vertical bar charts and pie charts for each year, plus a line graph for each language over the whole timespan, are provided as PNG files.
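    As a minimal sketch of working with this layout, the Python snippet below reads a wide table (one year column, one column per language) and reports the language with the highest share in each year. The column name year, the sample values, and the helper top_language_per_year are illustrative assumptions; they are not taken from the actual CSV file.

```python
import csv
import io

# Hypothetical sample in the layout described above; the values are made up.
SAMPLE = """year,Python,Java,JavaScript
2011,10.0,30.0,25.0
2023,30.0,15.0,28.0
"""


def top_language_per_year(csv_text, year_column="year"):
    """Return {year: (language, share)} for the language with the highest share."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row.pop(year_column))
        # Every remaining column is treated as a language; empty cells are skipped.
        shares = {lang: float(v) for lang, v in row.items() if v not in ("", None)}
        best = max(shares, key=shares.get)
        result[year] = (best, shares[best])
    return result


if __name__ == "__main__":
    for year, (language, share) in top_language_per_year(SAMPLE).items():
        print(year, language, share)
```

    To use it on the real file, the sample string would be replaced by the file contents, and year_column adjusted to whatever the CSV header actually calls the year column.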

    The languages that are going to be considered for the project can be seen here:

    - Python

    - C

    - C++

    - Java

    - C#

    - JavaScript

    - PHP

    - SQL

    - Assembly

    - Scratch

    - Fortran

    - Go

    - Kotlin

    - Delphi

    - Swift

    - Rust

    - Ruby

    - R

    - COBOL

    - F#

    - Perl

    - TypeScript

    - Haskell

    - Scala

    License

    This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/.

    TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.

    Acknowledgments

    Thanks go out to

    - Stack Overflow, https://insights.stackoverflow.com/survey, for providing the data from the yearly Stack Overflow developer survey.

    - the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master, for providing Google search data.

    - Peter Elmers, for crawling metadata on GitHub repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.

  7. Globally sought-after programming languages among software developers 2022

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Globally sought-after programming languages among software developers 2022 [Dataset]. https://www.statista.com/statistics/793631/worldwide-developer-survey-most-wanted-languages/
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 11, 2022 - Jun 1, 2022
    Area covered
    Worldwide
    Description

    According to the survey, Rust was the most desired language in 2022, with over ** percent of respondents who were not developing with it expressing interest in doing so. Python ranked second, followed by TypeScript.

  8. Most Popular Programming Languages 2004-2024

    • kaggle.com
    zip
    Updated Sep 15, 2024
    Cite
    Muhammad Roshan Riaz (2024). Most Popular Programming Languages 2004-2024 [Dataset]. https://www.kaggle.com/datasets/muhammadroshaanriaz/most-popular-programming-languages-2004-2024/code
    Available download formats: zip (3491 bytes)
    Dataset updated
    Sep 15, 2024
    Authors
    Muhammad Roshan Riaz
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains the following columns:

    • Month: The date (in year-month format) when the data was recorded.
    • Python Worldwide(%): The percentage of global popularity for Python during that month.
    • JavaScript Worldwide(%): The percentage of global popularity for JavaScript.
    • Java Worldwide(%): The percentage of global popularity for Java.
    • C# Worldwide(%): The percentage of global popularity for C#.
    • PhP Worldwide(%): The percentage of global popularity for PHP.
    • Flutter Worldwide(%): The percentage of global popularity for Flutter.
    • React Worldwide(%): The percentage of global popularity for React.
    • Swift Worldwide(%): The percentage of global popularity for Swift.
    • TypeScript Worldwide(%): The percentage of global popularity for TypeScript.
    • Matlab Worldwide(%): The percentage of global popularity for Matlab.

    Each row represents data for a particular month, starting from January 2004, tracking the popularity trends of these programming languages worldwide.
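    A monthly table like this is often easier to compare once it is aggregated per year. The sketch below averages one popularity column per calendar year. It assumes the Month values look like "2004-01" (the description only says "year-month format"), and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the "YYYY-MM" month format and the values are assumptions.
SAMPLE = """Month,Python Worldwide(%),Java Worldwide(%)
2004-01,30,90
2004-02,32,88
2005-01,40,80
"""


def yearly_average(csv_text, column):
    """Average a monthly popularity column per calendar year."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["Month"].split("-")[0])  # leading "YYYY" part of the month
        by_year[year].append(float(row[column]))
    return {year: mean(values) for year, values in sorted(by_year.items())}


if __name__ == "__main__":
    print(yearly_average(SAMPLE, "Python Worldwide(%)"))
```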

  9. Most popular programming languages in Poland 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most popular programming languages in Poland 2024 [Dataset]. https://www.statista.com/statistics/1184564/poland-most-popular-software-languages/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Poland
    Description

    In the fourth quarter 2024, the most popular programming languages in published job offers in Poland were ***********, and Java.

  10. Top programming languages demanded by recruiters worldwide 2025

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Top programming languages demanded by recruiters worldwide 2025 [Dataset]. https://www.statista.com/statistics/1296727/programming-languages-demanded-by-recruiters/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most demanded programming languages by recruiters in 2025 were Python, JavaScript, and Java, with around ** percent of recruiters looking to hire people with these programming skills.

  11. github-final-datasets

    • kaggle.com
    zip
    Updated Nov 9, 2023
    Cite
    Olga Ivanova (2023). github-final-datasets [Dataset]. https://www.kaggle.com/datasets/olgaiv39/github-final-datasets
    Available download formats: zip (1877861953 bytes)
    Dataset updated
    Nov 9, 2023
    Authors
    Olga Ivanova
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Github Clean Code Snippets Dataset

    This describes how the datasets for a training notebook, used for a Telegram ML Contest solution, were prepared.

    1 Step - GitHub Samples Database parsing

    The first part of the code samples was taken from a private version of this notebook.

    Here are the statistics on the programming-language classes in the GitHub Code Snippets database (chart): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F2fdc091661198e80559f8cb1d1a306ff%2FScreenshot%202023-11-07%20at%2021.24.42.png?generation=1699390166413391&alt=media

    From this database, 2 CSV files were created, with 50000 code samples for each of the 20 programming languages included: one using equal-count sampling and one using stratified sampling. The corresponding files are sample_equal_prop_50000.csv and sample_stratified_50000.csv, respectively.
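    The source does not say how the two sampling schemes were implemented. As a rough sketch of the difference, the Python snippet below draws an equal number of snippets per language in one function and a share proportional to each language's size in the other. The function names, the (snippet, language) row shape, and the seed parameter are illustrative assumptions.

```python
import random
from collections import defaultdict


def group_by_language(rows):
    """Group (snippet, language) pairs by their language."""
    groups = defaultdict(list)
    for snippet, language in rows:
        groups[language].append((snippet, language))
    return groups


def sample_equal(rows, per_language, seed=0):
    """Draw the same number of snippets from every language (capped by availability)."""
    rng = random.Random(seed)
    out = []
    for _, items in sorted(group_by_language(rows).items()):
        out.extend(rng.sample(items, min(per_language, len(items))))
    return out


def sample_stratified(rows, total, seed=0):
    """Draw snippets so each language keeps roughly its share of the full data."""
    rng = random.Random(seed)
    out = []
    for _, items in sorted(group_by_language(rows).items()):
        k = round(total * len(items) / len(rows))  # proportional allocation
        out.extend(rng.sample(items, min(k, len(items))))
    return out
```

    With 80 Python rows and 20 Go rows, sample_equal(rows, 10) would return 10 of each, while sample_stratified(rows, 10) would return 8 Python rows and 2 Go rows.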

    2 Step - GitHub BigQuery Database parsing

    The second option for capturing additional examples was to run this notebook with a larger number of queries, 10000.

    The resulting file is dataset-10000.csv, which is included in the data card.

    The statistics for the programming languages are shown in the following chart, which has 32 labeled classes:
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F7c04342da8ec1df266cd90daf00204f9%2FScreenshot%202023-10-13%20at%2020.52.13.png?generation=1699392769199533&alt=media

    3 Step - Collection of raw code samples

    To make the model more robust, code samples for 20 additional languages were collected, with 10 to 15 samples each, covering more or less popular use cases. Also, for the class "OTHER" (regular natural-language examples, as required by the competition task), text examples from this dataset with prompts on Hugging Face were added to the file. The resulting file is rare_languages.csv, which is also in the data card.

    The statistics for the rare-language code snippets are as follows: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F0b340781c774d2acb988ce1567f4afa3%2FScreenshot%202023-11-08%20at%2001.13.07.png?generation=1699402436798661&alt=media

    4 Step - Combining the first and second datasets

    For this stage of dataset creation, the columns in sample_equal_prop_50000.csv and sample_stratified_50000.csv were reduced to just 2: "snippet" and "language". The equal-count version of the file is in the data card as sample_equal_prop_50000_clean.csv.

    To prepare the BigQuery dataset file, the index column was removed and the column "content" was renamed to "snippet". These changes were saved in dataset-10000-clean.csv.

    After that, the files sample_equal_prop_50000_clean.csv and dataset-10000-clean.csv were combined and saved as github-combined-file.csv.
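    The column handling in this step can be sketched as follows, with each table represented as a list of dictionaries. The column names "snippet", "language", and "content" come from the description; the presence of a "language" column in the BigQuery file, the extra columns in the test rows, and all function names are assumptions for illustration only.

```python
def keep_snippet_and_language(rows):
    """Keep only the "snippet" and "language" columns of each row."""
    return [{"snippet": row["snippet"], "language": row["language"]} for row in rows]


def clean_bigquery_rows(rows):
    """Drop every other column (including the index) and rename "content" to "snippet"."""
    return [{"snippet": row["content"], "language": row["language"]} for row in rows]


def combine(*tables):
    """Concatenate tables that share the same two columns."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined
```

    Both cleaned tables end up with the same two keys, so concatenating them yields one uniform table.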

    5 Step - Cleaning symbols from the datasets and merging with rare languages

    The prepared files took too much RAM to be read by the Pandas library, so additional preprocessing was done: symbols such as quotes, commas, ampersands, newlines, and tab characters were cleaned out. After cleaning, the files were merged with the rare_languages.csv file and saved as github-combined-file-no-symbols-rare-clean.csv and sample_equal_prop_50000_-no-symbols-rare-clean.csv, respectively.

    The final distribution of classes turned out as follows: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2Ff43e0cea4c565c9f7c808527b0dfa2da%2FScreenshot%202023-11-09%20at%2020.26.30.png?generation=1699558064765454&alt=media
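    A cleaning pass of this kind might look like the sketch below. Whether the original preprocessing deleted the symbols or replaced them with spaces is not stated; this version replaces them with a single space, which is an assumption, as is the function name strip_symbols.

```python
import re

# Characters named in the description: quotes, commas, ampersands, newlines, tabs.
_SYMBOLS = re.compile(r"[\"',&\r\n\t]")
_SPACES = re.compile(r" {2,}")


def strip_symbols(snippet):
    """Replace the listed symbols with a space, then collapse repeated spaces."""
    return _SPACES.sub(" ", _SYMBOLS.sub(" ", snippet)).strip()


if __name__ == "__main__":
    print(strip_symbols('print("a,b")\n\tx & y'))
```

    Removing commas, quotes, and newlines also makes each snippet safe to store as a single unquoted CSV field, which may be part of why parsing became cheaper.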

    6 Step - Fixing up the labels

    To suit the TF-DF format, each programming language was also given a specific label. The final labels are in the data card.
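    The actual label values are only given in the data card. As a hypothetical illustration of giving each language a label, the sketch below assigns consecutive integer ids to the sorted language names; the numbering scheme and the function name are assumptions, not the author's mapping.

```python
def build_label_map(languages):
    """Assign consecutive integer ids to the sorted, de-duplicated language names."""
    return {name: idx for idx, name in enumerate(sorted(set(languages)))}


if __name__ == "__main__":
    print(build_label_map(["Python", "Go", "Python", "OTHER"]))
```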

  12. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is an open-source software and object-oriented programming language with a development environment (IDE) called RStudio for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike many existing statistical software packages, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions on how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both textbook and manual for R statistics, particularly in academic research, data analytics, and computer programming, targeted to help inform and guide the work of R users or statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for use of each case in R. It gives a hands-on step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the different conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test for reliability and validity of the available datasets. Different research experiments, case scenarios and examples are explained in this book.

    It is the first book to provide a comprehensive description and step-by-step practical hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. It ranges from how to import and store datasets in R as objects, and how to code and call the methods or functions for manipulating the datasets or objects, factorization, and vectorization, to better reasoning, interpretation, and storage of the results for future use, and graphical visualizations and representations. Thus, it brings together statistics and computer programming for research.

  13. Data from: Working with a linguistic corpus using R: An introductory note...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia (2022). Working with a linguistic corpus using R: An introductory note with Indonesian Negating Construction [Dataset]. http://doi.org/10.4225/03/5a7ee2ac84303
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a repository for codes and datasets for the open-access paper in Linguistik Indonesia, the flagship journal for the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).


    To cite the paper (in APA 6th style):

    Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71


    To cite this repository:
    Click on the Cite button (dark pink, at the top-left) and select the citation style through the dropdown button (the default style is the DataCite option, on the right-hand side).

    This repository consists of the following files:
    1. Source R Markdown Notebook (.Rmd file) used to write the paper and containing the R codes to generate the analyses in the paper.
    2. Tutorial to download the Leipzig Corpus file used in the paper. It is freely available on the Leipzig Corpora Collection Download page.
    3. Accompanying datasets as images and .rds format so that all code-chunks in the R Markdown file can be run.
    4. BibLaTeX and .csl files for the referencing and bibliography (with APA 6th style).
    5. A snippet of the R session info after running all codes in the R Markdown file.
    6. RStudio project file (.Rproj). Double click on this file to open an RStudio session associated with the content of this repository. See here and here for details on Project-based workflow in RStudio.
    7. A .docx template file following the basic stylesheet for Linguistik Indonesia

    Put all these files in the same folder (including the downloaded Leipzig corpus file)!

    To render the R Markdown into MS Word document, we use the bookdown R package (Xie, 2018). Make sure this package is installed in R.

    Yihui Xie (2018). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.6.


  14. Programming languages with the highest salaries 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Programming languages with the highest salaries 2024 [Dataset]. https://www.statista.com/statistics/1127190/programming-languages-associated-highest-salaries-worldwide/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 19, 2024 - Jun 20, 2024
    Area covered
    Worldwide
    Description

    According to the survey, Erlang and Elixir are the programming languages that are associated with the highest salaries worldwide in 2024, with an average of around *** and ** thousand U.S. dollars respectively.

  15. Global Programming Language Learning Market Risk Analysis 2025-2032

    • statsndata.org
    excel, pdf
    Updated Oct 2025
    Cite
    Stats N Data (2025). Global Programming Language Learning Market Risk Analysis 2025-2032 [Dataset]. https://www.statsndata.org/report/programming-language-learning-market-94881
    Available download formats: pdf, excel
    Dataset updated
    Oct 2025
    Dataset authored and provided by
    Stats N Data
    License

    https://www.statsndata.org/how-to-order

    Area covered
    Global
    Description

    The Programming Language Learning market has emerged as a vital sector in the rapidly evolving digital landscape, where coding skills are increasingly recognized as essential for success across various industries. As companies from tech giants to small startups seek individuals who can navigate and manipulate comple

  16. Stack Overflow Questions 2020-2025

    • kaggle.com
    zip
    Updated Nov 15, 2025
    Cite
    Kutay Şahin (2025). Stack Overflow Questions 2020-2025 [Dataset]. https://www.kaggle.com/datasets/kutayahin/stackoverflow-programming-questions-2020-2025
    Available download formats: zip (32424810 bytes)
    Dataset updated
    Nov 15, 2025
    Authors
    Kutay Şahin
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Stack Overflow Programming Questions Dataset (2020-2025)

    Overview

    This comprehensive dataset contains 95,636 programming questions from Stack Overflow, covering 20 popular programming languages collected over a 5-year period (2020-2025). Each question includes detailed metadata, top answers, and quality metrics.

    Dataset Statistics

    • Total Questions: 95,636
    • Programming Languages: 20
    • Time Period: 2020-2025
    • Features: 34 columns
    • Dataset Size: ~130 MB
    • Answer Rate: 54.79%
    • Code Presence: 92.62%
    • Uniqueness: 99.99%

    Programming Languages Included

    1. Python (6,491 questions)
    2. JavaScript (7,355 questions)
    3. Java (5,948 questions)
    4. C++ (5,272 questions)
    5. C# (5,167 questions)
    6. Swift (5,044 questions)
    7. R (5,014 questions)
    8. C (4,869 questions)
    9. Rust (4,847 questions)
    10. Ruby (4,846 questions)
    11. TypeScript (4,143 questions)
    12. Scala (4,526 questions)
    13. Kotlin (4,543 questions)
    14. Go (4,810 questions)
    15. PHP (4,780 questions)
    16. MATLAB (4,157 questions)
    17. Perl (3,854 questions)
    18. HTML (2,891 questions)
    19. CSS (1,762 questions)
    20. SQL (4,687 questions)

    Features

    Question Information

    • question_id: Unique Stack Overflow question ID
    • title: Question title
    • body: Full question body (HTML formatted)
    • tags: Comma-separated tags
    • programming_language: Primary programming language

    Metrics

    • view_count: Number of views
    • score: Question score (upvotes - downvotes)
    • answer_count: Number of answers
    • is_answered: Whether question has accepted answer
    • has_accepted_answer: Whether question has accepted answer

    Content Analysis

    • has_code: Whether question contains code blocks
    • code_block_count: Number of code blocks
    • body_word_count: Word count in question body
    • body_char_count: Character count in question body
    • title_word_count: Word count in title

    Quality Metrics

    • difficulty_score: Calculated difficulty score (0-1)
    • quality_score: Calculated quality score (0-1)
    • owner_reputation: Question owner's reputation

    Temporal Features

    • creation_date: Question creation timestamp
    • creation_year: Year of creation
    • creation_month: Month of creation
    • creation_weekday: Day of week (0=Monday)
    • last_activity_date: Last activity timestamp
    • first_response_time_seconds: Time to first answer (seconds)
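
    The temporal columns above can be derived from raw timestamps roughly as follows. This is a minimal sketch, not the dataset author's code: it assumes timestamps arrive as Unix epoch seconds in UTC (the form the Stack Exchange API uses for dates), and the function name temporal_features is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def temporal_features(creation_ts: int, first_answer_ts: Optional[int]) -> dict:
    """Derive the temporal columns from Unix timestamps (seconds, UTC).

    Hypothetical helper: the dataset's real derivation is not documented.
    """
    created = datetime.fromtimestamp(creation_ts, tz=timezone.utc)
    return {
        "creation_date": created.isoformat(),
        "creation_year": created.year,
        "creation_month": created.month,
        # datetime.weekday() uses the same convention as the column: 0 = Monday
        "creation_weekday": created.weekday(),
        # No answer yet means there is no response time to report
        "first_response_time_seconds": (
            None if first_answer_ts is None else first_answer_ts - creation_ts
        ),
    }


# 1735689600 is 2025-01-01 00:00:00 UTC; the first answer arrives one hour later
features = temporal_features(1735689600, 1735693200)
print(features)
```

    Keeping the computation in UTC avoids weekday values shifting with the local time zone of the machine that runs the script.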

    Answer Information

    • top_answer_score: Score of top answer
    • top_answer_body_length: Length of top answer body
    • accepted_answer_score: Score of accepted answer

    Data Collection Methodology

    • Source: Stack Exchange API (official API)
    • Collection Period: November 2020 - November 2025
    • Filters Applied:
      • Minimum 100 views
      • Minimum 1 answer
      • Questions with body content
    • Answer Collection: Top 3 answers per question
    • Data Cleaning: Duplicate removal, HTML cleaning, validation
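
    The filters and cleaning steps listed above could be applied to already-downloaded records along these lines. This is an illustrative sketch only: the function names are hypothetical, ranking answers by score is an assumption (the description does not say how the top 3 are chosen), and the actual collection script is not part of this description.

```python
def passes_filters(question: dict) -> bool:
    """Apply the three stated filters: views, answers, non-empty body."""
    return (
        question.get("view_count", 0) >= 100
        and question.get("answer_count", 0) >= 1
        and bool((question.get("body") or "").strip())
    )


def top_answers(answers: list, limit: int = 3) -> list:
    """Keep the highest-scored answers (assumed ranking criterion)."""
    return sorted(answers, key=lambda a: a["score"], reverse=True)[:limit]


def deduplicate(questions: list) -> list:
    """Drop repeated question_id values, keeping the first occurrence."""
    seen = set()
    unique = []
    for q in questions:
        if q["question_id"] not in seen:
            seen.add(q["question_id"])
            unique.append(q)
    return unique


# Made-up sample records for illustration
raw = [
    {"question_id": 1, "view_count": 250, "answer_count": 2, "body": "<p>How?</p>"},
    {"question_id": 1, "view_count": 250, "answer_count": 2, "body": "<p>How?</p>"},
    {"question_id": 2, "view_count": 40, "answer_count": 1, "body": "<p>Why?</p>"},
    {"question_id": 3, "view_count": 900, "answer_count": 0, "body": "<p>What?</p>"},
    {"question_id": 4, "view_count": 500, "answer_count": 3, "body": "   "},
]
kept = [q for q in deduplicate(raw) if passes_filters(q)]
best = top_answers([{"score": 1}, {"score": 7}, {"score": 3}, {"score": 5}])
print([q["question_id"] for q in kept], [a["score"] for a in best])
```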

    Use Cases

    1. Natural Language Processing (NLP)

      • Question classification
      • Sentiment analysis
      • Topic modeling
      • Text generation
    2. Machine Learning

      • Question quality prediction
      • Answer recommendation systems
      • Duplicate question detection
      • Difficulty estimation
    3. Data Science Research

      • Programming language trends
      • Developer behavior analysis
      • Community engagement patterns
      • Technical knowledge evolution
    4. Educational Applications

      • Learning resource generation
      • Difficulty assessment
      • Curriculum development
      • Student assessment tools
    5. Software Engineering

      • Code pattern analysis
      • Best practices extraction
      • Documentation generation
      • Technical support automation

    Data Quality

    • Completeness: 97.47% (excellent)
    • Uniqueness: 99.99% (excellent)
    • Answer Coverage: 54.79% (good)
    • Code Presence: 92.62% (excellent)
    • Overall Quality Score: 53.65/100
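
    The description does not define how these percentages are computed. As a hedged sketch, one plausible reading is: completeness as the share of non-missing cells, uniqueness as the share of distinct question_id values, and answer coverage as the share of rows with has_accepted_answer set to true. All three definitions and the function names below are assumptions, not the author's method.

```python
def completeness(rows: list, columns: list) -> float:
    """Percentage of cells that are neither None nor an empty string (assumed definition)."""
    total = len(rows) * len(columns)
    filled = sum(
        1 for row in rows for col in columns if row.get(col) not in (None, "")
    )
    return 100.0 * filled / total if total else 0.0


def uniqueness(rows: list, key: str = "question_id") -> float:
    """Percentage of rows carrying a distinct key value (assumed definition)."""
    return 100.0 * len({row[key] for row in rows}) / len(rows) if rows else 0.0


def answer_coverage(rows: list) -> float:
    """Percentage of rows whose has_accepted_answer is True (assumed definition)."""
    if not rows:
        return 0.0
    return 100.0 * sum(1 for r in rows if r.get("has_accepted_answer") is True) / len(rows)


# Made-up sample rows for illustration
sample = [
    {"question_id": 1, "title": "a", "has_accepted_answer": True},
    {"question_id": 2, "title": "", "has_accepted_answer": False},
    {"question_id": 3, "title": "c", "has_accepted_answer": True},
    {"question_id": 3, "title": "c", "has_accepted_answer": None},
]
cols = ["question_id", "title", "has_accepted_answer"]
print(completeness(sample, cols), uniqueness(sample), answer_coverage(sample))
```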

    License

    This dataset is licensed under CC-BY-SA-4.0 (Creative Commons Attribution-ShareAlike 4.0 International), matching Stack Overflow's content license.

    Citation

    If you use this dataset in your research, please cite:

    @dataset{stackoverflow_programming_questions_2025,
     title = {Stack Overflow Programming Questions Dataset (2020-2025)},
     author = {kutayahin},
     year = {2025},
     url = {https://www.kaggle.com/datasets/kutayahin/stackoverflow-programming-questions-2020-2025},
     license = {CC-BY-SA-4.0}
    }
    

    Acknowledgments

    • Data collected from Stack Overflow via Stack Exchange API
    • Stack Overflow community for providing valuable Q&A content
    • Stack Exchange for providing public API access

    Updates

    • Version 1.0 (2025-11-15): Initial release with 95,636 questions from 20 programming languages

    Contact

    For questions, suggestions, or issues, please open an issue on the dataset page or contact the dataset maintainer.

    Related Datasets

    • Stack Over...
  17. Average wages in IT in Poland 2024, by programming language

    • statista.com
    Updated Nov 28, 2025
    Statista (2025). Average wages in IT in Poland 2024, by programming language [Dataset]. https://www.statista.com/statistics/1184617/poland-average-maximum-wages-in-it-by-programming-language/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Poland
    Description

    In the fourth quarter of 2024, contracted Ruby programmers were offered the best pay. Scala and Kotlin came next, with salaries exceeding ****** zloty per month. The best-paid specialists on B2B contracts worked in Kotlin and Java.

  18. Data_Sheet_2_A Primer on R for Numerical Analysis in Educational...

    • frontiersin.figshare.com
    txt
    Updated Jun 1, 2023
    Tricia R. Prokop; Michael Wininger (2023). Data_Sheet_2_A Primer on R for Numerical Analysis in Educational Research.CSV [Dataset]. http://doi.org/10.3389/feduc.2018.00080.s002
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Tricia R. Prokop; Michael Wininger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Researchers engaged in the scholarship of teaching and learning seek tools for rigorous, quantitative analysis. Here we present a brief introduction to computational techniques for the researcher with interest in analyzing data pertaining to pedagogical study. Sample dataset and fully executable code in the open-source R programming language are provided, along with illustrative vignettes relevant to common forms of inquiry in the educational setting.

  19. Programming language community sizes worldwide 2023

    • statista.com
    Updated Nov 16, 2023
    Statista (2023). Programming language community sizes worldwide 2023 [Dataset]. https://www.statista.com/statistics/1241923/worldwide-software-developer-programming-language-communities/
    Dataset updated
    Nov 16, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    According to the survey, the JavaScript community comprised roughly **** percent of software developers as of 2023, making JavaScript the most popular programming language in the world. Python also has a large community, with **** percent of developers.

  20. Source code in the R programming language, belonging with: Model based...

    • data.4tu.nl
    • datasetcatalog.nlm.nih.gov
    zip
    Updated Oct 28, 2019
    L. (Luc) Steinbuch; T.G. (Thomas) Orton; D.J. (Dick) Brus (2019). Source code in the R programming language, belonging with: Model based geostatistics from a Bayesian perspective: Investigating area‐to‐point kriging with small datasets [Dataset]. http://doi.org/10.4121/uuid:1fe0c01e-7f67-435b-a240-800579adc6e6
    Available download formats: zip
    Dataset updated
    Oct 28, 2019
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    L. (Luc) Steinbuch; T.G. (Thomas) Orton; D.J. (Dick) Brus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Area-to-point kriging (ATPK) is a geostatistical method for creating maps of high resolution using data of much lower resolution. These R-scripts compare prediction uncertainty using different ATPK methods, using simulations and a real world case concerning crop yields in Burkina Faso.

Statista, Most used programming languages among developers worldwide 2025 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/

Most used programming languages among developers worldwide 2025

99 scholarly articles cite this dataset (View in Google Scholar)
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 29, 2025 - Jun 23, 2025
Area covered
Worldwide