100+ datasets found
  1. GitHub Programming Languages Data

    • kaggle.com
    zip
    Updated Jan 2, 2022
    Cite
    Isaac Wen (2022). GitHub Programming Languages Data [Dataset]. https://www.kaggle.com/datasets/isaacwen/github-programming-languages-data
    Explore at:
    Available download formats: zip (41198 bytes)
    Dataset updated
    Jan 2, 2022
    Authors
    Isaac Wen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    A common question among people both new to and familiar with computer science and software engineering is which programming language is the best and/or most popular. It is very difficult to give a definitive answer, as there is a seemingly endless number of metrics that can define the 'best' or 'most popular' programming language.

    One metric that can be used to define a 'popular' programming language is the number of projects and files written in that language. As GitHub is the most popular public collaboration and file-sharing platform, analyzing the languages used for repositories, PRs, and issues on GitHub can be a good indicator of a language's popularity.

    Content

    This dataset contains statistics about the programming languages used for repositories, PRs, and issues on GitHub. The data is from 2011 to 2021.

    Source

    This data was queried and aggregated from BigQuery's public github_repos and githubarchive datasets.

    Limitations

    Only data for public GitHub repositories and their corresponding PRs and issues is publicly available. This dataset is therefore based only on public repositories, which may not be fully representative of all repositories on GitHub.

  2. Most used programming languages among developers worldwide 2025

    • statista.com
    + more versions
    Cite
    Statista, Most used programming languages among developers worldwide 2025 [Dataset]. https://www.statista.com/statistics/793628/worldwide-developer-survey-most-used-languages/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 29, 2025 - Jun 23, 2025
    Area covered
    Worldwide
    Description

    As of 2025, JavaScript and HTML/CSS are the most commonly used programming languages among software developers around the world, with more than 66 percent of respondents stating that they used JavaScript and around 61.9 percent using HTML/CSS. Python, SQL, and Bash/Shell rounded out the top five most widely used programming languages around the world.

    Programming languages

    At a very basic level, programming languages serve as sets of instructions that direct computers on how to behave and carry out tasks. Thanks to the increased prevalence of, and reliance on, computers and electronic devices in today’s society, these languages play a crucial role in the everyday lives of people around the world. An increasing number of people are interested in furthering their understanding of these tools through courses and bootcamps, while current developers are constantly seeking new languages and resources to add to their skills. Furthermore, programming knowledge is becoming an important skill to possess within various industries throughout the business world. Job seekers with skills in Python, R, and SQL will find their knowledge to be among the most highly desirable data science skills, which is likely to assist them in their search for employment.

  3. Most Popular Programming Languages Statistics

    • enterpriseappstoday.com
    Updated Jan 5, 2023
    Cite
    EnterpriseAppsToday (2023). Most Popular Programming Languages Statistics [Dataset]. https://www.enterpriseappstoday.com/stats/programming-languages-statistics.html
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    EnterpriseAppsToday
    License

    https://www.enterpriseappstoday.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Programming languages statistics: The tech market, which is booming along with digital marketing, is a good source of income. It encompasses many things, including programming languages, which are the basis for building websites, games, software, mobile applications, and more. There are nearly 9,000 programming languages around the world, each with its own features. These statistics cover statistical information and general knowledge about the programming languages available worldwide.

    Programming Languages Statistics (Editor’s Choice)

    • There are 8,945 programming languages, as stated by the most popular programming languages statistics.
    • As of 2022, JavaScript is one of the most popular programming languages, with around 47.86% of recruiters demanding JavaScript skills.
    • A basic Python developer earns between $70,000 and $1,00,00 a year.
    • As per the most popular programming languages statistics, Python ranked number 1 in the United States of America, India, Germany, France, and the United Kingdom.

  4. Programming languages used for software development worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Programming languages used for software development worldwide 2024 [Dataset]. https://www.statista.com/statistics/869092/worldwide-software-developer-survey-languages-used/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The most popular programming language used in the past 12 months by software developers worldwide is JavaScript as of 2024, according to ** percent of the software developers surveyed. This is followed by Python at ** percent of the respondents surveyed.

  5. Most popular programming languages worldwide 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most popular programming languages worldwide 2024 [Dataset]. https://www.statista.com/statistics/1292294/popular-it-skills-worldwide/
    Explore at:
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Jun 30, 2024
    Area covered
    Worldwide
    Description

    JavaScript and Java were some of the most tested programming languages on the DevSkiller platform as of 2024. SQL and Python ranked second and fourth, with ** percent and ** percent of respondents testing this language in 2024, respectively. Nevertheless, the tech skill developers wanted to learn the most in 2024 was related to artificial intelligence, machine learning, and deep learning. At the same time, the fastest growing IT skills among DevSkiller customers were C/C++ and data science, while cybersecurity ranked third.

    Software skills

    When it came to the most used programming language among developers worldwide, JavaScript took the top spot, chosen by 62 percent of surveyed respondents. Most software developers learn how to code between 11 and 17 years old, with some of them writing their first line of code by the age of 5. Moreover, seven out of 10 developers learned how to program by accessing online resources such as videos and blogs.

    Software skills pay

    In 2024, the average annual software developer’s salary in the U.S. amounted to nearly ** thousand U.S. dollars, while in Germany, it totaled above ** thousand U.S. dollars. The programming languages associated with the highest salaries worldwide in 2024 were Clojure and Erlang.

  6. Programming Languages

    • kaggle.com
    zip
    Updated Sep 16, 2023
    Cite
    Sujay Kapadnis (2023). Programming Languages [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/programming-languages
    Explore at:
    Available download formats: zip (879324 bytes)
    Dataset updated
    Sep 16, 2023
    Authors
    Sujay Kapadnis
    Description

    The dataset comes from the Programming Languages Database.

    languages.csv

    The full data dictionary is available from PLDB.com.

    variable | class | description
    pldb_id | character | A standardized, uniquified version of the language name, used as an ID on the PLDB site.
    title | character | The official title of the language.
    description | character | Description of the repo on GitHub.
    type | character | Which category in PLDB's subjective ontology does this entity fit into.
    appeared | double | What year was the language publicly released and/or announced?
    creators | character | Name(s) of the original creators of the language delimited by " and "
    website | character | URL of the official homepage for the language project.
    domain_name | character | If the project website is on its own domain.
    domain_name_registered | double | When was this domain first registered?
    reference | character | A link to more info about this entity.
    isbndb | double | Books about this language from ISBNdb.
    book_count | double | Computed; the number of books found for this language at isbndb.com
    semantic_scholar | integer | Papers about this language from Semantic Scholar.
    language_rank | double | Computed; a rank for the language, taking into account various online rankings. The computation for this column is not currently clear.
    github_repo | character | URL of the official GitHub repo for the project if it is hosted there.
    github_repo_stars | double | How many stars does the repo have?
    github_repo_forks | double | How many forks does the repo have?
    github_repo_updated | double | What year was the last commit made?
    github_repo_subscribers | double | How many subscribers does the repo have?
    github_repo_created | double | When was the GitHub repo for this entity created?
    github_repo_description | character | Description of the repo on GitHub.
    github_repo_issues | double | How many issues are on the repo?
    github_repo_first_commit | double | What year was the first commit made in this git repo?
    github_language | character | GitHub has a set of supported languages as defined here
    github_language_tm_scope | character | The TextMate scope that represents this programming language.
    github_language_type | character | Either data, programming, markup, prose, or nil.
    github_language_ace_mode | character | A String name of the Ace Mode used for highlighting whenever a file is edited. This must match one of the filenames in http://git.io/3XO_Cg. Use "text" if a mode does not exist.
    github_language_file_extensions | character | An Array of associated extensions (the first one is considered the primary extension, the others should be listed alphabetically).
    github_language_repos | double | How many repos for this language does GitHub report?
    wikipedia | character | URL of the entity on Wikipedia, if and only if it has a page dedicated to it.
    wikipedia_daily_page_views | double | How many page views per day does this Wikipedia page get? Useful as a signal for rankings. Available via WP api.
    wikipedia_backlinks_count | double | How many pages on WP link to this page?
    wikipedia_summary | character | What is the text summary of the language from the Wikipedia page?
    wikipedia_page_id | double | What is the internal ID for this entity on WP?
    wikipedia_appeared | double | When does Wikipedia claim this entity first appeared?
    wikipedia_created | double | When was the Wikipedia page for this entity created?
    wikipedia_revision_count | double | How many revisions does this page have?
    wikipedia_related | character | What languages does Wikipedia have as related?
    features_has_comments | logical | Does this language have a comment character?
    features_has_semantic_indentation | logical | Does indentation have semantic meaning in this language?
    features_has_line_comments | logical | Does this language support inline comments (as opposed to comments that must span an entire line)?
    line_comment_token | character | ...
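
    As a rough illustration of how the data dictionary above might be applied when reading languages.csv, the following Python sketch converts the appeared column to a year and splits creators on the documented " and " delimiter. The sample rows, the helper names (split_creators, load_languages), and the choice of columns are assumptions made for this example; they are not taken from the dataset itself.

```python
import csv
import io

# Hypothetical excerpt shaped like languages.csv; the values are
# illustrative placeholders, not rows copied from the dataset.
SAMPLE = """pldb_id,title,appeared,creators
lang-a,Lang A,1990,Ada Example and Bob Example
lang-b,Lang B,2005,Carol Example
"""


def split_creators(raw):
    """Split the creators field on the documented ' and ' delimiter."""
    return [name.strip() for name in raw.split(" and ") if name.strip()]


def load_languages(handle):
    """Read rows and convert two of the documented columns."""
    rows = []
    for row in csv.DictReader(handle):
        # 'appeared' is documented as a double holding a year.
        row["appeared"] = int(float(row["appeared"])) if row["appeared"] else None
        row["creators"] = split_creators(row["creators"])
        rows.append(row)
    return rows


languages = load_languages(io.StringIO(SAMPLE))
```

    With the real file, the io.StringIO sample would be replaced by an opened languages.csv handle; empty cells in other columns would need similar handling.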
  7. Programming Language Ecosystem Project TU Wien

    • test.researchdata.tuwien.at
    csv, text/markdown
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Valentin Futterer (2024). Programming Language Ecosystem Project TU Wien [Dataset]. http://doi.org/10.70124/gnbse-ts649
    Explore at:
    Available download formats: text/markdown, csv
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    TU Wien
    Authors
    Valentin Futterer
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Time period covered
    Dec 12, 2023
    Area covered
    Vienna
    Description

    About Dataset

    This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.

    The centerpiece of this repository is usage_of_programming_languages_2011-2023.csv. This CSV file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate of the popularity of programming languages, this dataset was created using three vastly different sources.

    About Data collection methodology

    The dataset was created using the GitHub repository above. As input data, three public datasets were used.

    github_metadata

    Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It contains metadata (no code) for all GitHub repositories with more than 5 stars.

    PYPL_survey_2004-2023

    Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows, for each month from 2004 to 2023, the share of programming-related Google searches per language.

    stack_overflow_developer_survey

    Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows the results of the yearly Stack Overflow developer survey from 2011 to 2023.

    All these datasets were downloaded on 12.12.2023. The datasets are all in the GitHub repository above.

    Description of the data

    The dataset contains a column for the year, followed by one column per language denoting its usage in percent. Additionally, vertical bar charts and pie charts for each year, plus a line graph for each language over the whole timespan, are provided as PNG files.

    The languages considered for the project are:

    - Python

    - C

    - C++

    - Java

    - C#

    - JavaScript

    - PHP

    - SQL

    - Assembly

    - Scratch

    - Fortran

    - Go

    - Kotlin

    - Delphi

    - Swift

    - Rust

    - Ruby

    - R

    - COBOL

    - F#

    - Perl

    - TypeScript

    - Haskell

    - Scala
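
    The layout described above (one year column plus one percentage column per language) is a wide table. A minimal Python sketch of reshaping such a table into (year, language, percent) records is shown below; the column name "year", the helper to_long, and the sample numbers are assumptions made for illustration and are not taken from the actual CSV.

```python
import csv
import io

# Hypothetical excerpt: a year column followed by one column per language,
# holding usage in percent. The numbers are made up for illustration.
SAMPLE = """year,Python,Java
2011,10.0,20.0
2012,12.5,19.0
"""


def to_long(handle, year_column="year"):
    """Turn one row per year into one (year, language, percent) record per cell."""
    records = []
    for row in csv.DictReader(handle):
        year = int(row[year_column])
        for language, value in row.items():
            if language == year_column:
                continue
            records.append((year, language, float(value)))
    return records


long_rows = to_long(io.StringIO(SAMPLE))
```

    The long form is usually easier to filter by language or to feed into a plotting library than the wide form.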

    License

    This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/.

    TLDR: You are free to share, adapt, and create derivative works from this dataset as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbL.

    Acknowledgments

    Thanks go out to

    - Stack Overflow, https://insights.stackoverflow.com/survey, for providing the data from the yearly Stack Overflow developer survey.

    - the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master, for providing Google search data.

    - Peter Elmers, for crawling metadata on GitHub repositories and providing the data at https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.

  8. Collection of example datasets used for the book - R Programming -...

    • figshare.com
    txt
    Updated Dec 4, 2023
    Cite
    Kingsley Okoye; Samira Hosseini (2023). Collection of example datasets used for the book - R Programming - Statistical Data Analysis in Research [Dataset]. http://doi.org/10.6084/m9.figshare.24728073.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 4, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kingsley Okoye; Samira Hosseini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This book is written for statisticians, data analysts, programmers, researchers, teachers, students, professionals, and general consumers on how to perform different types of statistical data analysis for research purposes using the R programming language. R is open-source software and an object-oriented programming language, with a development environment (IDE) called RStudio, for computing statistics and graphical displays through data manipulation, modelling, and calculation. R packages and supported libraries provide a wide range of functions for programming and analyzing data. Unlike much of the existing statistical software, R has the added benefit of allowing users to write more efficient code by using command-line scripting and vectors. It has several built-in functions and libraries that are extensible and allow users to define their own (customized) functions for how they expect the program to behave while handling the data, which can also be stored in the simple object system.

    For all intents and purposes, this book serves as both a textbook and a manual for R statistics, particularly in academic research, data analytics, and computer programming, intended to help inform and guide the work of R users and statisticians. It provides information about different types of statistical data analysis and methods, and the best scenarios for the use of each in R. It gives a hands-on, step-by-step practical guide on how to identify and conduct the different parametric and non-parametric procedures. This includes a description of the conditions or assumptions that are necessary for performing the various statistical methods or tests, and how to understand the results of the methods. The book also covers the different data formats and sources, and how to test the reliability and validity of the available datasets. Different research experiments, case scenarios, and examples are explained in this book.

    It is the first book to provide a comprehensive description and a step-by-step, practical, hands-on guide to carrying out the different types of statistical analysis in R, particularly for research purposes, with examples. It ranges from how to import and store datasets in R as objects, and how to code and call the methods or functions for manipulating the datasets or objects, through factorization and vectorization, to better reasoning, interpretation, and storage of the results for future use, as well as graphical visualizations and representations. The book is thus a congruence of statistics and computer programming for research.

  9. Programming Language Database

    • kaggle.com
    zip
    Updated Mar 6, 2023
    Cite
    Sujay Kapadnis (2023). Programming Language Database [Dataset]. https://www.kaggle.com/datasets/sujaykapadnis/programming-language-database/versions/1
    Explore at:
    Available download formats: zip (1195915 bytes)
    Dataset updated
    Mar 6, 2023
    Authors
    Sujay Kapadnis
    Description

    The dataset contains information on over 4000 programming languages. It includes facts about each language, such as the year it was created, its rank, and other parameters that you will discover once you explore the dataset.

    Credits: https://github.com/breck7/pldb

  10. Programming Language Data Set

    • kaggle.com
    zip
    Updated Dec 2, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Divyanshu (2022). Programming Language Data Set [Dataset]. https://www.kaggle.com/datasets/divyanshukunwar/programming-language-data-set
    Explore at:
    Available download formats: zip (9660 bytes)
    Dataset updated
    Dec 2, 2022
    Authors
    Divyanshu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a chronological ordering (timeline) of programming languages. What can one do with this dataset?

    • Find relations between different programming languages using the Predecessors column.
    • Find the most frequent chief developer, company, etc.

  11. Top data science skills in U.S. 2019

    • statista.com
    Updated Jun 13, 2019
    Cite
    Statista (2019). Top data science skills in U.S. 2019 [Dataset]. https://www.statista.com/statistics/1016247/united-states-wanted-data-science-skills/
    Explore at:
    Dataset updated
    Jun 13, 2019
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 2019
    Area covered
    United States
    Description

    The statistic displays the most wanted data science skills in the United States as of **********. As of the measured period, ***** percent of data scientist job openings on LinkedIn required a knowledge of the programming language Python.

  12. Programming-languages

    • kaggle.com
    zip
    Updated Nov 25, 2020
    Cite
    alepuzio (2020). Programming-languages [Dataset]. https://www.kaggle.com/alepuzio/programminglanguages
    Explore at:
    Available download formats: zip (4304 bytes)
    Dataset updated
    Nov 25, 2020
    Authors
    alepuzio
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    This project was born from reading Wikipedia and Kaggle.

    Acknowledgements

    Thank you to Wikipedia for its work.

    Inspiration

    • What are the most flexible languages?
    • For a given use case, what's the best language and paradigm?
  13. Hello World In Programming Languages

    • kaggle.com
    zip
    Updated Sep 16, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yaroslav Isaienkov (2020). Hello World In Programming Languages [Dataset]. https://www.kaggle.com/datasets/ihelon/hello-world-in-programming-languages/data
    Explore at:
    Available download formats: zip (55074 bytes)
    Dataset updated
    Sep 16, 2020
    Authors
    Yaroslav Isaienkov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Content

    The dataset contains "Hello world" programs for different programming languages. Each row in the main file describes one program: the language, the file extension, and the program's text itself.

    For example, the language Asciidots has the .arnoldc file extension, and the Hello World program looks like: .-$"Hello World"

    Acknowledgements

    This dataset was scraped from the following sources:

    - https://github.com/leachim6/hello-world
    - https://helloworldcollection.github.io/

    Inspiration

    You can try to solve the following tasks:

    - Generate features for some languages
    - Cluster languages by their code or some features

  14. Globally sought-after programming languages among software developers 2022

    • statista.com
    Updated Jul 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Globally sought-after programming languages among software developers 2022 [Dataset]. https://www.statista.com/statistics/793631/worldwide-developer-survey-most-wanted-languages/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 11, 2022 - Jun 1, 2022
    Area covered
    Worldwide
    Description

    According to the survey, Rust was the most desired language in 2022, with over ** percent of respondents who were not developing with it expressing interest in doing so. Python ranked second, followed by TypeScript.

  15. github-final-datasets

    • kaggle.com
    zip
    Updated Nov 9, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Olga Ivanova (2023). github-final-datasets [Dataset]. https://www.kaggle.com/datasets/olgaiv39/github-final-datasets
    Explore at:
    Available download formats: zip (1877861953 bytes)
    Dataset updated
    Nov 9, 2023
    Authors
    Olga Ivanova
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    GitHub Clean Code Snippets Dataset

    This describes how the datasets for a training notebook used in a Telegram ML Contest solution were prepared.

    Step 1 - GitHub Samples database parsing

    The first part of the code samples was taken from a private version of this notebook.

    Here are the statistics on the programming language classes in the GitHub Code Snippets database: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F2fdc091661198e80559f8cb1d1a306ff%2FScreenshot%202023-11-07%20at%2021.24.42.png?generation=1699390166413391&alt=media

    From this database, 2 csv files were created with 50000 code samples for each of the 20 programming languages included, one using equal-numbers sampling and one using stratified sampling. The files are sample_equal_prop_50000.csv and sample_stratified_50000.csv, respectively.
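
    The difference between the two sampling schemes named above can be sketched in Python as follows. This is a generic illustration under assumed definitions (equal sampling draws the same count per language, stratified sampling draws the same fraction per language so class proportions are preserved); the function names, the toy corpus, and the parameters are hypothetical and do not come from the author's notebook.

```python
import random
from collections import defaultdict


def group_by_label(samples):
    """Group (snippet, language) pairs by language."""
    groups = defaultdict(list)
    for snippet, language in samples:
        groups[language].append((snippet, language))
    return groups


def sample_equal(samples, per_class, seed=0):
    """Draw the same number of samples from every class (capped by class size)."""
    rng = random.Random(seed)
    out = []
    for _, items in sorted(group_by_label(samples).items()):
        out.extend(rng.sample(items, min(per_class, len(items))))
    return out


def sample_stratified(samples, fraction, seed=0):
    """Draw the same fraction from every class, preserving class proportions."""
    rng = random.Random(seed)
    out = []
    for _, items in sorted(group_by_label(samples).items()):
        k = max(1, round(len(items) * fraction))
        out.extend(rng.sample(items, k))
    return out


# Hypothetical toy corpus: 8 Python snippets and 2 Rust snippets.
corpus = [(f"py_{i}", "Python") for i in range(8)] + [(f"rs_{i}", "Rust") for i in range(2)]
equal = sample_equal(corpus, per_class=2)
stratified = sample_stratified(corpus, fraction=0.5)
```

    Equal sampling gives rare languages the same weight as common ones, while stratified sampling keeps the class imbalance of the source; which one suits a classifier depends on how it will be evaluated.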

    Step 2 - GitHub BigQuery database parsing

    The second option for gathering additional examples was to run this notebook with a larger number of queries (10000).

    The resulting file, dataset-10000.csv, is included in the data card.

    The per-language statistics are shown in the following chart, which has 32 labeled classes:
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F7c04342da8ec1df266cd90daf00204f9%2FScreenshot%202023-10-13%20at%2020.52.13.png?generation=1699392769199533&alt=media

    Step 3 - Collecting raw code samples

    To make the model more robust, 10 to 15 code samples covering more or less popular use cases were collected for each of 20 additional languages. In addition, for the class "OTHER" (regular-language examples, as required by the competition task), text examples from this dataset of prompts on Hugging Face were added to the file. The resulting file, rare_languages.csv, is also in the data card.

    The statistics for rare-language code snippets are as follows: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2F0b340781c774d2acb988ce1567f4afa3%2FScreenshot%202023-11-08%20at%2001.13.07.png?generation=1699402436798661&alt=media

    Step 4 - Combining the first and second datasets

    At this stage of dataset creation, the columns in sample_equal_prop_50000.csv and sample_stratified_50000.csv were cut down to just two, "snippet" and "language". The equal-numbers version of the file, sample_equal_prop_50000_clean.csv, is in the data card.

    To prepare the BigQuery dataset file, the index column was removed and the column "content" was renamed to "snippet". These changes were saved in dataset-10000-clean.csv.

    After that, the files sample_equal_prop_50000_clean.csv and dataset-10000-clean.csv were combined and saved as github-combined-file.csv.
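
    The column trimming, renaming, and combining described in this step can be sketched in plain Python as below. The helper names (keep_columns, rename_column), the extra columns "id" and "repo", and the sample rows are assumptions for illustration only; the original work may well have used pandas or another tool.

```python
def keep_columns(rows, columns):
    """Keep only the named columns in each row."""
    return [{name: row[name] for name in columns} for row in rows]


def rename_column(rows, old, new):
    """Rename one column, leaving the others untouched."""
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


# Hypothetical rows standing in for the two source files.
github_samples = [
    {"id": "1", "snippet": "print(1)", "language": "Python", "repo": "example/repo"},
]
bigquery_rows = [
    {"index": "0", "content": "fn main() {}", "language": "Rust"},
]

wanted = ["snippet", "language"]
combined = (
    keep_columns(github_samples, wanted)
    + keep_columns(rename_column(bigquery_rows, "content", "snippet"), wanted)
)
```

    Selecting the two wanted columns after the rename also drops the index column, so both inputs end up with the same schema before they are concatenated.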

    Step 5 - Cleaning symbols from the datasets and merging with rare languages

    The prepared files took too much RAM to be read by the pandas library, so additional preprocessing was done: symbols such as quotes, commas, ampersands, new lines, and tab characters were cleaned out. After cleaning, the files were merged with rare_languages.csv and saved as github-combined-file-no-symbols-rare-clean.csv and sample_equal_prop_50000_-no-symbols-rare-clean.csv, respectively.
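
    One possible way to perform the symbol cleaning described above is sketched below in Python. The exact character set and the choice to replace symbols with a single space (rather than delete them) are assumptions; the description does not say how the original cleaning was implemented, and strip_symbols is a hypothetical helper name.

```python
import re

# Characters named in the description: quotes, commas, ampersands,
# new lines, and tabs. Treating both quote styles and carriage returns
# as part of the set is an assumption.
_SYMBOLS = re.compile(r"[\"',&\r\n\t]+")


def strip_symbols(snippet):
    """Replace runs of the listed symbols with one space, then tidy spacing."""
    without_symbols = _SYMBOLS.sub(" ", snippet)
    return re.sub(r" {2,}", " ", without_symbols).strip()


cleaned = strip_symbols('print("a, b")\n\tx = y & z')
```

    Removing these characters makes each snippet a single delimiter-free line, which simplifies CSV parsing, at the cost of losing some syntax that could help distinguish languages.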

    The final distribution of classes turned out as follows: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F833757%2Ff43e0cea4c565c9f7c808527b0dfa2da%2FScreenshot%202023-11-09%20at%2020.26.30.png?generation=1699558064765454&alt=media

    Step 6 - Fixing up the labels

    To suit the TF-DF format, each programming language was also assigned a label. The final labels are in the data card.

  16. Poland Individuals: Writing Code in a Programming Language: 25-34

    • ceicdata.com
    Updated Jan 15, 2025
    + more versions
    Cite
    CEICdata.com (2025). Poland Individuals: Writing Code in a Programming Language: 25-34 [Dataset]. https://www.ceicdata.com/en/poland/individuals-carrying-out-software-related-activities-by-age/individuals-writing-code-in-a-programming-language-2534
    Explore at:
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2015 - Dec 1, 2024
    Area covered
    Poland
    Description

    Poland Individuals: Writing Code in a Programming Language: 25-34 data was reported at 10.100 % in 2024. This is an increase from the previous figure of 8.700 % for 2023. Poland Individuals: Writing Code in a Programming Language: 25-34 data is updated yearly, averaging 5.600 % from Dec 2015 (Median) to 2024, with 7 observations. The data reached an all-time high of 10.100 % in 2024 and a record low of 3.600 % in 2015. Poland Individuals: Writing Code in a Programming Language: 25-34 data remains in active status in CEIC and is reported by Statistics Poland. The data is categorized under Global Database’s Poland – Table PL.G040: Individuals Carrying Out Software Related Activities: by Age.

  17. Data Software Preference Amongst Kaggle Users

    • kaggle.com
    Updated Feb 26, 2023
    Cite
    Venessa Green (2023). Data Software Preference Amongst Kaggle Users [Dataset]. https://www.kaggle.com/datasets/venessagreen/data-software-preference-amongst-kaggle-users
    Dataset updated
    Feb 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Venessa Green
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This project analyzes the results of a survey of Kaggle users to understand which tools and programming languages they prefer for their data science work. I explore the popularity of tools like Jupyter notebooks, Excel, and Tableau, as well as programming languages like R, Python, and SQL. I also examine how these preferences vary by factors such as job title, years of experience, and company size. The analysis provides insights into the most widely used tools and languages in the data science community, and can help individuals and organizations make informed decisions about their software and language choices.

  18. Most popular programming languages in Poland 2024

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Most popular programming languages in Poland 2024 [Dataset]. https://www.statista.com/statistics/1184564/poland-most-popular-software-languages/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Poland
    Description

    In the fourth quarter of 2024, the most popular programming languages in published job offers in Poland were ***********, and Java.

  19. Data from: Working with a linguistic corpus using R: An introductory note...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia (2022). Working with a linguistic corpus using R: An introductory note with Indonesian Negating Construction [Dataset]. http://doi.org/10.4225/03/5a7ee2ac84303
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg; I Made Rajeg; Karlina Denistia
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a repository of code and datasets for the open-access paper in Linguistik Indonesia, the flagship journal of the Linguistic Society of Indonesia (Masyarakat Linguistik Indonesia [MLI]) (cf. the link in the references below).


    To cite the paper (in APA 6th style):

    Rajeg, G. P. W., Denistia, K., & Rajeg, I. M. (2018). Working with a linguistic corpus using R: An introductory note with Indonesian negating construction. Linguistik Indonesia, 36(1), 1–36. doi: 10.26499/li.v36i1.71


    To cite this repository:
    Click the Cite button (dark pink, top left) and select the citation style from the dropdown (the default style is the Datacite option, on the right-hand side).

    This repository consists of the following files:
    1. Source R Markdown Notebook (.Rmd file) used to write the paper, containing the R code that generates the analyses in the paper.
    2. Tutorial for downloading the Leipzig Corpus file used in the paper. It is freely available on the Leipzig Corpora Collection Download page.
    3. Accompanying datasets as images and in .rds format, so that all code chunks in the R Markdown file can be run.
    4. BibLaTeX and .csl files for the referencing and bibliography (with APA 6th style).
    5. A snippet of the R session info after running all code in the R Markdown file.
    6. RStudio project file (.Rproj). Double-click this file to open an RStudio session associated with the content of this repository. See here and here for details on Project-based workflow in RStudio.
    7. A .docx template file following the basic stylesheet for Linguistik Indonesia.

    Put all these files in the same folder (including the downloaded Leipzig corpus file)!

    To render the R Markdown file into an MS Word document, we use the bookdown R package (Xie, 2018). Make sure this package is installed in R.

    Yihui Xie (2018). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.6.


  20. Top programming languages demanded by recruiters worldwide 2025

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Top programming languages demanded by recruiters worldwide 2025 [Dataset]. https://www.statista.com/statistics/1296727/programming-languages-demanded-by-recruiters/
    Dataset updated
    Nov 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Worldwide
    Description

    The programming languages most in demand among recruiters in 2025 were Python, JavaScript, and Java, with around ** percent of recruiters looking to hire people with these programming skills.
