100+ datasets found
  1. GitHub Programming Languages Data

    • kaggle.com
    zip
    Updated Jan 2, 2022
  2. Programming Languages

    • figshare.com
    zip
    Updated Jun 1, 2023
  3. APPS Dataset

    • paperswithcode.com
    Updated Mar 30, 2023
  4. Most widely utilized programming languages among developers worldwide 2023

    • statista.com
    Updated Jul 19, 2023
  5. Programming Laungages and File Format Detection

    • kaggle.com
    zip
    Updated May 9, 2022
  6. programming languages

    • kaggle.com
    zip
    Updated Sep 7, 2020
  7. Replication Kit: "Skill Models for Programming Language Concepts"

    • zenodo.org
    Updated Jan 24, 2020
  8. Programming languages used for software development worldwide 2022

    • statista.com
    Updated Feb 20, 2023
  9. CodeContests Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Mar 19, 2023
  10. Replication Kit: "Skill Models for Programming Language Concepts"

    • zenodo.org
    Updated Jan 24, 2020
  11. programming-languages-keywords

    • huggingface.co
    Updated Nov 13, 2023
  12. Most frequently required software languages in job offers in Poland 2023

    • statista.com
    Updated Nov 8, 2023
  13. Programming language comparision from instrument automation perspective

    • ieee-dataport.org
    Updated Oct 19, 2023
  14. Programming languages most used in software companies in Russia 2022

    • statista.com
    Updated Dec 16, 2022
  15. Data from: A Decision Model for Programming Language Ecosystem Selection:...

    • data.mendeley.com
    Updated Nov 1, 2020
  16. Youtube programming videos sample dataset

    • crawlfeeds.com
    json, zip
    Updated Sep 8, 2023
  17. Data from: The C++ programming language

    • workwithdata.com
    Updated Aug 17, 2023
  18. Primary programming languages among microservices developers worldwide 2022

    • statista.com
    Updated Nov 15, 2023
  19. Programming Homework Dataset for Plagiarism Detection

    • ieee-dataport.org
    Updated May 8, 2020
  20. Python Programming Dataset

    • pub.uni-bielefeld.de
    Updated Feb 10, 2020
Cite
Isaac Wen (2022). GitHub Programming Languages Data [Dataset]. https://www.kaggle.com/datasets/isaacwen/github-programming-languages-data

GitHub Programming Languages Data

Statistics for Programming Languages used on GitHub

Available download formats: zip (41,198 bytes)
Dataset updated
Jan 2, 2022
Authors
Isaac Wen
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Context

A common question among both newcomers to and experienced practitioners of computer science and software engineering is which programming language is the best and/or the most popular. It is very difficult to give a definitive answer, as there are seemingly countless metrics that could define the 'best' or 'most popular' programming language.

One such metric for defining a 'popular' programming language is the number of projects and files created with it. Because GitHub is the most popular public platform for collaboration and file sharing, the languages used in its repositories, pull requests (PRs), and issues can be a good indicator of a language's popularity.
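The count-based metric described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dataset author's method: the function name `rank_languages` is hypothetical, and the input is assumed to be one language label per item (repository, PR, or issue), with unlabeled items skipped.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional


def rank_languages(languages: Iterable[Optional[str]]) -> list[tuple[str, int]]:
    """Rank languages by how many items (repositories, PRs, or issues) use them.

    Hypothetical helper: each element of `languages` is the language label of
    one item; items with no detected language (None or "") are skipped.
    """
    counts = Counter(lang for lang in languages if lang)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical labels, one per repository, purely for illustration.
sample = ["Python", "JavaScript", "Python", None, "Go", "JavaScript", "Python"]
print(rank_languages(sample))  # [('Python', 3), ('JavaScript', 2), ('Go', 1)]
```

Breaking ties alphabetically keeps the ranking stable between runs; any other tie-breaking rule would be equally valid for this metric.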

Content

This dataset contains statistics about the programming languages used for repositories, PRs, and issues on GitHub. The data covers the years 2011 to 2021.
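Because the data spans several years, and the total activity on GitHub changes from year to year, raw counts are hard to compare across years. One option is to convert each year's counts into shares of that year's total. The sketch below assumes the statistics have been loaded as `(year, language, count)` rows. That row layout and the function name `yearly_shares` are hypothetical; the actual file layout inside the zip is not described here.

```python
from __future__ import annotations

from collections import defaultdict


def yearly_shares(rows: list[tuple[int, str, int]]) -> dict[int, dict[str, float]]:
    """Convert (year, language, count) rows into each language's share per year.

    Hypothetical helper: `rows` must be a list because it is iterated twice.
    Shares within a year sum to 1; years whose total count is zero are omitted.
    Duplicate (year, language) rows are summed.
    """
    totals: dict[int, int] = defaultdict(int)
    for year, _lang, count in rows:
        totals[year] += count

    shares: dict[int, dict[str, float]] = defaultdict(dict)
    for year, lang, count in rows:
        if totals[year]:
            shares[year][lang] = shares[year].get(lang, 0.0) + count / totals[year]
    return dict(shares)


# Toy counts for illustration only; they are not taken from the dataset.
toy = [(2020, "Python", 3), (2020, "Go", 1), (2021, "Python", 2), (2021, "Go", 2)]
print(yearly_shares(toy))
```

With shares, a language whose count grows more slowly than the platform as a whole shows a declining trend, which raw counts would hide.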

Source

This data was queried and aggregated from BigQuery's public github_repos and githubarchive datasets.
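The exact queries behind this dataset are not given in the description. As a rough sketch, a repository-per-language count over the public `bigquery-public-data.github_repos.languages` table might look like the following. This assumes that table's `language` column is a repeated record with a `name` field. The function name `repo_count_query` is hypothetical, and the code only builds the SQL string without running it.

```python
# A sketch, assuming the public table `bigquery-public-data.github_repos.languages`
# stores one row per repository with a repeated `language` record (name, bytes).
# This is NOT the dataset author's actual query.

LANGUAGES_TABLE = "bigquery-public-data.github_repos.languages"


def repo_count_query(limit: int = 10) -> str:
    """Build a BigQuery Standard SQL query counting repositories per language.

    Hypothetical helper. A repository containing several languages is counted
    once for each of them, because UNNEST expands the repeated `language` field.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT lang.name AS language, COUNT(DISTINCT repo_name) AS repo_count\n"
        f"FROM `{LANGUAGES_TABLE}`, UNNEST(language) AS lang\n"
        "GROUP BY language\n"
        "ORDER BY repo_count DESC\n"
        f"LIMIT {int(limit)}"
    )


print(repo_count_query(5))
```

The printed query could be pasted into the BigQuery console. Alternatively, with the `google-cloud-bigquery` package installed and credentials configured, it could be passed to `bigquery.Client().query(...)`. PR and issue statistics would need a separate query over the `githubarchive` event tables, which is not sketched here.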

Limitations

Only public GitHub repositories, and their corresponding PRs and issues, have publicly available data. This dataset is therefore based solely on public repositories, which may not be fully representative of all repositories on GitHub.
