Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains the following columns:
Month: The date (in year-month format) when the data was recorded. Python Worldwide(%): The percentage of global popularity for Python during that month. JavaScript Worldwide(%): The percentage of global popularity for JavaScript. Java Worldwide(%): The percentage of global popularity for Java. C# Worldwide(%): The percentage of global popularity for C#. PhP Worldwide(%): The percentage of global popularity for PhP. Flutter Worldwide(%): The percentage of global popularity for Flutter. React Worldwide(%): The percentage of global popularity for React. Swift Worldwide(%): The percentage of global popularity for Swift. TypeScript Worldwide(%): The percentage of global popularity for TypeScript. Matlab Worldwide(%): The percentage of global popularity for Matlab.
Each row represents data for a particular month, starting from January 2004, tracking the popularity trends of these programming languages worldwide.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains an anonymized list of surveyed developers who provided their expertise level on three popular JavaScript libraries:
ReactJS, a library for building enriched web interfaces
MongoDB, a driver for accessing MongoDB databased
Socket.IO, a library for realtime communication
Facebook
TwitterAs shown in the chart, Python ranks first with a usage rate of 28.7%, demonstrating its continued advantage in the fields of data science and artificial intelligence. JavaScript follows closely at 19.3%, reflecting its widespread use in front-end and full-stack development. Traditional languages such as Java and C# still maintain a stable market share, while emerging languages like Go and Rust show significant growth potential. Overall, the popularity of programming languages is closely related to technological trends. The leading positions of Python and JavaScript indicate a shift in development focus towards data-driven and web-oriented directions. In the future, with the further development of cloud computing and artificial intelligence, the usage of emerging languages such as Go and Rust is expected to continue increasing.
Facebook
TwitterOpen Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset was created during the Programming Language Ecosystem project from TU Wien using the code inside the repository https://github.com/ValentinFutterer/UsageOfProgramminglanguages2011-2023?tab=readme-ov-file.
The centerpiece of this repository is the usage_of_programming_languages_2011-2023.csv. This csv file shows the popularity of programming languages over the last 12 years in yearly increments. The repository also contains graphs created with the dataset. To get an accurate estimate on the popularity of programming languages, this dataset was created using 3 vastly different sources.
The dataset was created using the github repository above. As input data, three public datasets where used.
Taken from https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/ by Peter Elmers. It is licensed under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/. It shows metadata information (no code) of all github repositories with more than 5 stars.
Taken from https://github.com/pypl/pypl.github.io/tree/master, put online by the user pcarbonn. It is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/. It shows from 2004 to 2023 for each month the share of programming related google searches per language.
Taken from https://insights.stackoverflow.com/survey. It is licensed under Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/. It shows from 2011 to 2023 the results of the yearly stackoverflow developer survey.
All these datasets were downloaded on the 12.12.2023. The datasets are all in the github repository above
The dataset contains a column for the year and then many columns for the different languages, denoting their usage in percent. Additionally, vertical barcharts and piecharts for each year plus a line graph for each language over the whole timespan as png's are provided.
The languages that are going to be considered for the project can be seen here:
- Python
- C
- C++
- Java
- C#
- JavaScript
- PHP
- SQL
- Assembly
- Scratch
- Fortran
- Go
- Kotlin
- Delphi
- Swift
- Rust
- Ruby
- R
- COBOL
- F#
- Perl
- TypeScript
- Haskell
- Scala
This project is licensed under the Open Data Commons Open Database License (ODbL) v1.0 https://opendatacommons.org/licenses/odbl/1-0/ license.
TLDR: You are free to share, adapt, and create derivative works from this dataser as long as you attribute me, keep the database open (if you redistribute it), and continue to share-alike any adapted database under the ODbl.
Thanks go out to
- stackoverflow https://insights.stackoverflow.com/survey for providing the data from the yearly stackoverflow developer survey.
- the PYPL survey, https://github.com/pypl/pypl.github.io/tree/master for providing google search data.
- Peter Elmers, for crawling metadata on github repositories and providing the data https://www.kaggle.com/datasets/pelmers/github-repository-metadata-with-5-stars/.
Facebook
Twitterhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.htmlhttp://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
How can we tell what programming languages and technologies are used by the most people? How about what languages are growing and which are shrinking, so that we can tell which are most worth investing time in?
One excellent source of data is Stack Overflow, a programming question and answer site with more than 16 million questions on programming topics. By measuring the number of questions about each technology, we can get an approximate sense of how many people are using it. We're going to use open data from the Stack Exchange Data Explorer to examine the relative popularity of languages like R, Python, Java and Javascript have changed over time.
Each Stack Overflow question has a tag, which marks a question to describe its topic or technology. For instance, there's a tag for languages like R or Python, and for packages like ggplot2 or pandas.
We'll be working with a dataset with one observation for each tag in each year. The dataset includes both the number of questions asked in that tag in that year, and the total number of questions asked in that year.
DataCamp
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
End-to-End (E2E) testing is a comprehensive approach to validating the functionality of a software application by testing its entire workflow from the user’s perspective, ensuring that all integrated components work together as expected. It is crucial for ensuring the quality and reliability of applications, especially in the web domain, which is often bound by Service Level Agreements (SLAs). This testing involves two key activities:
Graphical User Interface (GUI) testing, which simulates user interactions through browsers, and performance testing, which evaluates system workload handling. Despite its importance, E2E testing is often neglected, and the lack of reliable datasets for Web GUI and performance testing has slowed research progress. This paper addresses these limitations by constructing E2EGit, a comprehensive dataset, cataloging non-trivial open-source web projects on GITHUB that adopt GUI or performance testing.
The dataset construction process involved analyzing over 5k non-trivial web repositories based on popular programming languages (JAVA, JAVASCRIPT TYPESCRIPT PYTHON) to identify: 1) GUI tests based on popular browser automation frameworks (SELENIUM PLAYWRIGHT, CYPRESS, PUPPETEER), 2) performance tests written with the most popular open-source tools (JMETER, LOCUST). After analysis, we identified 472 repositories using web GUI testing, with over 43,000 tests, and 84 repositories using performance testing, with 410 tests.
DATASET DESCRIPTION
The dataset is provided as an SQLite database, whose structure is illustrated in Figure 3 (in the paper), which consists of five tables, each serving a specific purpose.
The repository table contains information on 1.5 million repositories collected using the SEART tool on May 4. It includes 34 fields detailing repository characteristics. The
non_trivial_repository table is a subset of the previous one, listing repositories that passed the two filtering stages described in the pipeline. For each repository, it specifies whether it is a web repository using JAVA, JAVASCRIPT, TYPESCRIPT, or PYTHON frameworks. A repository may use multiple frameworks, with corresponding fields (e.g., is web java) set to true, and the field web dependencies listing the detected web frameworks. For Web GUI testing, the dataset includes two additional tables; gui_testing_test _details, where each row represents a test file, providing the file path, the browser automation framework used, the test engine employed, and the number of tests implemented in the file. gui_testing_repo_details, aggregating data from the previous table at the repository level. Each of the 472 repositories has a row summarizing
the number of test files using frameworks like SELENIUM or PLAYWRIGHT, test engines like JUNIT, and the total number of tests identified. For performance testing, the performance_testing_test_details table contains 410 rows, one for each test identified. Each row includes the file path, whether the test uses JMETER or LOCUST, and extracted details such as the number of thread groups, concurrent users, and requests. Notably, some fields may be absent—for instance, if external files (e.g., CSVs defining workloads) were unavailable, or in the case of Locust tests, where parameters like duration and concurrent users are specified via the command line.
To cite this article refer to this citation:
@inproceedings{di2025e2egit,
title={E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects},
author={Di Meglio, Sergio and Starace, Luigi Libero Lucio and Pontillo, Valeria and Opdebeeck, Ruben and De Roover, Coen and Di Martino, Sergio},
booktitle={2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)},
pages={10--15},
year={2025},
organization={IEEE/ACM}
}
This work has been partially supported by the Italian PNRR MUR project PE0000013-FAIR.
Facebook
TwitterThis data was collected by the team https://dou.ua/ . This resource is very popular in Ukraine. It provides salary statistics, shows current vacancies and publishes useful articles related to the life of an IT specialist. This dataset was taken from the public repository https://github.com/devua/csv/tree/master/salaries . This dataset will include the following data for each of the developer: salary, position (f.e. Junior, Middle), experience, city, tech (f.e C#/.NET, JavaScript, Python). I think this dataset will be useful to our community. Thank you.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created to support research and tool development in the areas of code readability, identifier naming, program comprehension, and code mining. It contains 362,886 unique identifier names (including classes, functions/methods, and variables) extracted from 21 widely used and actively maintained open-source projects.
Projects were carefully selected from four major programming language ecosystems: Java, Python, C#, and JavaScript/TypeScript. The repositories span popular libraries and frameworks in domains such as data science, web development, backend systems, dependency injection, and more. These projects are widely recognized as benchmarks in their respective communities, ensuring that the dataset represents industry best practices in naming and code style.
Context & Motivation: Good identifier naming is fundamental for code readability and maintainability, yet cross-language empirical datasets are rare. This dataset enables comparative studies of naming conventions, training and benchmarking of AI models, and reproducible research on identifier readability. It is designed to be both a large-scale resource and a realistic reflection of naming in production-quality code.
Sources: - commons-lang, guava, hibernate-orm, logging-log4j2, spring-framework - django, flask, numpy, pandas, requests - Autofac, Dapper, Hangfire, IdentityServer, NLog - react, vue, d3, lodash, express, angular, angular-cli, ngx-bootstrap, TypeScript, NestJS
Each identifier is labelled with its project, language, type, and name. We encourage use for academic research, code intelligence, machine learning, and developer education.
Facebook
TwitterThe MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more informations on tensorflow_datasets.
https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png" alt="Visualization" width="500px">
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
MultiOOP: A Multi-Language Object-Oriented Programming Benchmark for Large Language Models
Dataset Summary
MultiOOP is a multi-language object-oriented programming benchmark designed to establish fair and robust evaluations for intelligent code generation by large language models (LLMs). It addresses major imbalances in existing benchmarks by covering six popular programming languages: Python, PHP, C++, C#, Java, and JavaScript. The benchmark features 267 tasks per… See the full description on the dataset page: https://huggingface.co/datasets/codeai-dteam/oop.
Facebook
TwitterThe Department of Information Technology and Telecommunications, GIS Unit, has created a series of Map Tile Services for use in public web mapping & desktop applications. The link below describes the Basemap, Labels, & Aerial Photographic map services, as well as, how to utilize them in popular JavaScript web mapping libraries and desktop GIS applications. A showcase application, NYC Then&Now (https://maps.nyc.gov/then&now/) is also included on this page.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 1,100 synthetic job descriptions (JDs) spanning 55 diverse roles, designed to facilitate career guidance, resume building, ATS (Applicant Tracking System) simulation, and research in NLP/ML.
All job descriptions are synthetically generated based on curated references from publicly available job postings, career guides, and professional role descriptions. They are not real job postings but represent realistic expectations, responsibilities, and skills for each role.
Tech Roles (Core, Popular, and Niche)
Non-Tech Roles (Business, Creative, Operations, and Niche)
| Field | Description |
|---|---|
| JobID | Unique identifier for each job description |
| Title | Job role/title |
| ExperienceLevel | Fresher / Junior / Experienced / Lead / Senior |
| YearsOfExperience | Numeric range or years (e.g., 0-1, 3-5) |
| Skills | List of required skills (JSON array or semicolon-separated in CSV) |
| Responsibilities | Key responsibilities (JSON array or semicolon-separated in CSV) |
| Keywords | Role-specific focus areas (JSON array or semicolon-separated in CSV) |
job_dataset.json – structured array of job objects.job_dataset.csv – arrays flattened with semicolons for easy viewing in Excel or Pandas.Synthetic Job Descriptions Dataset (2025) – Curated & Generated by Aditya Raj Srivastava (https://www.kaggle.com/adityarajsrv)
Facebook
TwitterIn 2023, the tech skill most in demand by recruiters was web development. This was closely followed by DevOps and database software skills. Interestingly, over ** percent of recruiters were actively seeking individuals with cybersecurity skills. Not far behind, AI/Machine learning/Deep learning ranked fourth, with approximately ** percent of respondents identifying it as their most sought-after tech skill. These preferences align with the skills that developers worldwide are keen to acquire, particularly web development and AI/Machine learning/Deep learning. AI at the forefront of IT skills Since the release of ChatGPT in late 2022, demand for AI and automation skills has increased across all sectors. In 2023, ChatGPT was the leading technology skill globally according to topic consumption on Udemy Business, experiencing a massive growth of over ***** percent in global topic consumption. In the same year, over ** percent of software developers reported using AI to help write code in the development workflow, while another ** percent said they currently use it for debugging code. Different languages for different needs JavaScript and Java, commonly used for back-end and front-end web development, were the most demanded programming languages worldwide in 2022, followed by SQL and Python. By industry, JavaScript and Java hold the fort in the IT services and aviation industries, while SQL was more popular in the healthcare sector as well as the marketing and advertising industries. Python, well suited for data science applications, was more commonly used in the manufacturing, education, and energy industries.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets used for this manuscript were derived from multiple sources: Denver Public Health, Esri, Google, and SafeGraph. Any reuse or redistribution of the datasets are subjected to the restrictions of the data providers: Denver Public Health, Esri, Google, and SafeGraph and should consult relevant parties for permissions.1. COVID-19 case dataset were retrieved from Denver Public Health (Link: https://storymaps.arcgis.com/stories/50dbb5e7dfb6495292b71b7d8df56d0a )2. Point of Interests (POIs) data were retrieved from Esri and SafeGraph (Link: https://coronavirus-disasterresponse.hub.arcgis.com/datasets/6c8c635b1ea94001a52bf28179d1e32b/data?selectedAttribute=naics_code) and verified with Google Places Service (Link: https://developers.google.com/maps/documentation/javascript/reference/places-service)3. The activity risk information is accessible from Texas Medical Association (TMA) (Link: https://www.texmed.org/TexasMedicineDetail.aspx?id=54216 )The datasets for risk assessment and mapping are included in a geodatabase. Per SafeGraph data sharing guidelines, raw data cannot be shared publicly. To view the content of the geodatabase, users should have installed ArcGIS Pro 2.7. The geodatabase includes the following:1. POI. Major attributes are locations, name, and daily popularity.2. Denver neighborhood with weekly COVID-19 cases and computed regional risk levels.3. Simulated four travel logs with anchor points provided. Each is a separate point layer.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Helsingissä pysäköinninvalvonnan ja poliisin kirjaamat pysäköintivirheet tammikuusta 2014 alkaen. Aineistossa on virheen tekoaika kuukauden ja vuoden tarkkuudella, osoite, jossa virhemaksu tai huomautus on annettu, virhemaksun vaihe, virheen pääsyy sekä virheen kirjaaja, postinumero ja aluetiedot. Todellinen sijainti -ominaisuus kertoo, onko kyseessä todellinen pysäköintivirheen sijainti. Aineisto viedään tietokantaan noin puolen vuoden välein. Aineistojen esikatselu kartta.hel.fi -palvelussa: Vuosi 2023 Koordinaatisto(t): ETRS-GK25 (EPSG:3879) Rajapintapalvelujen osoitteet: WFS: https://kartta.hel.fi/ws/geoserver/avoindata/wfs?request=getCapabilities Julkaistut tasot: Pysakointivirheet Pysakointivirheet-tason ominaisuustiedot ja tietotyypit: id (int): kohteen yksilöllinen tunniste kuukausi (string): kuukausi sanallisesti vuosi (int): vuosiluku osoite (string): katunimi ja mahdollinen osoitenumero virhemaksun_vaihe (string): vaihe; huomautus tai pysäköintivirhemaksu virheen_paasyy_ja_paaluokka (string): Pysäköintivirheen syy virheen_kirjaaja (string): kuka on kirjannut virheen; pysäköinnintarkastus vai poliisi easting (int): e-koordinaatti northing (int): n-koordinaatti postinumero (string): postinumero postitoimipaikka (string): postitoimipaikka suurpiiri (string): suurpiirin nimi kunta (string): kuntanimi kunta_nro (string): kuntatunnus kaupunginosa (string): kaupunginosan nimi osa_alue (string): osa-alueen nimi todellinen_sijainti (string): ominaisuustieto arvioi onko kyseessä kohteen todellinen sijainti (Kyllä) vai likimääräinen sijainti (Ei) Vuosien 2014 - 2022 tiedostoissa on seuraavat tiedot Virheen tekokuukausi = Pysäköintivirheen tai huomautuksen kirjoituskuukausi Virheen tekovuosi = Pysäköintivirheen tai huomautuksen kirjoitusvuosi Osoite = osoite, jossa virhemaksu tai huomautus on annettu. Osoitetta ei ole välttämättä kaikissa näkyvissä, esim. käsin kirjoitetut poliisien antamat virhemaksut / huomautukset tai se voi olla puutteellinen Virhemaksun vaihe = Pysäköintivirhemaksu tai huomautus Virheen pääluokka / Pääsyy = tässä voi olla yhdestä kolmeen eri virheen luokkaa Virheen kirjaaja = Pysäköinnintarkastaja tai poliisi Kaupunginosa Lisäksi vuoden 2014-2017 paikkatietoaineistossa (SHP) on postinumeroaluetieto (y, x, postinumero, postitoimipaikka, alue, kunta, kunta_nro). Lähteenä on ollut vuoden 2015 pääkaupunkiseudun postinumeroaluejako. Aineistossa on joitakin Helsingin ulkopuolelle jääviä pisteitä. Taulukkomuotoisessa aineistossa (CSV) on joitain osoitteettomia pysäköintivirheitä; niitä ei ole paikkatietoaineistoissa (SHP). Vuosien 2018-2021 paikkatietoaineistot on geokoodattu QGIS Digitransit Geocoding -lisäosalla taulukkomuotoisesta aineistosta (ks. lisätiedot kutakin resurssia klikkaamalla) ja ne sisältävät virheitä. Vuodesta 2022 on saatavilla vain taulukkomuotoinen aineisto (CSV). Vuodesta 2023 eteenpäin uudet datat on saatavilla vain WFS-rajapinnan kautta.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Please enable JavaScript to view the page content.Your support ID is: 17090373647339254903. Este desafio é para testar se você é um visitante humano. Audio is not supported in your browser. Digite o código que aparece na imagem submit Suporte ID: 17090373647339254903.
Facebook
TwitterBy Social Media Data [source]
In-depth Description of the Dataset
This dataset is a comprehensive compilation of tweets from members of the United States Congress, specifically focusing on the month of August in 2017. It contains a wealth of information encapsulating over one thousand account activities from various political entities. These entities include Congress representatives' personal accounts, their office accounts, campaign accounts as well as any committee and party related handles that are associated with them.
The creator behind this project, Alex Litel undertook an ambitious initiative to compile and present all daily tweets originating from both chambers (the House and Senate) using an automated process referred to as 'Tweets of Congress'. This system is programmed to systematically check Twitter at fixed intervals ensuring that every tweet within this time frame is accounted for.
In order to make these vast amounts of data manageable and easily navigable for potential users or researchers, the complete collection has been curated and presented in raw data format using Javascript Object Notation (JSON). These datasets can be found hosted on Github repositories produced daily around midnight Eastern Standard Time (EST).
Furthermore, each aspect involved in collating this dataset including its front-end portion forming the visual facade for users - along with certain mechanical aspects responsible for generating data within given repositories work harmoniously thanks to the Congressional Tweet Automator. For more insights on how each aspect functions together or individually - visit official Github repo's automation section.
Congruent with facilitating convenience for its potential users further, another 'users-filtered.JSON' dataset has been included which contains metadata pertaining to every account utilized by this project during tweet collection.
Despite offering such granulated detail about these digital interactions it's noteworthy that due to sheer size limitations there is a cutoff point where archives will stop collecting data/information making room for new incoming entries ensuring viable repository management.
Aspirants who wish to explore computational social science projects may find high value here since they can use various statistical analysis strategies like content visualization, time-series analysis, and sentiment analysis to reveal and understand underlying patterns within the tweets. Additionally, it can also be used in fields like Natural Language Processing (NLP) for various linguistic studies.
The 'Tweets of Congress' project appreciates contributions from John Otander's Pixyll theme which has been used extensively in building the front-end of the site. Furthermore much owed credit goes to the 'unitedstates/congress-legislators' project which greatly assisted in procuring data that aided creation amidst a wealth of others who have contributed.
Finally, it is vital to mention that this dataset comes under MIT license permitting any person obtaining
Exploratory Data Analysis:
Start with doing a basic exploratory data analysis (EDA) to find trends, patterns and outliers in the Tweet texts.
Analyze tweet lengths: Check if there is any noticeable trend between tweets from different members.
Examine tweet timings: Are most tweets sent during work hours or is there significant activity outside normal business hours?
Delve into the frequency of hashtags/mentions (): Identify the ratio or percentage of tweets that include other users’ handles or hashtags — this could suggest whether Congress members are conversing with constituents via Twitter versus broadcasting messages.
Sentiment analysis: Use NLP tools to perform sentiment analysis on Tweet text to gauge overall sentiments being expressed by congressmen over time.
Social Network Analysis:
Social Network Analysis (SNA) is a popular approach for identifying influential individuals in social networks like Twitter.
Graph theory techniques could be employed in identifying clusters and communities among Congress members based on who they mention in their tweets (indicating possible relationships between users).
Centrality measures can help identify influential Twitter handles that serve as important information hubs or bridges in communication paths.
There’s also potential for studying Congressional relationships through frequency of communications amongst each other, which could demonstrate alliances.
...
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains the following columns:
Month: The date (in year-month format) when the data was recorded. Python Worldwide(%): The percentage of global popularity for Python during that month. JavaScript Worldwide(%): The percentage of global popularity for JavaScript. Java Worldwide(%): The percentage of global popularity for Java. C# Worldwide(%): The percentage of global popularity for C#. PhP Worldwide(%): The percentage of global popularity for PhP. Flutter Worldwide(%): The percentage of global popularity for Flutter. React Worldwide(%): The percentage of global popularity for React. Swift Worldwide(%): The percentage of global popularity for Swift. TypeScript Worldwide(%): The percentage of global popularity for TypeScript. Matlab Worldwide(%): The percentage of global popularity for Matlab.
Each row represents data for a particular month, starting from January 2004, tracking the popularity trends of these programming languages worldwide.